Criminal Behavior of Discharged Mental Patients:
A Critical Appraisal of the Research 


Judith Godwin Rabkin 


Epidemiology of Mental Disorders Research Unit
Psychiatric Institute, New York, New York
New York State Office of Mental Health
Continuing debate on the dangerousness of mental patients between mental
health representatives and community members justifies a critical appraisal of
the available evidence. Included for review are epidemiological, prospective
studies of arrests and convictions among discharged mental patients in
comparison to arrests and convictions of the general public and their change over
time. Patients with arrest records prior to hospitalization were found to have
arrest rates after discharge that far exceeded those of the general public or of
other patients. As the number of patients with prior records has increased over
time, postdischarge rates for patients considered as a single group have
increased accordingly, although patients without prior records continue to have
postdischarge arrest rates equal to or lower than those of the general public.


For at least 25 years, investigators of public attitudes toward mental
illness have reported that most people dislike, distrust, and fear the
mentally ill. Although it has become less socially acceptable to
acknowledge such attitudes today, they continue to prevail (Rabkin, 1974,
1976). When these attitude studies were first conducted during the late
1940s and early 1950s, interest was motivated by the desire of mental
health professionals to encourage troubled people to seek psychiatric help
earlier in the course of their difficulties and to ease the reentry into
the community of patients released from distant mental hospitals. More
recently, as state and local mental health agencies have accepted the
tenets of the community mental health movement and have adopted policies
promoting community-based treatment, negative public attitudes about
mental illness and mental patients have served as a real and persistent
obstacle to the fulfillment of these goals. Communities resist the
placement in hotels and boarding houses of older chronic patients
discharged after years of hospitalization, often on the grounds that such
ex-patients are unsupervised, uncontrolled, and unsuitable neighbors. At
the same time, local treatment facilities for both chronic and younger
acute patients are resisted for fear that additional mentally ill people
will be attracted to the neighborhood, thus compounding the situation.
Such resistance has been vocal, effective, and widespread, leading to the
passage of municipal ordinances and legal barriers to the establishment of
local facilities (Aviram & Segal, 1973; Segal & Aviram, 1976). It has been
estimated that "for every community program that is established and
continues to operate, another has been prohibited or closed because of
community opposition" (Piasecki, Note 1). Clearly, community opposition
continues to be a major problem for those charged with the task of opening
new community facilities to treat the mentally ill and retarded.

Requests for reprints should be sent to Judith Godwin Rabkin, Epidemiology
of Mental Disorders Research Unit, Psychiatric Institute, 722 West 168th
Street, New York, New York 10032.

Investigators have defined two major 
sources of apprehensiveness that seem largely 
to account for the pervasiveness and tenacity 
of public concerns: People perceive mental pa- 
tients as both unpredictable and dangerous. 
Unpredictable behavior is always unsettling 
to social groups because order and stability 
are threatened. When people are identified 
as unpredictable, they lose credibility and 
social standing, and others eventually try to 
avoid them (Cumming & Cumming, 1965; 
Nunnally, 1961). However, the most in- 
fluential factor behind negative attitudes is 
the perception of the mental patient as dan- 
gerous (Cocozza & Steadman, Note 2). This 
is the cornerstone of public apprehensions and 
a crucial stumbling block in communications 
between members of the public and repre- 
sentatives for mental health groups. 

It is widely believed by community mem- 
bers that mental patients are likely to dis- 
play impulsive, violent, assaultive, and other- 
wise socially disruptive behavior. In public 
meetings held to consider the applications of 
new psychiatric facilities in the neighbor- 
hood, many community speakers openly ex- 
press such fears. The vast majority of mental 
health professionals respond by declaring that 
such fears are groundless, that mental patients
are actually less likely to commit
crimes than other people, and that the local 
opponents are speaking from prejudice rather 
than fact. Indeed, in such settings, speakers 
for the mental health establishment have be- 
come quite eloquent in condemning the un- 
charitable and reactionary attitudes of the 
communities that have expressed resistance 
to new psychiatric facilities. 

Most mental health professionals do be- 
lieve that mental patients have relatively 
few encounters with the law and that on the 
infrequent occasions when they are arrested,
it is for minor offenses such as vagrancy,
loitering, or public intoxication. It is furthermore
assumed that such charges stem largely
from socially inept and unacceptable behavior 
like urinating on lampposts or wandering aim- 
lessly in the street rather than from purposive 
criminal acts such as robbery. Psychiatric 
textbooks, general medical magazines (Farns- 
worth, 1977), and psychiatric reviews (Gule- 
vich & Bourne, 1970) generally concur in 
their evaluation of the scientific literature on 
the criminal behavior of the mentally ill: 
This literature is regarded as sparse and in- 
consistent but is overall supportive of the 
position that patients commit fewer crimes 
than the general population. This seems to be 
the prevailing belief not only among mental 
health professionals but also among most edu- 
cated, liberal, and thoughtful people. 

Debate on these conflicting perceptions of 
the dangerousness of mental patients persists 
between mental health representatives and 
members of neighborhoods slated for the 
opening of psychiatric facilities. Neither side 
has resorted to examination of the available 
evidence, although much heat and contro- 
versy have been generated. Such a review 
seems appropriate and timely and is the ob- 
ject of the present undertaking. 

The questions to be considered concern 
the prevalence of arrests among former hos- 
pitalized patients and former outpatients 
treated for psychiatric disorders in compari- 
son to arrest rates of the general public. The 
separate literature on the prevalence of mental
illness among criminals is not included,
nor is a review of efforts to predict dangerousness
among mental patients, criminal defendants,
or the criminally insane. The studies
included for review are epidemiological, pro- 
spective studies, and each is addressed to 
one or more of the following questions: 

1. Do discharged mental patients cur- 
rently have higher arrest rates than members 
of the general population? (a) Have these 
rates changed over time? (b) What factors 
have contributed to such changes, if ob- 
served? 

2. What are the best predictors of post- 
discharge arrests? 
3. What is the association between arrest
risk and diagnostic category? 

4. Are discharged mental patients more 
likely to be arrested for certain types of 
crime? 

5. Does hospitalization reduce the probabil- 
ity of recidivism among patients with prior 
arrest records? 

After a brief review of pertinent methodo- 
logical issues, each study is presented criti- 
cally in some detail. Studies are classified as 
early or recent and are reviewed in chrono- 
logical order. Because it may be difficult to 
keep track of the details of several studies 
simultaneously, two tables have been pre- 
pared as an aid. Table 1 outlines the design 
of each study, and Table 2 presents selected 
results. Following presentation of the studies, 
cumulative findings are summarized accord- 
ing to the questions just posed, and the state 
of the evidence for each is evaluated. Finally, 
some implications are drawn with respect to
the relevance of these findings for efforts to
further develop community psychiatry programs.


The question under consideration here is a deceptively simple one: Are
formerly hospitalized mental patients engaged in criminal activity with
lesser or greater frequency than other people? The appropriate research
design is necessarily a prospective one¹ (sometimes called a cohort
design): Two groups of people, one with a history of psychiatric
hospitalization and one without but otherwise similar, are followed for
equal periods of time to obtain for each group counts of police
encounters, arrests, convictions, and incarcerations. Because it has been
claimed that mentally disturbed citizens are treated differently by the
criminal justice system than are other defendants, a retrospective or
case-control design in which the histories of convicted offenders are
searched for evidence of mental illness is not as effective, although it
is considerably simpler and less costly. The current review therefore
focuses on prospective studies.


Methodological Considerations


Since arrests for criminal acts are com- 
paratively rare events and because the ex- 
pected incidence of arrests was unknown at 
the start, investigators were obliged to study 
a large number of people at risk. Among the 
studies reviewed, samples ranged from 310 
to nearly 100,000 patients discharged from 
the same or similar hospitals within a 1- or 
2-year time span. Frequency of arrests within 
a specified follow-up period ranging from 
1} to 53 years was computed from police 
records, and then annual arrest rates were 
compared with figures compiled at the catch- 
ment area, county, state, or national level 
for the general population of similar age. 
Comparative rates of arrests for patients and 
the general population served as the basis 
for conclusions about the relative incidence of 
criminal activity among discharged mental 
patients. 

A number of methodological problems be- 
come apparent in the course of reviewing 
studies in this field. Some problems, such as 
lack of equivalence in the demographic and 
psychiatric characteristics of patient samples, 
are more or less unavoidable when one con- 
siders together any series of studies con- 
ducted by different people in different places 
at different times. Moreover, such heteroge- 
neity may be regarded as an asset in the 
sense that replicated findings can be gen- 
eralized to a broader segment of the popula- 
tion of patients. Other difficulties, such as 
extrapolating rates based on few cases, are 
common to any epidemiological study of rare 
events and can be dealt with by collapsing 
categories to enlarge the number of cases 
used to generate rates and by seeking pat- 
terns rather than focusing on separate rates 
by specific category. 

Apart from such general considerations, 
there are a number of problems specific to 
the field under study that warrant mention. 


1 The term prospective is used here in the sense ad- 
vocated by Lilienfeld (1976) in which two groups, 
one with and one without a certain characteristic, are 
followed and rates of a subsequent disorder or event 
are computed for each. In this case, the two groups 
are hospitalized mental patients and some subset of 
the general public, and the events in question are ar- 
rest rates within a given follow-up period. 
One issue concerns the differential probability that mentally disturbed
and other people will be arrested, sent to jail, convicted, and
incarcerated. (Studies in this review excluded from consideration those
defendants who were hospitalized by court order for psychiatric
observation, as well as those classified as criminally insane.) It has
been argued that convictions and even arrests are inadequate measures of
crime among mental patients because the seriously disturbed defendant and
those with histories of hospitalization are rehospitalized instead of
arrested. Recent evidence supports this contention. Both Levine (1970) and
Lagos, Perlmutter, and Saexinger (1977) found extremely high rates of
violent or illegal behavior presented as part of the basis for
hospitalization in the admission notes of randomly selected inpatients.
Levine reported that 71 of 100 patients had committed illegal acts,
including 23 acts judged to be felonies, in the course of episodes leading
to their hospitalization, none of which were prosecuted. Similarly, Lagos
and his colleagues found that 36% of 321 patients manifested some form of
violent behavior in the episode leading to hospitalization, of whom only
2.6% were prosecuted. Despite validity problems in these data sources,
including the possibility that admitting personnel unduly emphasize
violence to justify the decision to admit, it does seem that many
dangerous, violent, and illegal acts committed by distressed and
distraught people lead to hospitalization rather than arrest. As a result,
arrest rates may underestimate frequency of violence among the mentally
ill.


Table 2

Results of Studies of Arrest Rates of Discharged Mental Patients: Samples, Comparative
Rates, Overrepresented Diagnostic Groups, and Most Common Offenses

Authors / Diagnostic and sex distribution of sample / Total arrests per 1,000: Patients, General public / Arrests per 1,000 for assaults and homicides only: Patients, General public / Overrepresented diagnostic groups / Most common offenses

Ashley (1922)  37
Pollock (1938)  6.9  98.5
Cohen & Freeman (1945)  26
Brill & Malzberg (1962)  Males only  12  49  13  38  Alcoholics, drug abusers, psychopathic personalities  All гони assaults and homicides
Rappeport & Lassen (1965)  Males only
Rappeport & Lassen (1966)  Females only
Giovannoni & Gurel (1967)  Males only; 95% chronic schizophrenics  3.27  94
Sosowsky (Note 3)  Statewide sample  2.968 E
Sosowsky (Note 3)  San Mateo County sample; 73% schizophrenics  278  18 мыса  11.78 Ai ar% | arrests
Zitrin, Hardesty, Burdock, & Drossman (1976)  47% schizophrenics  73  1.59 (U.S.)  6.26 (local  Alcoholics and addicts (schizophrenics had excess rates for violent crime with bodily harm but not for total arrests)
Durbin, Pasewark, & Albers (1977)  Male rates only; 69% alcoholics  3.0  1.6  Addicts and personality disorders  Drug offenses
Steadman, Melick, & Cocozza (Note 5)  1968 discharge sample; 29% schizophrenics  73.5  27.5  5.6  2.3  Personality disorders and substance abusers  Drug, property, and violent crimes
Steadman, Melick, & Cocozza (Note 5)  1975 discharge sample; 32% schizophrenics  98.5  32.5  12.0  3.6  Paranoid states and personality disorders  Property, sex, and violent crimes

a Conviction rate.
b By arrests, not persons.

Another source of error in use of arrest 
and conviction rates to represent frequency 
and type of illegal behavior of the mentally 
ill concerns the response of the criminal 
justice system. If indeed charged instead of 
hospitalized, the disturbed defendant may be 
acquitted on grounds of insanity. Further, the 
charge may be reduced. In some states, de- 
fendants accused of serious crimes (felonies) 
cannot be committed to a psychiatric hos- 
pital except for brief observation if an in- 
sanity plea is under consideration. A com- 
mon solution is the reduction of a charge to 
a lesser offense (a misdemeanor) to enable 
civil commitment of a disturbed defendant
(Paull & Malek, 1974). This procedure applies
only to psychotic individuals; sociopathy,
alcoholism, and drug abuse are not
legally acceptable grounds for either acquittal 
or reduction of charges for commitment pur- 
poses. One result is the lowering of overall 
arrest rates for the mentally ill. Another is 
the reduction of the severity of charges 
pressed against schizophrenics, who then ap- 
pear less prone to criminal acts than non- 
psychotic diagnostic groups. 

In an effort to obtain less biased measures 
of criminal activity of mental patients, one 
may seek to enumerate all police contacts 
rather than just those culminating in arrests. 
Apart from the practical difficulties of obtaining
such information, however, one cannot
equate police contact with criminal activity.
First, most police contacts do not
lead to arrests. The Federal Bureau of Investigation
(1976) indicated that only 21%
of index crimes² were cleared during 1976.
A crime is cleared when law enforcement
agencies have identified the offender, have
sufficient evidence to charge him or her, and
have actually taken him or her into custody.
Clearance rates vary widely by type of of- 
fense, ranging from 79% of murder offenses 
to 14% of motor vehicle thefts. Police clear 
a high percentage of crimes against the per- 
son, both because of the more intense in- 
vestigative effort made and because of the 
greater availability of witnesses to identify 
the perpetrator and to testify against him 
or her. In their review of dispositions of en- 
counters between discharged mental patients 
and the police, Giovannoni and Gurel (1967) 
also found that only a minority of police 
encounters culminated in arrests. Although 
such low clearance rates reflect to some ex- 
tent limitations of law enforcement resources 
and procedural constraints, they are neces- 
sarily less than 100% because several people 
may be suspected in the same case and only 
one is ultimately charged. 

Another objection to the use of police 
contacts as an index of criminal activity is 
the possibility of differential “whistle blow- 
ing" behavior for ex-mental patients and 
other citizens. Former patients may be more 
likely to be under police surveillance because 
of their known status; their social ineptness
may heighten their visibility, or unfriendly 
neighbors may call the police with complaints 
for purposes of harassment. A more basic 
difficulty is the definition of criminal activity, 
which is necessarily determined only by a 
judge in court, so that police contacts may 
be interesting to study, but are insufficient 
evidence in themselves. 

A similar argument may be made with respect to arrest records. Here also,
the defendant may be innocent. However, it seems likely that a sufficient
number of mental patients are taken out of court channels before
completion of a trial and passage of a verdict so that use of conviction
rates alone gives a misleadingly low estimate of the criminal activity of
the mentally ill (Zitrin, Hardesty, Burdock, & Drossman, 1976). In all of
the studies reviewed here, with one exception, measures of criminal
behavior were based on arrest rates. The exception, a pair of studies
conducted by Sosowsky (Note 3), found conviction and incarceration rates
highly correlated with arrest rates.


2 Index crimes include seven offenses selected because of their
seriousness, frequency of occurrence, and likelihood of being reported to
the police. They are murder, forcible rape, robbery, aggravated assault,
burglary, larceny-theft, and motor vehicle theft. The Federal Bureau of
Investigation’s annual national reports refer to these seven crimes only.

Another problem concerns computation of 
crime rates by specific category. There is no 
national criminal code, and definitions and 
classifications of offenses vary between juris- 
dictions and over time within a given jurisdic- 
tion. A particular offense may be a felony in 
one state and a misdemeanor in another, or it 
may be reclassified as one or the other 5 
or 10 years later. It is therefore difficult to 
compare arrest rates, for example, for con- 
cealed weapons between Wyoming and New 
York. One solution is to consider broader 
categories of crime rather than specific of- 
fenses, although here the problem becomes 
what system of classification to use in de- 
veloping such categories. It may be to cir- 
cumvent such difficulties that the majority of 
recent investigators chose to emphasize the 
relative occurrence of violent crimes, defined 
as homicides plus assaults, in which the 
charges are less ambiguous, the proportion of 
arrested perpetrators is comparatively high, 
and the task of classification is less difficult. 

The foregoing issues are related to the 
task of defining and counting cases, the re- 
sults of which constitute the numerator of a 
rate. There are also problems in selecting an 
appropriate denominator, which is defined as 
all cases at risk for the event in the numer- 
ator. Almost all investigators have computed 
the denominators of their patient rates as 
the number of discharged patients in their 
sample. Because of the large sample sizes 
used in most studies, the size of error in 
using this denominator is probably minor, 
especially if the sample is not broken down 
into subgroups. However, some patients die,
move to other states, or spend time away
from the community in jails, mental or medi- 
cal hospitals, nursing homes, or elsewhere. In 
each instance, they are no longer “at risk" for 
arrest. In only one study (by Giovannoni & 
Gurel, 1967) was an effort made to determine 
the actual population at risk; others overesti- 
mated the denominator size and thus ob- 
tained lower arrest rates. Another prevalent 
error is the inclusion of patient arrests in 
general population figures derived from Fed- 
eral Bureau of Investigation (FBI) records. 
Only Sosowsky (Note 3) excluded treatment 
patients from his population arrest rates. 
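
The effect of the denominator can be illustrated with a short sketch. The cohort below is entirely hypothetical and is not drawn from any study reviewed here; the class and function names are likewise illustrative. The sketch contrasts a rate computed on all discharged patients with one computed only on the time patients actually spent at risk in the community.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    followup_years: float     # nominal length of the follow-up period
    years_not_at_risk: float  # time dead, out of state, or institutionalized
    arrests: int              # arrests recorded during follow-up


def naive_annual_rate(patients):
    """Arrests per 1,000 per year, treating every patient as at risk throughout."""
    total_arrests = sum(p.arrests for p in patients)
    total_years = sum(p.followup_years for p in patients)
    return 1000 * total_arrests / total_years


def at_risk_annual_rate(patients):
    """Arrests per 1,000 per year, counting only time spent at risk."""
    total_arrests = sum(p.arrests for p in patients)
    years_at_risk = sum(
        max(p.followup_years - p.years_not_at_risk, 0.0) for p in patients
    )
    return 1000 * total_arrests / years_at_risk


# Hypothetical cohort of 1,000 patients followed for 2 years:
# 900 at risk throughout with no arrests, 60 absent for 1 year with no
# arrests, and 40 at risk throughout with one arrest each.
cohort = (
    [Patient(2.0, 0.0, 0)] * 900
    + [Patient(2.0, 1.0, 0)] * 60
    + [Patient(2.0, 0.0, 1)] * 40
)

naive = naive_annual_rate(cohort)
adjusted = at_risk_annual_rate(cohort)
print(round(naive, 1), round(adjusted, 1))
```

In this made-up cohort the adjusted rate is only slightly higher than the naive one, which is consistent with the point that the error is probably minor for large undivided samples; the gap would widen in subgroups where a larger share of follow-up time is spent away from the community.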

An additional source of error is lack of 
genuine equivalence between patient and gen- 
eral population groups. No one has ever 
claimed that state hospital patients (who con- 
stitute the vast majority of patients studied 
in this field) are representative of the general 
population. At best, they represent the less 
fortunate members of society who may be 
collectively described as lacking social status, 
financial resources, occupational skills, and 
often family ties. If mental patients in pub- 
lic facilities were compared to their peers in 
these terms, each of which is associated with 
the distribution of both arrests and mental 
illness in the general population, fewer differ- 
ences might be found. 

Although it is true that studies in this re- 
search area are individually and collectively 
incomplete and in many respects insufficient 
in their designs and analyses, their review 
appears warranted on several grounds. First, 
they represent the only available empirical 
evidence in an area of tremendous public con- 
cern. At the present time intense policy de- 
bates are being conducted regarding the rela- 
tive rights of patients and of communities 
into which they are discharged. The cumula- 
tive evidence that is derived from these 
studies may help to clarify major elements 
in these debates. In addition, summarization 
and criticism of this literature should serve 
as a guide to future research by indicating 
important, unresolved questions and suggest- 
ing more fruitful methods for their investiga- 
tion. 

In my opinion, the methodological issues 
raised here are both relevant and significant, 
but are probably not major sources of weakness
in the group of studies to be reviewed.
The direction of bias is not consistent. Al- 
though no study is flawless, no problem is 
universal, so that agreement of results across 
studies lends robustness to their conclusions, 
the foregoing difficulties notwithstanding. 


Literature Review 
Early Studies³


Between 1922 and 1955, four studies were 
conducted regarding the subsequent arrest 
rates of paroled or discharged mental patients.
These studies have been accepted uncritically
and quoted extensively during the
past 50 years to substantiate the prevalent 
belief that mental patients are less dangerous 
and less inclined to criminal activity than 
other people. 

In 1922 the superintendent of the Middle- 
town State Homeopathic Hospital in New 
York State published a brief report describ- 
ing the subsequent careers of 1,000 patients 
paroled from his hospital during the previous 
10 years (Ashley, 1922). Ashley noted that 
“as the parole period up to three years ago 
covered only a period of from one to six 
months the data are not as complete as might 
be desired" (p. 64). This passage is quoted 
to show the basis for computation of period 
prevalence rates, which is evidently not en- 
tirely clear. The length of parole during the 
last 3 years of the study was not reported, 
nor was the distribution of paroles over 
time. Assuming an equal distribution and 
using the mean parole duration, one is led to 
conclude that the report was based on a 3- 
month follow-up for 700 of the 1,000 cases. 
Later publications referring to a 1-year parole 
period serve as the basis for the conjecture 
that the parole period during the last 3 years 
of Ashley's study did not exceed 1 year. 

Ashley carefully tallied recovery rates, dis- 
charge status, economic status, social adjust- 
ment, and readmissions of his 1,000 parolees, 
of whom two thirds were female. Sixty-four 
percent were either partly or fully restored 
to their former social roles, including the 
46% that were recorded as cured. Although
nearly one quarter of the parolees were involved
in social conflicts serious enough to be
brought to the attention of the hospital au- 
thorities, only 12 were arrested. The source 
of arrest figures is not given, but it seems 
likely that they were obtained from reports 
of relatives or neighbors to parole officers. 
The 12 arrests were for offenses including 
"vagrancy, assault and battery, forgery, 
swindlery or profiteering” (p. 65). 

Twelve arrests for 1,000 patients during a 
mean time period of 3 months after hospital 
release would be equivalent to an annual rate 
of 37 per 1,000.⁴ Another 223 parolees were
reported in Ashley's words to have engaged 
in "antisocial acts either resulting from or 
conducive to further mental trouble" (p. 65), 
but they were not arrested. One may conjec- 
ture that only 12 of 235 known antisocial 
acts resulted in arrest because of the acces- 
sibility of parole officers as negotiators or 
problem solvers or because of the reluctance 
of others to press charges against parolees. 
Arrest rates for the general population of 
that time were not reported, so one cannot 
determine the extent to which the observed 
rates depart from expected rates.
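
The annualization behind the figure of 37 per 1,000 can be sketched as follows. This is only an illustrative reconstruction of the arithmetic, assuming (as the estimate does) that 70% of the sample was followed for a mean of 3 months and the remaining 30% for a mean of 1 year, and that the 12 arrests were spread across the two groups in proportion to their size. The function name is hypothetical.

```python
def annualized_rate_per_1000(arrests, sample_size, groups):
    """Annual arrests per 1,000 patients.

    groups is a list of (share_of_sample, mean_followup_years) pairs.
    Arrests are assumed to fall in each group in proportion to its share
    of the sample; each group's arrests are then scaled to a full year.
    """
    annual_arrests = sum(share * arrests / years for share, years in groups)
    return 1000 * annual_arrests / sample_size


# Ashley's 12 arrests among 1,000 parolees:
# 70% followed for 3 months (0.25 year), 30% followed for 1 year.
ashley_rate = annualized_rate_per_1000(12, 1000, [(0.70, 0.25), (0.30, 1.0)])
print(round(ashley_rate, 1))  # about 37.2
```

The result is sensitive to the assumed follow-up lengths: lengthening the mean follow-up for either group lowers the annualized rate, which is why the basis of the parole period matters so much to the interpretation of Ashley's figures.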

Two other points regarding Ashley’s arrest figures warrant consideration.
First, two thirds of his sample were female, and women’s arrest rates
historically have been much lower than men’s. In addition, parole status
is not the same as the unconditional discharge of contemporary patients.
Parole could be revoked by the unilateral decision of hospital
representatives, so patients may have been more careful about their
behavior and more closely supervised by family members than patients
described in recent studies.


3 For the reader's convenience and to facilitate comparisons, all rates
cited here are expressed as per 1,000 even if the authors used a different
basis.

4 The rate of 37 per 1,000 is derived by estimating that 70% of the sample
had a mean follow-up of 3 months and the rest a mean follow-up of 1 year.
Thus, 70% of 12 was multiplied by 4 to give an annual basis and then added
to the remaining 30% (70% × 12 = 8.4; 8.4 × 4 = 33.6; 33.6 + 3.6 = 37.2).
This estimate is vastly greater than the rate computed and reported by
Pollock (1938) in his article that summarizes Ashley's (1922) findings and
that has since been widely quoted. Pollock interpreted the passage cited
above to mean that Ashley's cases "were under observation on the average
nearly 5 years," yielding an "average annual rate of arrests . . . of 2.4
per 1,000" (p. 239). Unless Pollock had access to additional information
from Ashley other than the 1922 article, which Pollock does not mention,
it would seem that his estimate is based on a misreading of Ashley’s
article.

In 1938, Pollock published a study of the 
legal offenses committed by patients paroled 
during fiscal 1937 from all New York State
hospitals. At that time, patients considered 
eligible for release were paroled for 1 year 
before discharge. “To be deemed eligible for 
parole, the patient must be harmless, trac- 
table, and able to mingle with others without 
causing trouble. . . . No patient is placed on
parole unless there is a safe and suitable home 
to receive him and a relative or friend to 
welcome and care for him” (p. 241). Evi- 
dently patients were rigorously screened be- 
fore their parole in terms of social as well as 
psychiatric criteria. 

Pollock wrote his article to counteract the 
“frequently made charge that the paroling 
of mental patients by State hospitals is a 
dangerous procedure” (p. 236), and his data 
certainly look convincing although their 
sources are ambiguous. In 1937, nearly 
65,000 patients were distributed among the 
20 state facilities, and on an average day 
there were 5,833 patients on parole, includ- 
ing a slight excess of males. Pollock reported 
that paroled patients were arrested for 40 of- 
fenses, including 29 misdemeanors and 11 
felonies; all but 3 of the misdemeanors were 
committed by men. Dividing 40 by 5,833, he 
obtained an arrest rate of 6.9 per 1,000. Un- 
fortunately, Pollock did not indicate the 
source of his arrest data or the time period 
covered. Since this rate is compared to that 
of the state’s general population in 1937, the 
implication is that the 40 arrests were all 
those committed by all patients paroled during
fiscal 1937. Accordingly, this rate is substantially
lower than that of the general public
over 14 years of age in New York State,
which was 98.5 per 1,000 in 1937⁵ or 14
times higher than the arrest rate of paroled 
patients. As Pollock put it, whereas 40 ar- 
rests were made among parolees, 576 arrests 
were made among an average normal group of 
like size in the general population. 
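
Pollock's comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (40 arrests, 5,833 patients on parole on an average day, and a general population rate of 98.5 per 1,000); the variable and function names are illustrative, and the calculation says nothing about the unresolved question of what time period the 40 arrests cover.

```python
def rate_per_1000(events, population):
    """Events per 1,000 members of the population."""
    return 1000 * events / population


parolee_arrests = 40
parolees_on_average_day = 5833
general_rate = 98.5  # arrests per 1,000, New York State, 1937

parolee_rate = rate_per_1000(parolee_arrests, parolees_on_average_day)

# How many times higher the general population rate is than the parolee rate.
ratio = general_rate / parolee_rate

# Arrests expected in a general population group the size of the parolee group.
expected_arrests = general_rate * parolees_on_average_day / 1000

print(round(parolee_rate, 1), round(ratio, 1), round(expected_arrests))
```

The expected count comes out near 575, close to though not identical with the 576 arrests Pollock cited for a group of like size; the small gap may simply reflect rounding in the published rate.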


Pollock cited, as further evidence of the 
arrest rate of parolees, a 10-year follow-up 
study of 741 patients paroled from two New 
York State hospitals conducted under the di- 
rection of R. Fuller and apparently never 
published. In his one-paragraph summary of 
the findings, Pollock reported that “among 
the 289 male patients there were 19 arrests 
and among the 452 female patients five ar- 
rests. The annual rate of arrests among males 
was 0.7% and among females 0.1%” (p. 
240). No further information was provided. 

Pollock attributed these low arrest rates to 
two factors: the extreme care exercised in 
screening patients for parole and their con- 
stant surveillance while on parole. Not only 
did social workers regularly visit the patients 
to interview them as well as their families 
but patients were required to attend parole 
clinics at specified intervals. When the parole 
officer thought it necessary, the patient was 
fetched by the hospital car and promptly re- 
admitted. Pollock concluded that “the hos- 
pital methods are well planned and are pro- 
ducing beneficial results to the patients and 
communities served” (p. 242). The only rea- 
son to entertain some doubt about his figures, 
which as indicated are undocumented, is his 
treatment of Ashley’s annual arrest rates 
(discussed earlier). 

Like Ashley and Pollock before them, 
Cohen and Freeman (1945) were associated 
with a state hospital and were distressed by 
public resistance to the presence of dis- 
charged mental patients in the community. 
They were eager to promote patient ac- 
ceptance by showing that they were not 
dangerous and toward this end conducted a 
follow-up study of 1,676 patients who were 
paroled and discharged from one of three 
state hospitals in Connecticut between 1940 
and 1944. (This is the only early study not 
based on New York State patients.) Patients 
were followed for an average period of 2 
years during which 87 or 5.2% were arrested,
yielding an annual arrest rate of 26 per 1,000.

⁵ This is a peculiarly high figure compared to general
population arrest rates subsequently reported
(1947 through 1975), which are never more than half
as high.

It occurred to Cohen and Freeman
that patients who were in trouble with the 
law after discharge might also have police 
records before their hospitalization. They 
found that 314 or 18.4% of their patients 
had such prior records, and all but 6 of the 
patients arrested after discharge came from 
this group. In other words, patients without 
prior arrest records were almost never in 
legal difficulty after hospital discharge, 
whereas 93% of those subsequently arrested 
had records. Both before and after hospitaliza- 
tion, more than two thirds of arrests were for 
“drunkenness and breach of peace.” Excluding
sex offenses for which no population
rates were available, the total annual felony 
rate per 1,000 for discharged patients was .9, 
compared to a rate of 13.7 per 1,000 for the 
general population. From these figures Cohen 
and Freeman concluded that “these patients 
who have left mental hospitals are not as 
dangerous to the community as those who 
have never been judged mentally ill” (p. 699). 
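
The derived figures in this paragraph follow from three counts. The Python sketch below is an illustration rather than the authors' own computation; it assumes that the annual rate was obtained by dividing the proportion arrested by the average follow-up period, and the names are hypothetical.

```python
# Figures quoted in the text from Cohen and Freeman (1945); names are illustrative.
PATIENTS = 1_676            # paroled and discharged patients followed
ARRESTED = 87               # patients arrested during follow-up
MEAN_FOLLOW_UP_YEARS = 2    # average follow-up period
ARRESTED_NO_PRIOR = 6       # arrested patients without a prior police record

percent_arrested = 100 * ARRESTED / PATIENTS
# Assumption: dividing by the mean follow-up treats risk as constant over time.
annual_rate_per_1000 = 1000 * ARRESTED / PATIENTS / MEAN_FOLLOW_UP_YEARS
percent_with_prior = 100 * (ARRESTED - ARRESTED_NO_PRIOR) / ARRESTED

print(f"{percent_arrested:.1f}% arrested during follow-up")
print(f"{annual_rate_per_1000:.0f} per 1,000 per year")
print(f"{percent_with_prior:.0f}% of arrested patients had prior records")
```

Note that this rate counts patients arrested rather than arrests, so repeated arrests of the same patient are not reflected in it.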

Cohen and Freeman considered the etio- 
logical significance of mental status in the 
production of criminal behavior but reached 
no firm conclusions. They suggested that some 
arrests seem related to mental illness and 
that “once the patient is hospitalized and re- 
leased, the arrest-precipitating behavior is 
either no longer present or is under some 
social control,” (p. 699) referring in the 
latter point to control by other people. On 
the other hand, they felt that many of their 
recidivists were “inadequately adjusted for 
reasons which are not directly related to their 
mental illness at all. ... [They] seem merely 
to have been following old anti-social patterns”
(p. 700). In subsequent literature reviews
such as that by Rappeport, Lassen, and
Hay (1967), their idea that hospitalization 
produces a reduction in criminality is em- 
phasized, but the notion of coexisting patterns 
of mentally ill and antisocial behavior is 
overlooked. 

The next, and perhaps most influential, 
study in this series of early investigations 
was conducted for the New York State De- 
partment of Mental Hygiene by Brill and 
Malzberg (1962). Their report, based on 
state hospital discharges during fiscal 1947,
was circulated within the department in
1954 but was not otherwise disseminated until 
1962. Even then, it was distributed in the 
form of a supplementary mailing by the 
American Psychiatric Association rather than 
published in a standard journal. Nonetheless, 
it became widely known and often cited, both 
because of its large scope and because of its 
investigation of historical variables associ- 
ated with arrest rates as well as of the rates 
themselves. 

In their work, Brill and Malzberg studied 
the arrest rates of 10,247 male patients over 
age 15 released from New York State mental
hospitals during fiscal 1947. Arrests at
any time before and for 5½ years after hospital
release were traced by means of a fingerprint
registry in Albany for approximately
half the sample. Arrest rates for patients
were compared with those of all males over 
age 15 in New York State for 1947. 

Patients were divided into two categories: 
those with a prior history of arrests and 
those without. Arrest rates after hospital re- 
lease were computed for both groups sepa- 
rately and together. The patients with prior 
arrest records (15%) turned out to have 
dramatically higher subsequent arrest rates 
than the other patients as well as than the 
general population, whereas the patients with- 
out arrest histories (85%) had considerably 
lower rates than the general population for 
misdemeanors, all felonies, crimes of violence, 
and all crime categories combined. When the 
patients were considered together as a single 
group, their felony arrest rates were higher 
and their total arrest rates were lower than
those of the general population. 

As seen in Table 3, the patients fall into 
two very different groups depending on ar- 
rest history. As a group, patients with prior 
arrests had subsequent arrest rates ranging 
from 6 to 8 times higher than those of the 
general population and 9 to 16 times higher 
than those of the other patients. Although 


5 These patients (N = 5,354) were admitted after 
fingerprinting became a routine part of the admis- 
sions procedure. Arrest rates for the remaining 4,893 
were calculated based on demographic and historical 
characteristics.

Table 3
Annual Arrest Rates per 1,000 for Male Patients Released From New York
State Hospitals and for the General State Population in 1947

                              Patients with    Patients without   Total      General
Offense                       arrest records   arrest records     patients   population
Misdemeanors                       32.8              2.0             6.7        45.8
Felonies (all)                     26.3              1.6             5.5         3.3
Homicides and assaults only         5.3               .6             1.3          .8
All categories                     60.0              3.6            12.2        49.1

Note. These data are from Brill and Malzberg (1962).


constituting only 15% of the patient group, 
their excess of arrests in the felony category, 
especially for violent crimes (homicide and 
assault), was sufficient to influence the arrest 
rate of all patients combined compared to 
that of the general population. This excess 
was due to only one third of the patients with 
prior arrests; 66% of these patients and 98% 
of the other patients were not arrested after 
hospital release. 
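
The way a 15% minority can dominate the combined rate is easiest to see as a weighted average. The Python sketch below is not from Brill and Malzberg; it blends the two subgroup rates from Table 3 using the rounded 15%/85% split and compares the result with the reported "total patients" column. The homicide and assault entries are read as .6 and .8, which is an assumption about where the decimal points fall, and all names are illustrative.

```python
# Annual arrest rates per 1,000 from Table 3 (Brill & Malzberg, 1962).
# Tuple order: with prior record, without prior record, all patients, general population.
RATES = {
    "Misdemeanors": (32.8, 2.0, 6.7, 45.8),
    "Felonies (all)": (26.3, 1.6, 5.5, 3.3),
    "Homicides and assaults": (5.3, 0.6, 1.3, 0.8),  # assumed decimal placement
    "All categories": (60.0, 3.6, 12.2, 49.1),
}
SHARE_WITH_RECORD = 0.15  # rounded share of patients with prior arrests


def blended_rate(with_record: float, without_record: float,
                 share: float = SHARE_WITH_RECORD) -> float:
    """Rate for the combined group as a weighted average of the two subgroups."""
    return share * with_record + (1 - share) * without_record


for offense, (prior, no_prior, reported, general) in RATES.items():
    blended = blended_rate(prior, no_prior)
    print(f"{offense:<24} blended={blended:5.1f} reported={reported:5.1f} "
          f"prior/general={prior / general:4.1f} prior/no-prior={prior / no_prior:4.1f}")
```

The blended values land close to, but not exactly on, the reported totals, as would be expected when the 15% share is itself a rounded figure.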

A critical issue that Brill and Malzberg 
made some effort to address, which has not 
been dealt with by other investigators either 
before or since, is whether the recidivism rate 
among ex-convicts who have been in psychi- 
atric hospitals is the same, lower, or higher 
than that of people with arrest histories who 
have not been hospitalized. Based on information
from unnamed “expert criminolo-
gists,” Brill and Malzberg reported an over- 
all annual rearrest rate of 5% among pa- 
roled ex-convicts, which is similar to the 6% 
rate among released mental patients with ar- 
rest records. The unnamed experts also re- 
ported that only 5% of the general popula- 
tion in 1947 had arrest records, in contrast to 
15% of the patient population. 

Brill and Malzberg also looked into the 
demographic and psychiatric characteristics of 
patients in relation to arrest patterns. Over- 
all, the same factors associated with re- 
cidivism in the general population were 
found to characterize rearrested patients, in- 
cluding unmarried status, youth, alcoholism, 
drug addiction, and residence in delinquency 
areas. Rearrested patients were overrepre- 
sented in the diagnostic categories of alcohol- 
ism, drug abuse, and psychopathic personality 
and significantly underrepresented among the 


major functional psychiatric disorders. Symp- 
tom severity and length of hospital stay were 
negatively correlated with recidivism. As one 
might expect, lower arrest rates prevailed 
among voluntary admissions. Brill and Malz- 
berg concluded that “statistically the group 
of previously arrested patients behaves more 
like a segment of the correctional population 
than a primarily psychiatric one” (p. 7). 

In summary, their data show that mental 
illness and psychiatric hospitalization do not 
raise the probability of subsequent arrest 
above that existing before hospitalization and 
do not create such a tendency if it did not 
previously exist. This is an observation of 
major and enduring significance, and it is 
perhaps ultimately all that must be said on 
the subject of crime and mental illness. Their 
data demonstrate that crime rates for ex-pa- 
tients without arrest records are lower than 
those for the general population and are in- 
flated by the presence of patients with police 
records. In 1947, and as one sees even more 
so today, men with arrest records constitute 
a larger proportion of the patient popula- 
tion than of the general population, and it is 
this that contributes to the higher arrest rates 
of mental patients when they are considered 
as one group. 

The preceding four studies (sometimes 
counted as five when the paragraph about Ful- 
ler’s findings, cited in Pollock, 1938, is consid- 
ered separately) together constitute the foun- 
dation for psychiatric reassurances that former 
mental patients are no threat or danger to 
their neighbors. In fact, the first two studies 
concur in their results and interpretations, 
whereas the third and fourth identify the 
major factor contributing to higher arrests
among discharged patients, providing a framework
within which to integrate the apparently
contradictory findings of studies published 
before and after 1965. 

The first three studies are difficult to eval- 
uate because so little information is pro- 
vided with respect to record sources, case 
representativeness and selection, and computation
of rates. Nevertheless, even if the
authors minimized patient arrests and used 
high estimates of population arrest rates to 
yield an exaggerated contrast, it is reason- 
ably certain that mental patients discharged 
before World War II were less often arrested 
than were members of the general public. 


Recent Studies 


Since 1965, eight American studies includ- 
ing nine samples were designed to contribute 
further empirical evidence to the question of 
the dangerousness of discharged mental pa- 
tients. Each study found that arrest or con- 
viction rates of former mental patients equaled 
or exceeded those of the general population in 
at least some crime categories when patients 
were considered as a homogeneous group. Al- 
though each study has some limitations and 
all do not provide equivalent data, they are 
cumulatively persuasive in their evidence that 
patterns of arrests among former mental patients
are very different from and far higher
than those reported earlier, both in absolute 
terms and in comparison to the general public. 

In terms of temporal sequence, the studies 
of Rappeport and Lassen (1965, 1966) serve 
as a bridge between the earlier studies and 
contemporary ones, since they investigated 
arrest rates of patients discharged during the 
fiscal years of 1947 and 1957 from all Mary- 
land psychiatric hospitals. Findings for men 
and women were published separately (in 
1965 and 1966, respectively). Rappeport and 
Lassen used a data set superior to data sets 
of other studies in that it included discharges 
from federal and private as well as state fa- 
cilities. The total sample consisted of 708 men 
and 693 women in 1947 and 2,152 men and 
2,129 women in 1957. Arrest records for 5 
years preceding and 5 years following hospital 
discharge in 1947 and 1957 were collected
from each police jurisdiction in the state.
Comparative arrest rates for the state popula- 
tion for equivalent time periods were obtained 
from the FBI for males and females over age 
15 separately. 

Arrests were recorded for only five offenses, 
all of which were violent crimes against per- 
sons: murder, negligent manslaughter, rape, 
robbery, and aggravated assault. The obtained 
arrest rates are thus not comparable to total 
arrest rates reported in other studies. Fur- 
ther, they counted arrests only in the 5 years 
preceding this hospitalization rather than dur- 
ing the lifetime of the patient, as in the Brill 
and Malzberg (1962) study. 

For each of the five crime categories, pa- 
tient rates were equivalent to or higher than 
those of all Maryland residents. Male rates for 
robbery and female rates for aggravated as- 
sault were particularly high. Arrest patterns 
were not notably different between 1947 and 
1957 for either males or females, despite expectations
that newer treatment techniques
would have reduced posthospitalization ar- 
rests. The authors drew two major conclu- 
sions. First, they found no support for the 
contention that mental patients “are to any 
great extent less involved in criminal behavior 
than those in the general community” (Rap- 
peport & Lassen, 1965, p. 779). Second, they 
concluded, as did Brill and Malzberg, that 
recidivism among patients with histories of 
prior arrests is “not unlike that seen in the 
general community” (p. 779). In summary, 
they observed that “we as psychiatrists may 
be biased when we malign others for sug- 
gesting that some of our patients represent 
a threat to the community” (p. 779). 

Rappeport and Lassen attempted to analyze 
arrests by diagnostic category; thus for each 
of the four time periods studied (before and 
after 1947 discharges and before and after 
1957 discharges) they tallied arrests by diag- 
nostic category. The two categories that ac- 
counted for the largest proportion of arrests 
in all four time periods were those of alcohol 
intoxication and schizophrenia. Since Rap- 
peport and Lassen did not say how diagnoses 
were distributed in the entire sample, one can- 
not determine whether these diagnoses are 
underrepresented or overrepresented in the
subset of patients who were arrested. Further,
only percentages were reported, which
can be grossly misleading, as in the break- 
down for females arrested after their 1947 re- 
lease. The authors showed in tabular form 
that 33% of arrested females were diagnosed 
as manic-depressive, 33% as neurotic, and 
33% as mental defective. In the footnote of 
another table they stated that only three ar- 
rests were made of female patients during this 
time period, so that the diagnostic categories 
each contain a single case. 

These lacunae are partly offset by the pre- 
sentation of arrest rates per 100 within diag- 
nostic category for each of the four time pe- 
riods. In these tables and in the text the mis- 
leading effects of the previous presentation 
of percent arrests by diagnosis are diminished, 
as one can see that the arrest rates per 100 
for both alcohol intoxication and schizo- 
phrenia no longer appear outstanding, and 
that in fact no consistent association between 
arrest rate and diagnosis emerges across the 
four time periods. Some effort at statistical 
analysis would have been helpful in this con- 
text. As presented, this material may have 
some utility in defining patterns within the 
data set, but it is not otherwise informative. 

Analyses by type of crime are also of 
doubtful value in Rappeport and Lassen’s 
studies (1965, 1966), since in several cate- 
gories the number of events was so few that 
the reliability of computed rates must seri- 
ously be questioned. For example, no more 
than two arrests for murder by male patients 
were recorded in any single time period; con- 
sequently, estimation of rates per 100,000 
based on such rare occurrences is misleading 
because of the high probability of error. How- 
ever, the cumulative rates for all five crime 
categories combined are based on sufficient 
numbers to be relatively stable, and the au- 
thors’ conclusions seem reasonable and justi- 
fiable. 
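
The instability of a rate built on one or two events can be made concrete with an exact Poisson confidence interval. The Python sketch below is an illustration and not an analysis from either study: treating arrests as a Poisson count is an assumption, as is the denominator of 708 men each at risk for a full 5 years, and the function names are hypothetical.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def bisect(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k: int, alpha: float = 0.05) -> tuple:
    """Exact (Garwood) confidence interval for a Poisson mean, given count k."""
    upper = bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, k, k + 50)
    if k == 0:
        return 0.0, upper
    lower = bisect(lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, k)
    return lower, upper


events = 2                # e.g., two murder arrests in one time period
person_years = 708 * 5    # assumption: every man at risk for all 5 years
low, high = exact_poisson_ci(events)
scale = 100_000 / person_years
print(f"rate per 100,000 person-years: {events * scale:.0f} "
      f"(95% CI {low * scale:.0f} to {high * scale:.0f})")
```

With only two observed events, the upper limit of the interval is roughly 30 times the lower limit, which is the sense in which such rates carry a high probability of error.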

The next study, by Giovannoni and Gurel 
(1967), is distinguished by the national dis- 
tribution of the patients studied, its excep- 
tionally comprehensive pursuit of patients’ 
status at follow-up, its restriction to the single 
diagnostic category of chronic schizophrenia
among males, and its analysis of all police
contacts and their dispositions, not only those
that ended in arrests. Unlike any other researchers
in this field, Giovannoni and Gurel
computed arrest rate denominators by meticulously
measuring “in-community days” and
thus precisely determining the number of pa- 
tients at risk for arrest in a given period of 
time. Unfortunately, criminal records preced- 
ing hospitalization were not reviewed, so that 
recidivism, which others have regarded as a 
major component of postdischarge arrest 
rates, could not be measured. 

Giovannoni and Gurel studied the subse- 
quent arrest records of 1,142 males under 60 
years of age, 95% of whom were chronic 
schizophrenics who were released from Vet- 
erans’ Administration hospitals in 12 states 
during 1956 and who remained out of the 
hospital for at least 30 consecutive nights. For 
each patient, days spent in the community 
were computed by subtracting from the total 
those days spent in penal or psychiatric fa- 
cilities. Follow-up information was gathered 
from periodic interviews as well as from in- 
stitutional records. Outcome data consisted of 
all “socially hazardous” incidents in which pa-
tients were involved with police and not only 
those leading to arrest, since the authors were 
interested in seeing how often ex-patients who 
violated the law were rehospitalized rather 
than arrested. 
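
The effect of an "in-community days" denominator can be sketched as follows. This Python example is a minimal illustration, not Giovannoni and Gurel's procedure: the record structure, field names, and the three patient records are all invented for the purpose.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """One patient's follow-up record (hypothetical structure)."""
    days_followed: int      # days from release to end of follow-up
    days_in_hospital: int   # days spent rehospitalized
    days_in_jail: int       # days spent in penal facilities
    offenses: int           # police-recorded incidents during follow-up

    @property
    def in_community_days(self) -> int:
        """Days actually at risk: total minus institutional days."""
        return self.days_followed - self.days_in_hospital - self.days_in_jail


def annual_rate_per_1000(records) -> float:
    """Offenses per 1,000 person-years spent in the community."""
    person_years = sum(r.in_community_days for r in records) / 365.25
    return 1000 * sum(r.offenses for r in records) / person_years


# Invented records for illustration only (4-year follow-up = 1,461 days).
cohort = [
    FollowUp(1461, 0, 0, 0),
    FollowUp(1461, 400, 30, 1),
    FollowUp(1461, 731, 0, 0),
]

naive_rate = 1000 * sum(r.offenses for r in cohort) / (len(cohort) * 4)
print(f"per 1,000 patients per year (naive): {naive_rate:.0f}")
print(f"per 1,000 in-community person-years: {annual_rate_per_1000(cohort):.0f}")
```

Because institutional days are removed from the denominator, the person-time rate is higher than a rate that treats every patient as at risk for the whole follow-up period.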

During the 4-year follow-up, 156 of the 
1,142 patients were involved in 192 offenses. 
In 48% of the incidents, the patient was sent 
to jail; of these patients, 40% were rehos- 
pitalized either directly or after a brief stay 
in jail. In 12% of the incidents other actions 
were taken including fines, probation, and 
dismissal. In the absence of a similar analysis 
of incident disposition of nonpatient police en- 
counters, one cannot be sure that use of arrest 
rates alone underestimates the number of in- 
cidents in which patients were involved, but 
these figures suggest such a likelihood. 

Offenses were analyzed by type, and rates 
for each type were compared to an average of 
national rates from 1957 to 1960 for cities 
with over 25,000 people, obtained from FBI 
records. It was necessary to use national fig- 
ures despite their known limitations because 
the patients had distributed themselves

Table 4
Conviction Rates per 1,000 for Violent Crimes in California in 1971

              Former           Former            Nontreated state
              state hospital   outpatients in    population
Offense       patients         county program    over age 15
Assault          1.96             2.57              i3
Homicide          .98              .20              .10
Total no.       99,361          143,322          13,982,155

Note. These data are from Sosowsky (Note 3).


throughout the country. When offenses were 
grouped into crimes against persons, property, 
and morals, crimes against persons accounted 
for twice as many incidents (27%) as did 
crimes against property (12%). Another com- 
mon offense was drunkenness. This is not sur- 
prising because two thirds of those arrested 
were rated as problem drinkers by interview- 
ers. Since one is not told how many of the 
patients who were not arrested were also 
rated as problem drinkers, it is not clear 
whether problem drinking is a useful predictor 
of arrests in this chronically ill population; 
but in Giovannoni and Gurel’s (1967) study, 
problem drinking seems to be a common 
characteristic of patients who were arrested. 
Estimated annual rates of offense were com- 
puted by crime category for patients and 
compared to general population rates. Despite 
the low average annual number in each cate- 
gory (ranging from zero to two), which 
raises serious questions about the reliability 
of each of the rates, an overall pattern of dif- 
ferences between patient and general popula- 
tion rates emerged that seems more substantial 
and suggests true underlying differences. Pa- 
tient rates were higher (compared to the gen- 
eral population) for several violent crimes 
against persons (homicide, aggravated as- 
sault, and robbery but not forcible rape) 
and much lower for crimes against property 
(petit and grand larceny, burglary, and auto 
theft). For example, the estimated annual 
rates per 1,000 for homicide were .99 for pa- 
tients and .05 for the general population, 
whereas burglary rates for patients were .65, 
compared to 5.1 for the general population. 
These findings are of interest first because 
the patients were chronic schizophrenics, a 


category infrequently associated with arrests 
and convictions in other studies. Second, the 
patients were discharged from 12 different hos- 
pitals that varied in the rigor of their dis- 
charge criteria among other things, and they 
were arrested under the laws of many states, 
ruling out these two factors as possible explanations
for the obtained results. These
findings are not congruent with those of Rap- 
peport and Lassen (1965, 1966) concerning 
arrest rates by specific charge within the cate- 
gory of violent crime, but both studies show 
cumulatively high arrest rates for crimes 
against persons committed by discharged men- 
tal patients compared to the general popula- 
tion. 

The California State Department of Health 
sponsored two studies of crime and mental ill- 
ness that were conducted by Sosowsky (Note 
3). He first undertook a statewide study of 
arrest, conviction, and incarceration rates for 
violent crimes (homicide and assault) of 
former state hospital patients and outpatients 
at state-supported facilities compared to the 
nontreated population of California. In the 
second study he reviewed the criminal records 
of a cohort of state hospital patients in San 
Mateo County (California) and compared 
their arrest and conviction rates with those 
of the nonhospitalized county population in 
1974. 

In the first study, Sosowsky listed all pa- 
tients over age 15 who received California 
mental health funds (inpatient and outpa- 
tient) between 1966 and 1970 and matched 
them against the names of those convicted in 
the state for homicide and assault in 1971. 
Conviction rates were computed as shown in 
Table 4.

As the author noted, these rates did not
take into account ex-patients arrested and 
ex-offenders treated in other states or those 
patients not at risk for becoming offenders 
because of death or movement out of state, 
but in view of his sample sizes these factors 
are probably insignificant. His results show 
that former patients are convicted at a sig- 
nificantly higher rate than are the nontreated 
population. Combining the two categories of 
offense, ex-hospitalized patients’ rates were 
seven times and outpatient rates six times 
greater than those of nontreated California 
residents. Analysis by age and sex showed this 
excess to be true for all age groups and both 
sexes, especially for those under 20 and over 
50. Female excess was even greater than that 
of males compared to the nontreated general 
population, although male rates for both pa- 
tients and nonpatients were higher than equiv- 
alent female rates in absolute terms. 

In his second study, Sosowsky analyzed ar-
rests and convictions of 301 San Mateo 
County residents admitted to state hospitals 
between June 1972 and December 1973. The 
sample included over 95% of all patients that 
lived in San Mateo County who were admitted 
to state psychiatric facilities during this pe- 
riod. State criminal justice records were 
searched for all arrests and convictions of 
these patients from 1966 through 1973. It 
seems likely that most arrests preceded hos- 
pitalization, since the admission period fell at 
the end of the 8-year criminal record survey. 
In fact, 28 patients were still hospitalized 
when data analysis was begun in March 1974. 

Patient arrest and conviction rates were 
computed for all offenses and for violent and 
nonviolent offenses separately. Violent offenses 
included those involving direct bodily harm 
(murder, assault, and rape) and potential for 
harm (robbery, burglary, crimes against chil- 
dren, threatening violence, possession of bur- 
glar's tools or weapons, kidnapping, arson, and 
rioting). Nonviolent offenses included all mis- 
demeanors except violations of the motor ve- 
hicle code. Patient arrest and conviction rates 
were compared to those of the nonhospitalized 
San Mateo County population during the 
same time period. This is the only study in 
which the nontreated rather than the total
(nontreated plus treated) population was used
as a comparison group. 

Of the hospitalized patients studied, 67%
were male, most were between the ages of 20 
and 39, 21% were nonwhite, and 73% were 
diagnosed as schizophrenic. Compared to the 
county's general population, males, nonwhites, 
and those in the 20-40-year-old age range 
were considerably overrepresented. A total of
47% of the patients were arrested at least 
once between 1966 and 1974, either before, 
after, or both before and after hospitalization. 
On the average, each patient was arrested 3.6 
times over the period surveyed. Of the total 
group, 23.6% were charged with a violent 
offense and 23.6% were charged with a non- 
violent offense (misdemeanor) only. 

Since the variables of age, sex, and ethnicity 
are each independently associated with higher 
arrest rates in the general population accord- 
ing to FBI findings (e.g., Federal Bureau of 
Investigation, 1976) and since the patients 
were disproportionately young, male, and 
black, the sample was predisposed toward 
higher arrest rates than those of the general 
public. However, Sosowsky analyzed arrest 
rates for violent crimes within these categories, 
and patients continued to have much higher 
rates than their age, sex, and ethnic counter- 
parts in the general population. For example, 
black male patients had an arrest rate of 6%, 
compared to 2.5% for black males in the 
county population. Patient excess therefore 
cannot be accounted for by skewed distribu- 
tions of demographic variables. 

In a subsequent analysis (Sosowsky, 1978) 
of the second study, Sosowsky raised the is- 
sue of the relation between diagnosis and 
crime category. He presented a table showing 
that 73% of the sample were schizophrenic, 
whereas 80% of patients arrested for violent 
offenses were so diagnosed. In contrast, only 
70% of patients arrested for nonviolent of- 
fenses were schizophrenic. In the absence of 
statistical analysis, the very appearance of 
this table suggests a meaningful relationship 
between diagnosis and arrest category, but in 
fact these differences are not statistically sig- 
nificant (using a chi-square, the probability of 
the first distribution is .16 and that of the 
second is .45). 
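
A significance check of this kind can be sketched with a one-degree-of-freedom goodness-of-fit test. The Python example below is an illustration under stated assumptions, not a reproduction of the analysis reported here: the counts are reconstructed from the rounded percentages (23.6% of 301 patients is about 71 charged with violent offenses, of whom 80% is about 57 diagnosed schizophrenic), the exact test used is not described in the text, and the function name is hypothetical.

```python
import math


def chi_square_gof_two_cells(observed_yes: int, n: int, expected_share: float):
    """One-degree-of-freedom goodness-of-fit test for a yes/no split.

    Returns (chi-square statistic, p value).
    """
    expected_yes = n * expected_share
    expected_no = n - expected_yes
    observed_no = n - observed_yes
    stat = ((observed_yes - expected_yes) ** 2 / expected_yes
            + (observed_no - expected_no) ** 2 / expected_no)
    # For 1 df, the upper-tail chi-square probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Reconstructed counts (assumptions): 71 patients charged with violent offenses,
# 57 of them schizophrenic, tested against the 73% share in the whole sample.
stat, p = chi_square_gof_two_cells(observed_yes=57, n=71, expected_share=0.73)
print(f"chi-square = {stat:.2f}, p = {p:.2f}")
```

Under these reconstructed counts the p value is well above .05, in line with the conclusion that the difference between 73% and 80% is not statistically significant.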

After analyzing arrest rates for all offenses
combined, Sosowsky (Note 3) compared rates 
of arrests and convictions for the violent 
crimes of homicide and assault taken sepa- 
rately, for patients and for the county popula- 
tion. He found an enormous excess among the 
former. Patient arrests were 15 times higher 
and conviction rates 29 times higher than 
equivalent rates for the county population. 
Computed as rates per 1,000 for arrests and 
convictions, the arrest rate of patients was 
27.80, compared to 1.80 for the county popula- 
tion. The patient conviction rate was 11.70, 
whereas the county population conviction rate 
was .41. 

Together, Sosowsky's studies (Note 3) of- 
fer powerful support for the proposition that 
both outpatients and hospital patients in state- 
supported facilities have higher arrest and 
conviction rates for violent crimes than do 
nontreated California residents, and presum- 
ably, the general public. Unfortunately, these 
data do not contribute to an understanding of 
the causal direction between mental illness and 
crime because Sosowsky did not classify ar- 
rests in terms of whether they preceded or 
followed psychiatric hospitalization. He also 
did not analyze the recidivism rates of dis- 
charged patients with prior arrest histories. 
Thus, although the magnitudes of the arrest 
rates he reported for patients are impressive, 
they are difficult to interpret and they cannot 
be used to understand the relationships be- 
tween the criminal justice and mental health 
systems or between criminal and disordered 
behavior. 

The investigation of patient arrest rates 
conducted at Bellevue Psychiatric Hospital 
in New York City by Zitrin et al. (1976) 
achieved a certain notoriety even before it was 
published. After its presentation at a psy- 
chiatric meeting, a local organization con- 
cerned with the rights of mental patients 
threatened to bring suit against the authors on 
the grounds that their intent from the start 
was to discredit patients and to promote their 
involuntary detention in hospitals. 

Zitrin and his colleagues studied 867 male 
and female patients admitted to this acute- 
care municipal hospital between July 1969 
and July 1971 who lived in the hospital’s 


catchment area (14th to 42nd Streets in Man- 
hattan). Arrest records were located for each 
patient for the 2 years preceding and follow- 
ing this admission. The patient sample in- 
cluded a slight majority of males and an age 
range of 10 to 80, although over three quarters 
were between age 20 and age 50. Multiple hos- 
pitalizations were common; the 867 patients 
had a total of 2,000 admissions to Bellevue 
during the 2-year study. One patient was ad- 
mitted 22 times. The authors reported that 
nearly half of the patients were diagnosed as 
schizophrenic, 7% as alcoholic, 6% as drug 
dependent, 8% as neurotic, and 32% as other. 
However, the diagnostic picture was often 
blurred, and there were notable overlaps be- 
tween the categories of schizophrenia and 
substance abuse. For example, of the 42 pa- 
tients with a primary diagnosis of schizo- 
phrenia who were among the 85 patients ar- 
rested for crimes of violence, 20 had a history 
of drug abuse, 8 had been drug dependent and 
alcoholic, and 2 others had previously been 
diagnosed as drug addicts. In other words, 
three quarters of the patients labeled schizo- 
phrenic in this subgroup previously were iden- 
tified as substance abusers. Similarly, of the 
10 alcoholics arrested for violent crimes, 4 had 
been diagnosed as schizophrenic on at least 
one previous occasion, and 1 was a drug ad- 
dict. Evidently, at least among mental pa- 
tients arrested for violent crimes, the bound- 
aries between these three diagnostic groups 
(schizophrenia, alcoholism, and drug depen- 
dence) are fluid, and computation of com- 
parative incidence rates may be misleading. 
A similar overlap was identifiable among 
schizophrenics and substance abusers in this 
sample who were not subsequently arrested 
(Zitrin, Note 4). 

Overall, 202 or 23% of the patients studied 
were arrested one or more times during the 4- 
year period reviewed. Compared to the 47%
arrest rate Sosowsky (Note 3) reported for 
his cohort of state hospital patients, this fig- 
ure does not seem extreme, except for the fact 
that Sosowsky recorded arrests during an 8- 
year period surrounding hospitalization, 
whereas the Bellevue period of record sur- 
veillance was only 4 years. 

Patient offenses were classified as violent

Table 5
Age-Adjusted Arrest Rates per 1,000 in 1972 for Violent Crimes

                                   Bellevue
                      U.S.         catchment    Patient
Offense               cities       area         sample
Murder                 AS,           .45          .30
Aggravated assault    1.44          5.81         7.02
Robbery               1.06          8.81         8.38
Rape                   .47           .57         1.15

Note. These data are from Zitrin, Hardesty, Burdock,
and Drossman (1976).


crimes involving direct bodily harm (murder, 
assault, forcible rape, and sexual abuse), 
violent crimes involving potential for harm 
(robbery, burglary, weapons possession, pos- 
session of burglar’s tools, arson, inciting to 
riot), and nonviolent crimes (presumably all 
misdemeanors except for simple assault, which 
is included with other forms of assault as a 
violent crime). The inclusion of violent crimes 
involving potential for harm in the violent 
crime category extends its meaning to some- 
what less severe offenses than the term as 
used by others. 

Using this classification, 85 or 10% of the 
867 patients were arrested for violent crimes 
and 13.5% for nonviolent crimes. These 85 
patients committed a total of 155 crimes of 
violence, the most common being assault, fol- 
lowed in order of frequency by burglary, 
weapons possession, and robbery. The age dis- 
tribution of the violent offenders peaked in 
the 20-30-year interval, whereas the modal 
age of nonviolent offenders fell into the
30-40-year interval.

The 85 patients arrested for violent crimes 
during the 4 years under review were arrested 
a total of 366 times for various offenses—an 
average of 4 times each. Twenty-eight of them 
were arrested only during the 2 years before 
admission, 21 were arrested only after dis- 
charge, and 36 of this subgroup were arrested 
both before and after this hospital stay. 
Among the 117 patients arrested for non- 
violent offenses, a larger proportion was ar- 
rested before the hospitalization than after 
the hospitalization (Zitrin, Note 4). 

Zitrin and his colleagues analyzed crime 
categories in terms of the four diagnostic
groups selected for study. Schizophrenics con- 
stituted about half of the sample and ac- 
counted for half of all arrests. Differences did 
emerge in relation to type of crime. Schizo- 
phrenics constituted two thirds of patients ar- 
rested for violent crimes involving bodily 
harm (21 out of 30), but were underrepresented
among arrests for nonviolent offenses.
These results agree with the findings of Gio- 
vannoni and Gurel (1967) that schizophrenic 
patients, when they behave disruptively, are 
indeed arrested and charged with criminal con- 
duct rather than simply rehospitalized. 

Both alcoholics and drug abusers were 
heavily overrepresented in all crime cate- 
gories, as Brill and Malzberg (1962) ob- 
served earlier. Constituting only 7% and 6% 
of the patient sample, respectively, they accounted
for nearly one third of all arrests. Addicts
were more likely to be charged with
violent crimes and alcoholics with misdemeanors.
Like schizophrenics, patients diagnosed
as neurotic were proportionately represented
among total arrests and also within
crime categories.

The final analysis reported was a compari- 
son of patient arrest rates with those of resi- 
dents in the same catchment area and with 
1972 rates for cities nationally (Federal Bu- 
reau of Investigation, 1973), excluding chil- 
dren under age 15 from all groups. As Table 
5 shows, arrest rates both for patients and for 
all residents of the Bellevue catchment area 
are far higher than those of pooled residents 
of all American cities. There is comparatively 
less difference between patients and their 
catchment area neighbors. The authors did 
not report the statistical significance of these 
differences, but it seems from inspection that 
across the four categories the difference is neg- 
ligible, although for the offenses of rape and 
aggravated assault patient rates are higher. 
Since, of all the comparison groups in the
studies reviewed, this catchment area group is
most nearly comparable to the patient sample
in social and demographic characteristics,
the diminished contrast observed here is noteworthy.
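
The inspection described above can be made explicit by dividing each patient rate by the corresponding catchment-area rate. The sketch below is only illustrative and is not part of Zitrin et al.'s analysis. It takes the rates from Table 5, assumes that the murder and rape entries carry a leading decimal point (.45, .30, .57), and uses a hypothetical helper name, `rate_ratios`.

```python
# Arrest rates per 1,000 from Table 5 (Zitrin et al., 1976).
# Assumption: murder and rape entries are read with a leading decimal point.
TABLE_5 = {
    # offense: (Bellevue catchment area, patient sample)
    "murder": (0.45, 0.30),
    "aggravated assault": (5.81, 7.02),
    "robbery": (8.81, 8.38),
    "rape": (0.57, 1.15),
}


def rate_ratios(table):
    """Return the patient rate divided by the catchment-area rate per offense."""
    return {offense: patient / area for offense, (area, patient) in table.items()}


if __name__ == "__main__":
    for offense, ratio in rate_ratios(TABLE_5).items():
        direction = "higher" if ratio > 1 else "lower"
        print(f"{offense}: patient rate is {ratio:.2f} times the area rate ({direction})")
```

Under these assumptions the ratio exceeds 1 for aggravated assault and rape and falls below 1 for murder and robbery, which is the pattern noted in the text. The ratios say nothing about statistical significance, which the authors did not report.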

Zitrin et al. (1976) concluded that their 
data do not support the commonly held be- 
lief among mental health professionals that 
the mentally ill commit fewer crimes than the
general population. They urge provision of 
more effective aftercare services, together with 
the introduction of a provisional discharge 
category in which community stay is con- 
tingent on clinic attendance. In agreement 
with Brill and Malzberg, and Giovannoni and 
Gurel, they attribute the growing arrest rates 
among patients to “the increasing diversion of 
arrested persons from the criminal justice 
channels to mental hospitals” (p. 147). 

A report of arrest rates among Wyoming 
state hospital patients provides a contrast of 
setting to the largely urban, industrialized 
areas previously investigated. Durbin, Pase- 
wark, and Albers (1977) studied the arrest 
records of all patients aged 18 to 64 (286 men 
and 175 women) admitted to the only state 
facility in 1969, excluding patients remanded 
by criminal order for psychiatric evaluation. 
This sample was unusual diagnostically in that 
63% of the male patients had a primary diag- 
nosis of alcoholism and another 6% had such 
a secondary diagnosis. Arrest rates for male 
and female patients separately were derived 
from police records covering the 5 years be- 
fore and the 5 years after hospitalization, from 
1964 to 1973. Unlike investigators in any 
other study except that by Giovannoni and 
Gurel (1967), Durbin et al. used a correction factor
in computing arrest rates based on the mean 
number of days patients were hospitalized 
during this 10-year period. 

Arrest rates of the patient sample were 
compared with those of the general Wyoming 
population aged 18 to 64, obtained from state 
records, for men and women separately. From 
both sets of rates, minor misdemeanors were 
excluded, including public intoxication, driv- 
ing and traffic violations, gambling, vagrancy, 
and suspicion. In addition, rates were based 
on number of arrests, not on number of per- 
sons arrested. For both reasons, these rates 
are not directly comparable to those of other 
studies. Since only five female patients were 
arrested during the study period, analyses of 
results were based on males only. 

Analysis of arrest by crime category is of 
doubtful reliability in view of the few cases 
in each of the 17 categories listed. Perhaps 
more meaningful is the pattern that emerges
across categories: Patient arrest rates were
higher in all categories of violent crime (crime 
against persons), four out of five types of 
crime against property, and drug offenses.
They were lower for “white collar” crimes
such as forgery, counterfeiting, and embezzlement.

The distribution of diagnoses among males 
in this Wyoming sample is uncommon (diagnoses
of females were not given). Nearly 70%
had a primary or secondary diagnosis of alcoholism,
14% were schizophrenic, 10% were
personality disorders, and 3.5% were drug 
abusers. Surprisingly, in light of others’ find- 
ings of an association between alcohol and 
crime, alcoholics were not overrepresented 
among arrested patients, nor were schizo- 
phrenics. Further, in this sample schizo- 
phrenics committed no violent crimes, in con- 
trast to Zitrin et al.’s (1976) findings. The 
two groups with excessive arrest rates were 
patients with personality disorders and drug 
abusers. 

Demographic variables associated with pa- 
tient arrests included sex, age, marital status, 
and type of hospitalization. Single men aged 
18 to 24 had a disproportionate share of ar- 
rests, as did those who were involuntarily hos- 
pitalized. Women constituted 38% of the pa- 
tient group, but accounted for only 5% of 
the arrests. 

Comparing arrest rates before and after hospitalization,
two thirds of the arrests occurred
before. Of the 49 men arrested during the 10-year
period, 6 had no prior arrests and 35 had
no subsequent arrests. In other words, the 
high recidivism noted elsewhere is not ap- 
parent here. Durbin et al. (1977) also noted 
a clustering of arrests in the year preceding 
hospitalization, suggesting that the disruptive 
behavior leading to the arrests may also have 
led to the decision to hospitalize. Here, as 
elsewhere, it is observed that hospitalization 
"may often serve as a diversionary adjunct to 
the criminal justice system" (p. 83). 

The most recent investigation of arrest 
rates of discharged patients was conducted at 
the request of the New York State Commis- 
sioner of Mental Hygiene. Steadman, Melick, 
and Cocozza (Note 5) designed their study 
both to serve as a follow-up to Brill and Malzberg's
(1962) 1947 study and to compare arrests
of patients discharged in 1968 and 1975,
before and after introduction of more liberal 
and rapid discharge policies in the state hos- 
pital system. 

The 1968 sample consisted of every 14th 
patient aged 18 or more who was discharged 
from state facilities in that year, producing a 
study group of 1,920. In 1975, every 18th 
patient was selected, creating a sample of 
1,938. In both samples, a small majority was 
male, 70% were white, 22% were black, and 
7% were Puerto Rican. Most had not com- 
pleted high school. Half in 1968 and 43% in 
1975 lived in New York City. The most common
diagnostic label was schizophrenia (27%
in 1968 and 32% in 1975), followed by paranoid
states (20% in both samples) and substance
abuse (16% in 1968 and 19% in
1975). Over 80% of all the patients were released
within 6 months of the current hospital
admission. Nineteen percent of the 1968
sample and 26% of the 1975 sample had a 
history of prior arrests. 

Lifetime histories of psychiatric hospitaliza- 
tion and criminal records before the current 
hospitalization were obtained from state 
agencies. The median follow-up period for 
both groups was 19 months (the briefest of 
any study reviewed). Patient arrest rates 
were compared with those of the general pop- 
ulation of New York State over age 17. 

Patients’ total annual arrest rates were 
found to be more than double those of the 
general population in both years studied. In 
1968 the patient rate per 1,000 was 73.5, 
compared to 27.5 for the general population. 
In 1975 the discrepancy was even greater: 
98.5 for patients and 32.5 for the general 
population. Rates for both increased between 
occasions, but the patient rate accelerated 
more rapidly. 
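
The two comparisons in this paragraph (patients against the general population within each year, and each group against itself across years) reduce to simple ratios. A minimal sketch follows, using only the four rates quoted above; the function names are hypothetical and the calculation is an illustration, not part of Steadman et al.'s report.

```python
# Annual arrest rates per 1,000 as quoted from Steadman et al. (Note 5).
RATES = {
    1968: {"patients": 73.5, "general": 27.5},
    1975: {"patients": 98.5, "general": 32.5},
}


def rate_ratio(year):
    """Patient arrest rate divided by the general-population rate for one year."""
    return RATES[year]["patients"] / RATES[year]["general"]


def relative_increase(group):
    """Proportional growth in one group's rate between 1968 and 1975."""
    return RATES[1975][group] / RATES[1968][group] - 1


if __name__ == "__main__":
    for year in RATES:
        print(f"{year}: patient rate is {rate_ratio(year):.2f} times the general rate")
    for group in ("patients", "general"):
        print(f"{group}: rate rose by {relative_increase(group):.0%}")
```

On these figures the patient rate is about 2.7 times the general rate in 1968 and about 3.0 times in 1975, and it grew by roughly 34% over the interval against roughly 18% for the general population.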

Analysis of arrests by crime category 
showed patient rates exceeding public rates 
within every category in both years with one 
exception: approximately equal rates for sex 
crimes in 1968. Greatest excess occurred in 
arrest rates for property, sex, and violent 
crimes in the 1975 sample and in arrest rates 
for drug, property, and violent crimes in the 
1968 sample. During the 19 months after
discharge, .9% of the 1968 sample and 1.7%
of the 1975 sample committed violent crimes 
(homicides and assaults)—a small but in- 
creasing proportion. 

The authors analyzed demographic, psychi- 
atric, and historical factors associated with 
postdischarge arrests, first singly and then 
simultaneously. Taken one at a time, several 
variables were found to be significantly cor- 
related with arrests, including youth, shorter 
hospital stays (presumably because the dis- 
orders were less serious or more transient), 
sex (fewer females arrested), ethnicity (in 
1968 both blacks and Puerto Ricans had dis- 
proportionately high rates; in 1975 only 
blacks did), and admitting diagnosis. Person- 
ality disorders and substance abusers had 
the highest arrest rates. Although schizo- 
phrenics constituted about 25% of the sample 
in 1968 and 32% in 1975, only 3% and 6% 
were arrested in the respective years. 

When multivariate analysis was applied to 
these data, diagnosis dropped out as a mean- 
ingful predictor; most of its relationship to 
arrest rates was accounted for by the varia- 
bles of age and sex: Most personality dis- 
orders and substance abusers are young 
males. Using multiple regression analysis 
with the three variables with the highest zero 
order correlations to arrest rates—total prior 
arrests, age, and admitting diagnosis—Stead- 
man et al. (Note 5) found that only the first 
two made independent contributions, together
accounting for 13% of the variance in
subsequent arrest behavior. Interaction effects 
were not analyzed as such. 

In this study, as in virtually every other 
that looked at such data, a history of arrest 
before hospitalization turned out to be the 
best single predictor of postdischarge arrests 
so far identified. Here, as elsewhere, when 
arrest records of patients with and without 
prior arrests were separately analyzed and 
compared with those of the general popula- 
tion, postdischarge arrests of those with 
prior records substantially exceeded general 
population rates, whereas patients with no 
such history had lower rates than the gen- 
eral population in five out of the six crime 
categories studied (the exception was prop- 
erty crimes). Furthermore, Steadman and his 
colleagues found a gradient such that patients
with three or more prior arrests had more 
postdischarge arrests than those with two 
arrests, who in turn had more postdischarge 
arrests than those with one arrest. Not only 
was prior arrest itself a useful predictor but 
a pattern of similar offenses was also ob- 
served. Thus, the best predictor of subse- 
quent arrest for violent crime was arrest for 
one or more violent crimes preceding hos- 
pitalization. 

The predictive utility of a prior arrest his- 
tory is widely recognized in the criminal 
justice literature. FBI studies (Federal Bu- 
reau of Investigation, 1973) of recidivism 
among over 200,000 persons arrested for 
federal offenses in the period 1970 through 
1972 showed that 65% had been arrested at 
least twice, with an average of four arrests 
during the preceding 5 years. Furthermore, of 
these repeat offenders, 44% were rearrested 
in states other than that where first arrested. 
For the repeat offenders in 1972 alone, 55% 
of prior arrests took place in another state. 
Frequency of prior arrests was negatively 
associated with age; offenders under age 20 
were arrested on the average every 3 months, 
whereas those over age 40 were arrested less 
than every 2 years. Parenthetically, such 
high interstate mobility among repeat of- 
fenders suggests the possibility of significant 
underenumeration of arrests among mental 
patients in those studies that relied only on 
single-state arrest data. 

In Steadman et al.’s study, the total arrest 
rates for all patients combined are inflated by 
the presence of patients with criminal records. 
More such persons are found among the 
patient population of New York State mental 
hospitals than among the general popula- 
tion, and it seems to be this excess that ac- 
counts for higher total patient arrest rates. 
In addition, the proportion of patients ad- 
mitted with arrest records has increased strik- 
ingly over time, from Brill and Malzberg’s 
(1962) finding of 15% for male patients in 
1947 to 32% of male patients in 1968 (Stead- 
man et al., Note 5). By 1975, 4 out of 10 
male patients had police records prior to their 
hospitalization. Approximately one third of 
the patients with arrest histories at each time
period were arrested after discharge, compared
with 2% to 4% of patients without prior
records who were subsequently arrested. As 
Steadman and his colleagues concluded, 
“Prior criminality rather than mental illness 
appears to be the primary explanation for 
the increasing arrest rates” (p. 16) observed. 
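
The compositional argument can be illustrated by splitting the overall fraction of patients arrested into the contributions of the two subgroups. The sketch below is my own illustration rather than an analysis reported by any of the authors. It combines the share of male patients with prior records given in the text (15%, 32%, and 40%) with the subgroup arrest percentages listed in Table 6, and the function name `decompose` is hypothetical. Because follow-up periods differ between cohorts, only the within-cohort split is meaningful.

```python
# Male patients, by discharge cohort:
# (share with a prior arrest record,
#  fraction arrested after discharge among those with a record,
#  fraction arrested after discharge among those without one).
# Shares come from the text; arrest fractions come from Table 6.
COHORTS = {
    1947: (0.15, 0.34, 0.02),
    1968: (0.32, 0.31, 0.02),
    1975: (0.40, 0.29, 0.04),
}


def decompose(share_prior, rate_prior, rate_no_prior):
    """Return (overall fraction arrested, share of arrestees with a prior record)."""
    from_prior = share_prior * rate_prior
    from_no_prior = (1 - share_prior) * rate_no_prior
    overall = from_prior + from_no_prior
    return overall, from_prior / overall


if __name__ == "__main__":
    for year, figures in COHORTS.items():
        overall, prior_share = decompose(*figures)
        print(f"{year}: {overall:.1%} arrested overall, "
              f"{prior_share:.0%} of them with a prior record")
```

On these figures, patients with prior records make up 15% to 40% of each cohort but account for roughly 75% to 88% of those arrested after discharge, so a rising share of such patients raises the overall rate even when the subgroup rates barely change.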


Evaluation of the Evidence 


Many questions about criminal behavior 
of mental patients have been raised and par- 
tially answered by the studies reviewed. It is, 
however, difficult to evaluate the evidence 
generated by each study when they are con- 
sidered consecutively. In this section, there- 
fore, I summarize relevant findings to clarify 
how much has been learned and what further 
information is needed. 


Do Mental Patients Currently Have Higher 
Arrest Rates Than Members of the 
General Population? 


Today and over the past 20 years, mental 
patients discharged from public facilities as 
a group have total arrest rates for all crimes 
that equal or exceed public rates with which 
they have been compared. Arrest and convic- 
tion rates for the subcategory of violent 
crimes were found to exceed general popula- 
tion rates in every study in which they were 
measured. The certainty of these findings
must be tempered by consideration of design 
limitations including nonequivalent compari- 
son groups and lack of attention to variables 
such as age and social class. 


Have These Rates Changed Over Time? 


The arrest rates of paroled and discharged 
patients based on pre-1950 records consistently
were found to be lower than those reported
for the general population. There has
been a pronounced relative as well as abso- 
lute increase in arrests of mental patients 
since then; that is, whereas arrest rates for 
both patients and the general public have 
increased, the rate of acceleration for patients
has been greater.


What Factors Have Contributed to This
Upward Trend in Arrest Rates? 


At least two developments may account for 
the observed changes, both of which are 
fundamentally related to the evolving social 
role of the mental hospital in our society. 
First, there is the change in hospital policies 
regarding involuntary admission and retention 
of patients and the likelihood and timing of 
their discharge. In the past, there were com- 
paratively few limitations regarding grounds 
for involuntary civil commitment. In con- 
trast, recent legal reforms have largely re- 
stricted such commitment to situations in 
which the prospective patient is perceived as 
potentially dangerous to himself or others 
(Cocozza & Steadman, Note 2). Although 
psychiatric predictions of dangerousness are 
admittedly inaccurate (Steadman, 1973; 
Stone, 1975), it does seem probable that 
these restricted criteria alter the nature of 
the patient population and enhance the 
probability that discharged patients may be 
at greater risk for subsequent dangerous be- 
havior. 

Duration of commitment and discharge 
policies have also changed. Forty years ago, 
patients were often committed for decades 
or for life in large state facilities, physically 
removed from their communities of origin. As 
Pollock (1938) described in detail, patients 
who were released had to meet a series of 
requirements, not only with respect to their 
psychiatric status and social functioning but 
also regarding available familial resources. 
Once these recovered, socially acceptable, 
and cooperative patients with welcoming and 
protective friends and relatives were pa- 
roled, their progress was monitored by an 
elaborate, extensive, and compulsory system 
of home and clinic visits. If parole authorities 
were notified of a change in status, the patient 
could be rehospitalized at once. Although 
New York State’s hospital system may have 
been more cautious about discharges than 
others (Giovannoni & Gurel, 1967), this gen- 
eral system prevailed nationally. 

Today in New York State facilities as else- 
where, the average patient stay is 30 days. In 
acute-care units, patients are usually discharged
within 2 weeks. Nationally since the
1950s hospital stays have become progressively
shorter, and hospital populations have
declined although admissions have not, lead- 
ing to many more discharged patients in the 
community. Since the late 1960s heavy em- 
phasis has been placed on community-based 
treatment on a voluntary basis. As soon as 
the acute symptoms of hospitalized patients 
subside, usually with the help of psycho- 
tropic medication, patients are discharged. 
There is no compulsory aftercare or follow-up. 
In short, virtually all patients admitted in the 
past several years are promptly returned to 
the community. The state mental hospital’s
traditional role as a “warehouse for the un- 
wanted” has been transformed into a brief 
way station for the most acutely disturbed.

The second major change concerns a shift
in the way the civil machinery distributes 
disruptive members of the community. In the 
past, a large majority of offenders were sent 
to jail regardless of their mental status. Since
World War II it has become increasingly 
common for offenders who appear to be men- 
tally disturbed or who have a history of 
psychiatric hospitalization to be dispatched 
to the mental health rather than to the 
criminal justice system. This phenomenon 
seems related to the fact that, in most juris- 
dictions, the criminal justice system is inun- 
dated with far more defendants than can be 
handled. Such overcrowding evidently has 
encouraged the use of psychiatric hospitali- 
zation as an alternative method of removing 
disturbing people from the community. Con- 
sequently, an increasing number of mental 
patients have police records. It is reasonable 
to expect for this reason alone that, as all 
patients are promptly discharged, mental 
patients returning to the community will, as 
a group, have a higher arrest rate than in 
the past when the patient population had a 
different composition. As Giovannoni and 
Gurel (1967) have stated the point, “It is 
primarily in the way mental hospitals are 
utilized by the community, and particularly 
as this influences the kinds of patients ad- 
mitted and the number and kinds of patients 
released, that one is likely to find the major 
sources of variation in ex-patient crime rate”
(p. 151).


What Are the Best Predictors of
Postdischarge Arrests? 


It has been repeatedly and convincingly 
demonstrated that the small subset of pa- 
tients who have prior criminal records ac- 
counts for a large majority of postdischarge 
arrests. This is the single best predictor so 
far identified. Other predictors, which are 
significantly associated with postdischarge 
arrests but not large in their effect, include 
male sex, youth, and unmarried status. Short 
hospital stay and diagnoses of alcoholism, 
drug abuse, and personality disorder are also 
associated with arrest risk when analyzed 
separately, but their predictive value seems 
to be largely accounted for by their relation- 
ship to sex and age, as Steadman et al. (Note 
5) demonstrated in their multivariate analyses.
Despite the consistency with which these
predictors have been identified, their effect 
size is evidently small. Much remains to be 
learned in the prediction of disruptive, il- 
legal behavior by mental patients after their 
return to the community. 


What Is the Association Between Arrest Risk
and Diagnostic Category? 


It is generally believed that patients diag- 
nosed as personality disorders, alcoholics, and 
drug abusers are the most likely to display 
antisocial and aggressive behavior. There is 
less consensus about whether these three 
diagnostic labels describe mental illness. In 
some epidemiological studies of true prevalence
rates, investigators systematically exclude
these categories (e.g., Gove & Tudor,
1973) and focus exclusively on neuroses and 
functional and organic psychoses. In addi- 
tion, states vary in their policies of admit- 
ting patients with a primary diagnosis of 
alcoholism more than once to state facilities, 
so that their proportion of the total patient 
population varies by area as a function of 
administrative policy. 

In those studies that generated interpreta- 
ble findings about crime rates of different 
diagnostic groups in relation to their size in 
the patient sample under study, alcoholics, 
addicts, and personality disorders were in 
each case found to have excess rates (Brill &
Malzberg, 1962; Durbin et al., 1977; Zitrin
et al., 1976; Steadman et al., Note 5). The 
consistency of these results lends them credi- 
bility. The effects of age and sex must be 
considered, however, in view of Steadman et 
al.’s observations regarding their associations. 
Because two of the four studies dealing with 
diagnostic differences were based on male 
samples only (Brill & Malzberg, 1962; Dur- 
bin et al., 1977), sex is evidently not a 
critical factor. The role of age remains un- 
clear. Giovannoni and Gurel (1967) found, 
in their group of older, chronic schizophrenic 
males, that nearly three quarters of those 
arrested had drinking problems. More direct 
evidence would be available by studying dif- 
ferential diagnostic roles in an age-stratified 
sample to see if the differences prevail at 
various ages. At present, one may conclude
that patients with diagnoses of personality 
disorders, alcoholism, and drug dependency 
have disproportionate arrest rates. Whether 
this is caused by the nature of their disorders 
or by demographic characteristics associated 
with their distribution in the general population
remains unclear.

Evidence is less consistent regarding the 
subsequent criminal activity of patients diag- 
nosed as schizophrenic. In those studies in 
which schizophrenics constituted half or less 
of the patient sample, their arrest rates were 
not disproportionately high. In the most 
detailed analysis of arrests by diagnostic 
group, conducted by Zitrin et al. (1976), schizophrenics
were found overrepresented among
those patients who committed violent crimes 
with bodily harm but not overrepresented in 
terms of overall arrest rates for all crime 
categories combined. In two samples in which 
all or most of the patients were schizophrenic 
(Giovannoni & Gurel, 1967; Sosowsky, Note 
3, San Mateo sample), arrest rates were much 
higher for patients than for control popula- 
tions, especially for violent crimes; that is, 
schizophrenic samples had the same excess of 
crimes, compared to control groups, that 
characterized samples with predominantly 
nonschizophrenic patients. From the prelimi- 
nary evidence available, it seems that arrest 
rates of schizophrenics do not exceed those 
of other diagnostic groups, with the possible 
exception of arrests for violent crimes (homicides
and assaults) in which their rates may
be higher. Critical and uncontrolled are the 
factors of age and social class, which require 
evaluation before firmer conclusions can be 
derived. 


Are Mental Patients More Likely To Be 
Arrested for Certain Types of Crime? 


Traditionally, it was believed that mental 
patients, when arrested at all, were charged 
with minor offenses such as loitering, va- 
grancy, or public intoxication. It was an un- 
pleasant surprise to learn not only that pa- 
tient arrest rates were the same or greater 
than those of the general population in re- 
cent years but that this excess was particu- 
larly pronounced in the category of felonies, 
and specifically, of violent crimes or crimes 
against persons. Six investigators, using eight
patient samples, found higher arrest (and/or 
conviction) rates for the crimes of homicide 
and assault among patients than among con- 
trol groups (see Table 2). The magnitude of 
this excess ranged from 14 to 29 times greater 
than general population rates. No studies 
reported contrary findings. The consistency 
and size of the differences across patient sam- 
ples that vary in time, place, and diagnostic 
composition make the differences convincing. 
It seems reasonable to conclude that mental 
patients are more likely to be arrested for 
assaultive and sometimes lethal behavior than 
are other people. 

No clear pattern emerges regarding the 
relative frequency with which other crimes 
are charged against mental patients. Rappe- 
port and Lassen (1965, 1966) found the high- 
est arrest rates for robbery by males and 
for aggravated assault by females. Giovan- 
noni and Gurel (1967) reported that after 
violent crimes, which accounted for 27% of
police contacts among their patients, the next
most common offenses were drunkenness
(23%), motor vehicle thefts (13%), and
crimes against property (12%). Durbin et 
al. (1977) reported a high incidence of drug 
offenses among their predominantly alcoholic 
patients, and Steadman et al. (Note 5) found 
the most common offenses in their 1968 sam- 
ple to be drug, property, and violent crimes; 
the most common offenses in their 1975 sample
were property, sex, and violent crimes. In
short, apart from assaultive behavior, mental
patients are apparently no more likely to be
arrested for some crimes than others, judging
from the limited evidence at hand.


Does Hospitalization Reduce the Probability 
of Recidivism Among Patients With 
Prior Arrest Records?


This question is interesting because a posi- 
tive finding would provide support for the 
notion of a causal association between mental 
status and crime, in contrast to the opinion 
of several investigators that disturbed and 
criminal behaviors may coexist but are 
causally independent. No definite answers can 
be generated by analysis of recidivism rates 
among patients in the absence of data re- 
garding recidivism among nonhospitalized 
offenders. The only authors who tried to as- 
sess the latter were Brill and Malzberg 
(1962), but they did not have access to 
reliable recidivism data for the general crimi- 
nal population. 

Despite the absence of control data, several 
authors have discussed the issue of recidivism 
rates among patients. Cohen and Freeman 
(1945) were the first to suggest that hospital- 
ization may reduce recidivism, based on their 
finding that only 26% of patients arrested 
before being hospitalized were also arrested 
after discharge. The only others to be impressed
with a decline in arrests after hospitalization
among patients with records were
Durbin and his colleagues (1977). Of the 43 
men in their sample who were arrested in 
the 5 years before hospitalization, only 8, or 
19%, were also arrested within 5 years after 
discharge. They concluded that “factors as- 
sociated with hospitalization . . . may have 
influenced the reduction in arrest rates after 
hospitalization" (p. 83). 

Other investigators who discussed recidi- 
vism disagreed. Rappeport and Lassen (1965) 
stated, “As best we can interpret our data, 
there was a tendency toward recidivism not 
unlike that seen in the general community." 
Brill and Malzberg (1962) and Steadman e 
al. (Note 5) interpreted their findings simi- 
larly. Zitrin et al, (1976) only reported 
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Table 6 
Postdischarge Arrest Rates of Mental Patients With and Without Prior Arrest Records 
Percentage Percentage 
arrested arrested 
Years of No. of patients Follow-up with prior without prior 
Authors discharge with prior records period records records 
Cohen & Freeman 1940 314 М = 2 26% <1% 
(1945) to (18% of sample) years 
1944 
Brill & Malzberg 1947 803 5} years 34% 2% 
(1962) males 
Zitrin, Hardesty, 1969 64 2 years 56% — 
Burcock, & Drossman to (only those arrested 
1976) 1971 for violent crimes) 
Steadman, Melick, 1968 343 19 months 31% 2% 
& Cocozza males 
(Note 5) 
Steadman, Melick, 1975 435 19 months 29% 4% 
& Cocozza males 
(Note 5) 
Durbin, Pasewark, 1969 43 5 years 19% 3% 
& Albers 
(1977) 


recidivism rates for the 85 patients in their 
sample arrested for crimes of violence for 2 
years before or after the hospitalization 
under study. One third were arrested only 
before, one quarter only after, and 42% both 
before and after their hospital stay, produc- 
ing a total recidivism rate of 56% among 
patients who committed one or more crimes 
of violence. This rate is the highest of any 
reported, as seen in Table 6. 
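
The arithmetic behind two of these figures can be retraced in a short sketch. The Python below is an illustration added for the reader, not part of the studies reviewed. It assumes that the recidivism rate is the share of patients arrested before hospitalization who were also arrested afterward, which is the reading that reproduces the 56% reported for Zitrin et al. and the 19% reported for Durbin et al. The function name `recidivism_rate` is hypothetical.

```python
def recidivism_rate(arrested_before_only, arrested_both):
    """Share of patients with a prior arrest who were arrested again after discharge."""
    prior = arrested_before_only + arrested_both
    return arrested_both / prior

# Zitrin et al. (1976): one third arrested only before, 42% both before and after
# (shares of the violent-crime subsample, as quoted in the text).
zitrin = recidivism_rate(arrested_before_only=1 / 3, arrested_both=0.42)

# Durbin et al. (1977): 43 men with prior arrests, 8 of them rearrested.
durbin = recidivism_rate(arrested_before_only=43 - 8, arrested_both=8)

print(f"Zitrin et al.: {zitrin:.0%}; Durbin et al.: {durbin:.0%}")
```

Patients arrested only after discharge do not enter either calculation, because they had no prior record to recur.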

In summary, reported recidivism rates for 
arrests among discharged mental patients 
range from 19% to 56%. These rates were 
derived from different periods of record sur- 
veillance, include different crime categories, 
and were gathered over a period of 30 years. 
In contrast, only 2%–4% of patients without
prior arrest records were arrested within 5 
years after discharge. This analysis of 
recidivism rates reinforces the conclusion 
noted earlier that a history of prior arrests 
is a useful predictor of arrest risk after dis- 
charge. Without equivalent recidivism rates 
for the nonhospitalized criminal population, 
no conclusions are warranted regarding the
impact of hospitalization on the risk of fu-
ture antisocial behavior.


Conclusions 


From the information presently available, 
it seems that discharged mental patients as a 
group are not significantly less likely than 
others to exhibit dangerous or illegal behavior. 
At the present time there is no evidence that 
their mental status as such raises their ar- 
rest risk; rather, antisocial behavior and men- 
tally ill behavior apparently coexist, particu- 
larly among young, unmarried, unskilled, poor 
males, especially those belonging to ethnic 
minorities. It is unlikely that most people 
would care to have such neighbors even in 
the absence of a history of psychiatric hos- 
pitalization. 

The major factor associated with increases 
in arrest rates of discharged mental patients 
in recent years is the increased proportion of 
mental patients who have arrest histories 
before their hospitalization. For males in 
New York State, for example, this proportion
has risen from 15% to 40% in the past 30
years. This change of hospital clientele seems 
to represent an alteration in application of 
the civic machinery according to which the 
mental health system is being increasingly 
used as an adjunct to the criminal justice 
system.

Arrests are fairly infrequent events even 
when mental patients are inappropriately 
considered as a single group. In the 18 months 
after discharge from New York State hos- 
pitals in 1975, for example, 90% of patients 
were not arrested. When patients with arrest 
histories, primary diagnoses of substance 
abuse, and personality disorders are consid- 
ered separately, the remainder of the patient 
group appears to be considerably less dan-
gerous than are those members of the gen-
eral public who are not mentally ill.

It is recommended that future studies em- 
phasize selection of equivalent comparison 
groups, integration of patient data with ar- 
rest and recidivism rates of the nonhospital- 
ized population, greater consideration of geo- 
graphic issues, and improved data analysis. 
The majority of both mental patients in pub- 
lic institutions and criminals come from simi- 
lar population subgroups. Both have dispro- 
portionate numbers of poor, unskilled, un- 
educated, unmarried young men, many of 
whom belong to low-status ethnic groups. If 
arrest rates of mental patients were compared 
to those of their nonhospitalized peers using 
these criteria, it is quite possible that the 
observed excess of arrests of discharged men- 
tal patients would no longer be apparent. 
This does not detract from the validity of 
the conclusion that as a group mental pa- 
tients are more likely to be arrested, espe- 
cially for crimes of violence, than is the total 
population, the majority of whom are obvi- 
ously not poor, unmarried males that belong 
to ethnic minorities. 

Based on the limited evidence available, I 
conclude that patients discharged from men- 
tal hospitals are not, by virtue of their psy- 
chiatric disorders or hospitalization experi- 
ence, more prone to engage in criminal activ- 
ity than are people demographically similar 
to them who do not have a history of mental 
illness. Although patients considered as a 
group do have higher arrest rates than non-
patients considered as a group, it is largely
because the patients include in their midst a
disproportionate share of people with prior
police records. The most immediately obvious
method of reducing criminal activity among 
discharged mental patients is to reexamine 
and modify current procedures that con- 
tribute inappropriately to the use of mental 
hospitals as alternatives to the criminal 
justice system.


Comment on Banks’s “White Preference in Blacks:
A Paradigm in Search of a Phenomenon” 


John E. Williams
Wake Forest University

J. Kenneth Morland
Randolph-Macon Woman's College


Banks concluded from the studies he reviewed that the evaluative preference
and self-identification responses of Afro-American children toward stimulus al-
ternatives representing light- and dark-skinned persons conformed to simple
chance rather than indicating a “white preference in blacks.” This interpreta-
tion is challenged as misleading because of Banks's dismissal of the importance
of comparisons by race in the literature cited and because of his failure to cite
a number of relevant studies of race and color bias, the results of which are
inconsistent with his conclusion.


In a recent Psychological Bulletin article,
Banks (1976) reviewed a number of studies
in which young Afro-American¹ children had
made preference and self-identification re-
sponses to light- and dark-skinned persons,
as represented by dolls, puppets, line draw-
ings, and photographs. Banks concluded that
the findings of these studies, viewed in toto,
could be attributed to chance and, hence,
provided no systematic evidence of a bias
favoring light-skinned persons among Afro-
American children. Having been active in
this area of research, we feel that Banks's
conclusion is misleading and requires a reply.
We are concerned primarily with two mat-
ters; first, Banks's abrupt dismissal of the
relevance of comparing the responses of Afro-
American children to those of Euro-Ameri-
can children and other groups, with a conse-
quent distortion of the significance of much
of the research cited; second, Banks's omis-
sion of the findings of additional relevant
studies that are in conflict with his conclu-
sion, particularly recent and more methodo-
logically sophisticated studies of preschool
children's attitudes toward race and color.

We begin by acknowledging that Banks
was correct in pointing out that in most of
the studies he cited the racial preference and
self-identification responses of Afro-American
children can be attributed statistically to
chance. This was the case even when a ma-
jority of the Afro-American children gave


Requests for reprints should be sent to John E.
Williams, Department of Psychology, Wake Forest
University, Winston-Salem, North Carolina 27109.

Copyright 1979 by the American Psychological Association, Inc. 0033-2909/79/8601-0028$00.75


more than half of preschool Afro-American
respondents indicated that they would prefer
to play with the Euro- rather than with the
Afro-American models, that they looked more
like the Euro- than the Afro-American mod-
els, and that their mothers looked more like
the Euro- than the Afro-American models
(Morland, 1962, 1963). This was also true
for another study Banks cited in which pre-
school Afro-Americans who had demonstrated
they could use racial classification terms
correctly were asked to which race they them-
selves belonged: 52% responded correctly,
16% did not know or refused to say, and
32% responded incorrectly (Morland, 1958).
In other words, half of the Afro-American


1 We prefer to use the terms Afro-American and
Euro-American rather than black and white to des-
ignate these racial categories for reasons we have
explained elsewhere (Williams & Morland, 1976,
pp. x-xi).

2 In such studies, the racial classification terms
best known to the children are checked empirically
at the time of each testing. The racial designation
of Euro-Americans as white has not changed; how-
ever, racial designations of Afro-Americans have
changed. In the 1958 study it was colored; later it
shifted to Negro; now it is black.

children made correct self-classification by
race, and half did not. Subsequent studies of
preschool Afro-American children have 
yielded similar results (Morland, 1963, p. 
239; 1969, p. 368; Savory, Note 1). 

One might assume from such apparently 
inconsistent responses that racial self-classifi- 
cation is unimportant in American society 
and/or that preschool American children are 
not old enough to recognize and attach sig- 
nificance to racial differences. There is con- 
siderable evidence, however, to show that 
racial classification has been and continues to 
be important in American society, and, fur- 
ther, the assumption can be supported that a 
societal norm exists that calls for Americans 
to identify with and be proud of the race to 
which society says they belong (Williams & 
Morland, 1976, pp. 3-32, 251). Are preschool 
American children old enough to have learned 
to recognize racial differences and to know 
their own racial classification? It is at this 
point that the comparison by race becomes 
important. In the Morland studies referred to
above, Euro-American preschoolers did not 
respond randomly; rather, they chose the 
Euro- over the Afro-American models in 
every response at statistically significant 
levels, according to the criterion percentages 
employed by Banks (1976, Table 1, p. 1180). 
The Euro-American preschoolers chose the 
Euro- rather than the Afro-American models 
as the ones they preferred to play with, the 
ones they looked more like, the ones they 
would rather be, and the ones their mothers 
looked more like; and of those who had dem- 
onstrated that they knew how to use racial 
classification terms, 99.5% said they were 
“white” (Morland, 1958, 1962, 1963). It is 
clear from these findings that preschool 
American children can learn to recognize 
race differences and make correct racial self-
classification.

While Banks (1976) realized that differ- 
ences in response by race had been found, he 
appears to have dismissed the relevance of 
comparisons by race because of his disagree- 
ment with what he saw as their implicit 
rationale, namely that “same-race choices of 
white subjects may be believed to represent 
an a priori standard of rational behavior” or
“a standard of mental health” (p. 1180).
There is, of course, a more general rationale
for making these comparisons by race. It is 
the desire to increase understanding of the 
development of racial preference and iden- 
tity by seeing if they are significantly related 
to the race of the respondent. The same 
rationale applies to comparisons by age, sex, 
socioeconomic status, and region, which are 
found in the studies Banks cites. If it is 
granted that racial classification is impor- 
tant in American society, that a societal 
norm for own-race identification exists, and 
that preschool American children are old 
enough to recognize and attach significance 
to racial differences, the question arises from 
the studies Banks reviewed as to what else 
there is about American society and the na- 
ture of racial bias that leads to such different 
responses in young American children by race. 
Banks avoided dealing with this question, 
which we believe deserves serious pursuit. We 
have done this by comparing racial identity 
and attitude responses of children in other 
societies, noting how social structure, norms, 
and reactions to color are related to racial 
self-identity and attitudes (Morland, 1969; 
Morland & Williams, 1969; Williams, Mor- 
land, & Underwood, 1970; Morland, Note 
2). From these and other studies, we have 
constructed what we hope will be a useful 
theoretical model for increasing understand- 
ing of the development of race bias in young 
children (Williams & Morland, 1976, pp. 
280-283). 

Related to Banks's (1976) dismissal of 
comparison by race is the inaccuracy that 
comes in citing the 1958 and 1963 studies by 
Morland to support his assertion: “Reliance 
upon white comparative frames has largely 
perpetuated the notion of black self-rejection 
. . .” (p. 1185). In neither of these studies
was any mention made of racial self-rejection, 
and it is indeed strange that Banks failed at 
this point to bring in the 1962 study by Mor- 
land, which Banks used in other places in his 
article. That study explicitly stated as one of
its conclusions: “Preference for one race did 
not imply rejection of the other” (Morland, 
1962, p. 279). This conclusion was based on
the racial acceptance measure in which no
choice was involved. Both the Afro- and Euro-
American respondents readily accepted models
of both races, and rejection based on race was 
exceedingly rare. This absence of racial self- 
rejection has also been explicitly pointed out 
in other studies by Morland (1966, p. 26;
1969, pp. 364-366; Note 3, pp. 5-6; Note 4, 
p. 6) and in the review by Williams and Mor- 
land (1976, pp. 191-192). 

In analyzing the studies he reviewed, Banks 
did not make a distinction between responses 
of preschool and in-school Afro-Americans. We 
have discovered that unless this is done, the 
conclusions can be misleading. In our analysis 
of a number of studies, including several of 
those cited by Banks, we found that in-school 
Afro-American children were significantly 
more likely than preschool Afro-Americans to 
prefer and identify with Afro-American models 
(Williams & Morland, 1976, pp. 176-201). 
For example, less than 2% of in-school Afro- 
American children of high racial classification 
ability gave incorrect racial self-classification 
responses. This was in marked contrast to the 
preschool Afro-Americans of high racial clas- 
sification ability of whom as many as 30% 
gave incorrect racial self-classification re- 
sponses. This difference between preschool and 
in-school Afro-American children questions 
Banks's (1976) contention that the high level 
of own-race preference and identification by 
Euro-American children could be an expres- 
sion of their “ethnocentrism” (pp. 1180, 
1185). By the time Afro-American children 
enter public school, they evidently respond in 
a similar way to Euro-American children in 
own-race preference and identification. Rather 
than being “ethnocentric,” these responses can 
be accounted for as a recognition by the chil- 
dren of the racial category to which American 
society says they belong and as an expression 
of the American norm that persons should be 
proud of and accept their racial affiliation. 

Banks’s (1976) paper was submitted in 
1975 and the studies he reviewed appear to 
cover the years from 1939 through 1973 
(Table 3, p. 1184). We have already cited 
several studies of that period that were not 
referred to, for example, those by Morland in 
1966 and 1969. Banks also omitted a number
of research studies of racial attitudes in Afro-
American children published during 1973–
1975, studies that used different techniques
from those he cited: Best, Smith, Graves, and
Williams, 1975; Mabe and Williams, 1975;
Spencer and Horowitz, 1973; Williams, Best,
and Boswell, 1975; Williams, Best, Boswell,
Mattson, and Graves, 1975; and Williams,
Williams, and Beck, 1973. Also neglected were
several doctoral dissertations and master’s
theses: Baugher, 1973; H. P. McAdoo, 1970;
J. L. McAdoo, 1970; Skinto, 1969; Vocke,
1971; Walker, 1971; Whiteside, 1975. We
have reviewed the findings from the forego- 
ing studies elsewhere (Williams & Morland, 
1976). Without exception, these studies have 
provided evidence of a tendency for young 
Afro-American children to evaluate representa- 
tions of light-skinned persons more favorably 
than those of dark-skinned persons. 

Three of these studies will be noted for il- 
lustrative purposes. H. P. McAdoo (1970) ad- 
ministered the 12-item racial attitude pro- 
cedure devised by Williams and Roberson 
(1967) to preschool Afro-American children 
in Michigan and Mississippi. It was found 
that the mean scores in both groups departed 
significantly from chance in the direction of
pro-light-skin bias. A second study was con- 
ducted by Spencer and Horowitz (1973), who 
studied the modification of color and race bias 
among Afro- and Euro-American preschoolers 
in Kansas. Although the modification proce- 
dures were found to be generally effective, the 
authors reported that both Afro- and Euro- 
American children in the nontreated control 
group displayed a bias for light-skinned per- 
sons throughout the study. A third study is 
that by Williams, Best, Boswell, Mattson, and 
Graves (1975), who described the develop-
ment and standardization of the Preschool
Racial Attitude Measure II (PRAM II). In
this 24-item picture-story procedure, the child 
selects between drawings of light- and dark- 
skinned persons as the one described in an 
accompanying story that contains 1 of 24 
evaluative adjectives. The PRAM II scores 
have a 24-point range and, thus, provide a
more sensitive measure of attitude than the
“one-item tests” reviewed by Banks (1976,
Table 2, p. 1182). When PRAM II was ad-
ministered to Afro-American preschoolers in
North Carolina, the mean scores were found
to depart significantly from chance in the di-
rection of pro-light-skin bias but not to as ex-
treme a degree as among Euro-American chil-
dren of comparable age.

Additional light has been thrown on the 
development of race bias in children through 
research on racial attitudes in other societies. 
As illustrations, we note the recent studies em- 
ploying the PRAM II procedure with pre- 
school children in France, Italy, Germany, 
and Japan, who have had little direct contact 
with dark-skinned persons or with concepts of 
race associated with color (Best, Field, & 
Williams, 1976; Best, Naylor, & Williams, 
1975; Iwawaki, Sonoo, Williams, & Best, 
1978). The findings from these studies indi- 
cated that these foreign children also displayed 
pro-light-skin bias, which suggests to us that 
contact with racial concepts of the type en- 
countered in the United States is not necessary 
for the development of such bias. 

Banks (1976) stated in the abstract of his 
article that his concern was with the “choice 
behavior among blacks toward white and 
black stimulus alternatives” (p. 1179). In 
view of this, it is surprising that he made no 
reference to studies of the evaluative responses 
of Afro-American children to the colors black 
and white (Skinto, 1969; Stabler, Johnson, 
Berke, & Baker, 1969; Stabler, Johnson, & 
Jordan, 1971; Vocke, 1971; Williams, Bos- 
well, & Best, 1975; Williams & Rousseau, 
1971). As an illustration of the findings from 
these studies, Williams and Rousseau (1971) 
administered a 12-item picture-story proce- 
dure to Afro-American preschoolers who were 
asked to choose between a white or a black 
animal as the one described in the associated 
story. It was found that the children displayed 
a significant tendency toward the association 
of positive evaluative adjectives with the color 
white and negative evaluative adjectives with 
the color black. Similar findings were reported 
by Williams, Boswell, and Best (1975), al- 
though the degree of prowhite bias among 
Afro-American children was not as extreme as 
that found among Euro-American children of 
comparable age. We think it is instructive 
also to note that a similar prowhite bias has 
been documented among preschool-aged chil- 
dren in a number of other countries including 
England, Scotland, France, Germany, Italy, 
and Japan (Best, Field, & Williams, 1976; 
Best, Naylor, & Williams, 1975; Dent, 1976; 


Iwawaki, Sonoo, Williams, & Best, 1978). 
These findings appear to parallel the findings 
from studies with young adults, which point 
to a pan-cultural tendency to evaluate white 
more positively than black (Adams & Osgood, 
1973; Williams, Morland, & Underwood, 
1970). 

We agree with Banks that none of these 
findings should be interpreted as evidence of 
“racial self-rejection” by Afro-American chil- 
dren, a view that is supported by the research 
findings on race and self-esteem that have 
failed to indicate any consistent differences be-
tween Afro- and Euro-American children. Our
theory, which is elaborated elsewhere (Wil- 
liams & Morland, 1976), is that when pro- 
light-skin bias and pro-white-color bias are 
encountered in preschool children these phe- 
nomena can be viewed, most parsimoniously, 
as reflections of a general prolight/antidark 
bias. We propose that this bias originates in 
most young humans as a result of early learn- 
ing experiences (e.g., with the light of day and 
dark of night) and is subsequently reinforced 
through social processes, occurring in almost 
all cultures, in which light is used to symbolize 
“goodness” and dark to symbolize “badness.” 

By way of summary, we feel that Banks’s 
refusal to deal with comparisons by race in 
the studies he reviewed and his failure to
consider the findings from other more meth- 
odologically sophisticated studies of attitudes 
toward race and color have led him to an 
over-simplified and misleading conclusion. Pro-
light-skin bias among preschool Afro-Ameri-
cans is not a particularly powerful phenome-
non. It exists, however, and needs to be recog-
nized by all persons who are concerned with
the development of young children in our 
multiracial American society.


It has been contended that, in contrast to the argument set forth by Banks
(1976), the phenomenon of white preference does obtain among blacks, albeit 
minimally, and further that such a phenomenon is important in its relationship 
to and implications for the social adaptation of such persons. The present paper 
argues, however, that even within those examples set forth of the significant 
demonstration of white preference in blacks, insufficient evidence is presented 
to reject the null hypothesis of nonpreference. It is further considered whether 
empirical evidence supports the validity of such a phenomenon as a measure of 
self-concept or social behavior preferences and whether therefore the “impor-
tance” of white preference as it obtains in blacks, or differentiates blacks from
whites, can be sustained.


White preference among blacks is, as stated 
by Williams and Morland (1979), “not a par- 
ticularly powerful phenomenon” (p. 31). A 
serious question is whether it is a phenomenon 
at all (see Banks, 1976), although it must be 
conceded that it can be expected, as much as 
any other event of behavior, at least to occur 
in some of the people some of the time. In 
this regard, Williams and Morland (1979) 
have presented certain evidence omitted from 
an earlier discussion by Banks, which they 
argue provides support both for the existence 
and the importance of the phenomenon of 
white preference in blacks. However, we 
should consider this evidence in light of the 
accepted criteria by which we separate those 
things that exist some of the time from those 
that have reliable and predictable occurrence 
in the behavior of a specifiable population and 
those things that are felt to be important from
those that have a demonstrable validity. Even
if it were considered for the moment that 
such a phenomenon does exist, the question of 
its meaning and importance would remain. 
Since in the analysis of importance, the ques- 
tion of existence must inevitably be raised, we 
focus here primarily upon the question of 
importance with respect to the three specific 
examples of that evidence cited by Williams 
and Morland (McAdoo, 1970; Spencer & 
Horowitz, 1973; Williams, Best, Boswell, 
Mattson, & Graves, 1975) as supportive of 
both the existence and the importance of 
white preference in blacks. For this purpose 
we might take, for example, the framework 
suggested by Cronbach and Meehl (1955) in 
which there are three levels at which we may 
ask the question of whether white preference, 
as measured within the common paradigms, is 
a valid phenomenon among blacks.¹


1 Webster's New Collegiate Dictionary (1974) de- 
fines the word "important" as pertaining to some- 
thing that is "valuable in content or relationship," 
which is close to the concept of validity as it applies 
to the content or criterion significance of psychologi-
cal measures.


Content Validity


The content validity of a measure of white 
preference would rely upon the specification 
of a universe of content that is accepted as 
defining the phenomenon and the demonstra- 
tion that the measure constitutes a sample 
of that universe. Such measures as those in 
which subjects are asked to express choices 
of dolls, puppets, and hypothetical other chil- 
dren may, in this sense, derive their validity 
solely from the fact that they sample re- 
sponses directly representative of those to 
which they wish to generalize in the real 
world. However, the overall content validity 
of such measures as indicative of white prefer- 
ence in blacks would rest upon both the repre- 
sentativeness of the stimulus items themselves 
and the ability of such items to evoke white- 
preference choices. One instrument whose 
ability in this regard has already been asserted 
is described by Williams, Best, Boswell, Matt- 
son, and Graves (1975). 

In their procedure, subjects are asked to 
make preferential selections between white- 
and black-representative stimuli. 

Williams et al. (1975) reported a binomial 
test of probability for the within-subjects fre- 
quencies of white-stimulus choices across the 
24 replicated items of their Preschool Racial 
Attitude Measure II (PRAM) instrument. 
Moreover, they reported that 60% of white 
subjects selected the white-representative 
stimuli more often than would be expected by 
chance across the 24 within-subjects replica- 
tions, as did 39% of black children. However, 
their contention of the significance of the 
choice behavior of 39% of blacks who chose 
the white stimuli on 17 or more of the items, 
while intuitively compelling, is statistically 
inappropriate, due to the nonindependence 
(within-subjects derivation) of those fre-
quency observations. While this within-sub-
jects replication of preference choices may
yield a more reliable categorization of re-
spondents into white-preference and other
classifications than did the various one-re- 
sponse procedures of past research, it is these 
between-subjects category frequencies that are 
appropriately testable by binomial and related 
analyses. In this regard, fully 72% of white 


W. C. BANKS, G. V. McQUATER, AND J. A. ROSS 


children chose the white stimuli at (within-
subjects) rates that led Williams et al. (1975)
to label them as indicating “definite” or “prob-
able” bias. While 52% of black subjects fell
within these two categories, fully 58% could
have been expected to do so by chance alone.

Williams and Morland have also referred 
to the “significant” white-preference fre-
quencies reported by Spencer and Horowitz
(1973). The choices of the control-group
blacks to which Williams and Morland refer
as having displayed white preference were re-
ported only in terms of the mean percentage of
white choices (approximately 70%). Within
a sample of eight, this magnitude of frequen-
cies across subjects would not reject the null
hypothesis, since a percentage of roughly 85
would be required at the .05 level. However,
since the data were reported as mean percent
of choices within subjects, we might instead
attempt to assess its significance by a t-test
comparison against 50%. Using the error term
from the analysis of variance reported by
Spencer and Horowitz (1973), this test would
yield a t value of approximately 1.31, hardly
approaching significance even at the .05 level. 
Consequently, the phenomenon of white pref- 
erence in blacks fails to obtain in this example 
as well. In fact, the instrument devised by 
Williams et al. (1975) may contain stimulus 
items fully representative of the domain to 
which we may wish to generalize, but the 
evidence derived from its application provides 
empirical support for a content-valid measure 
of nonpreference in blacks, quite contrary to 
the assertion of Williams and Morland. 
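
The two chance criteria invoked in this section can be checked with an exact binomial calculation. The sketch below is an illustration added here, not the authors' own computation. It assumes a one-tailed exact binomial test against a chance rate of .5 at the .05 level, and the helper names `upper_tail` and `critical_count` are hypothetical.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_count(n, alpha=0.05, p=0.5):
    """Smallest count k whose upper-tail probability is at most alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p) <= alpha:
            return k
    return None  # no count reaches significance at this alpha

# Within-subjects criterion: white-stimulus choices across the 24 PRAM II items.
items_cutoff = critical_count(24)

# Between-subjects criterion: a control group of eight children.
subjects_cutoff = critical_count(8)

print("items needed out of 24:", items_cutoff)
print("subjects needed out of 8:", subjects_cutoff,
      f"({subjects_cutoff / 8:.1%} of the sample)")
print("tail probability for 6 of 8:", upper_tail(8, 6))
```

Under these assumptions the cutoff is 17 of 24 items for a single child, which matches the criterion quoted above, and 7 of 8 subjects (87.5%) for a sample of eight, in line with the "roughly 85" percent cited. A group in which about 70% of eight subjects chose the white stimuli falls short of that count.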


Criterion Validity 


At another level, however, it may be argued
that such a strict reliance upon the measured 
phenomenon as inherently content valid sim- 
ply regresses to the analysis and arguments al- 
ready set forth by Banks (1976) and cur- 
rently under criticism. Rather, within the 
framework of criterion validity, the phenome- 
non of white preference as measured within 
these paradigms may derive its importance 
from the fact that it concurs with racial self-
concepts (as a concurrent criterion) or that
it predicts natural events of racial selection
and preference among black children in a real-
world domain outside the laboratory (as a pre-
dictive criterion).

Empirically, it can be said to do neither. 
For example, in the study by McAdoo (1970) 
to which Williams and Morland have referred, 
black children from the North and the South 
were measured concurrently for self-concepts 
and for racial preferences. In this instance, 
the Williams et al. (1975) PRAM measure 
of racial preference, in fact, was found to be 
unrelated to northern black children’s self-con- 
cepts. And contrary to the assumptions of con- 
current-criterion validity that might derive 
from that instrument, measured white prefer- 
ence among southern black youngsters was 
directly related not to negative self-concepts 
but to positive self-concepts. 

This is not an isolated example of the lack 
of criterion validity of preference measures 
for blacks. Porter (1971) investigated the 
racial preferences of black and white children 
and the patterns of their play and friendship 
preferences in the natural social setting of 
school. Although children from both samples 
were found to indicate racially influenced 
biases in their choice behavior within the 
laboratory procedure (similar to that used by 
Williams et al., 1975), Porter reported that 
“race seemed to play little part in determining 
the [predictive criterion of actual] friendship 
patterns" (p. 167). 


Construct Validity


It might be argued in response to the points 
discussed above that the phenomenon of white 
preference in blacks lacks sufficient conceptual 
specificity to permit a test of its importance 
or meaning by so precise a set of content- or 
criterion-validation hypotheses. We would con- 
cur. Were it assumed rather that such mea- 
sured phenomena represent some underlying 
hypothetical construct of personality, it would 
remain to be demonstrated that such an in- 
ferred construct has validity. The validation 
of white preference in blacks as representing 
the construct, for instance, of negative self-
concept could proceed via systematic observa-
tions of positive correlations between it and
logically convergent behavioral phenomena
and negative correlations between its occur-
rence and that of logically divergent phenom-
ena among blacks. In this regard, one might
expect that blacks would with reasonable con- 
sistency express a conception of their own 
qualities and abilities for successful function- 
ing as lower than those of white persons; one 
might expect, as well, to find that success in 
such domains as school would relate pro- 
foundly to such racially influenced self-assess- 
ments among blacks. Certain empirical evi- 
dence stands in opposition to both the former 
(see, e.g., Wylie, 1963; Wylie & Hutchins, 
1967) and the latter (Coleman, Campbell, 
Hobson, McPartland, Mood, Weinfield, & 
York, 1966; Guggenheim, 1969; Hunt & 
Hardt, 1969; Wolkon, 1971) notion. Simi- 
larly, within a population whose qualities of 
racial identification and valuation are sup- 
posed to work so destructively against a sense 
of personal optimism and worth, one would 
hardly expect aspirations toward intellectual/ 
academic excellence and occupational/socio- 
economic ascendance to obtain. Yet they do 
and most often in measure equal to or beyond 
that of persons whose sense of racial identity 
ought to place them in a relatively superior 
position. (See Boyd, 1952; Brook, 1974; 
Ducette & Wolk, 1973; Gist & Bennett, 1963; 
Phillips, 1972.) 

In summary, scant evidence exists of the 
tendency of blacks to express preferential eval- 
uative orientations toward white characteris- 
tics. Furthermore, the validity of such a phe- 
nomenon as a measurement of content or pre- 
dictive significance for white preference within 
the real world of social choices, self-esteem, or 
racial pride seems equally unsupported by em- 
pirical evidence; even (or, perhaps, especially) 
in the case of those examples of research cited 
by Williams and Morland. Why should we 
retain for the analysis of behavior in blacks 
either (a) a conceptual/methodological para- 
digm in which the null hypothesis pertaining 
to an empirical phenomenon fails consistently 
to be rejected or (b) a sense of the importance 
of a phenomenon whose meaning fails to be 
verified within any of the existing validation 
paradigms? Neither the evidence reviewed 
earlier by Banks (1976) nor that specifically 
cited by Williams and Morland suggests that
we should.


Using Quasi F to Prevent Alpha Inflation
Due to Stimulus Variation


John L. Santa, John J. Miller, and Marilyn L. Shaw 
Rutgers—The State University 


The nominal alpha level may be very inflated in much of the published litera- 
ture where the conventional F test is used. This alpha inflation is often caused 
by ignoring stimulus variation or treating it as a fixed effect. The present article 
illustrates this problem in a variety of areas and discusses the use of Quasi F 
ratios as a means of achieving generality over both subjects and stimuli. Monte 
Carlo experiments are reported that examine the performance of the Quasi F 
in a variety of realistic situations in which the data violate distribution and 
homogeneity of variance assumptions. In general, the Quasi F has proved to
be robust.


It has long been traditional in psychology to 
treat subjects as a random effect in analysis of 
variance (ANOVA). The rationale is simple:
For each experiment, we select from the popu- 
lation a small sample of subjects, but we want 
our results to generalize beyond this sample. 
Rarely are we interested in presenting our 
data as being pertinent only to the particular 
individuals studied. Both Clark (1973) and 
Coleman (1964) before him have argued that 
the logic that compels us to treat subjects as 
a random effect might also lead us to treat 
stimulus variation as a random effect; that is, 
in many situations psychologists should seek 
generality beyond both the stimuli and sub- 
jects of an experiment. This reasoning led 
Clark to recommend an ANOVA model in which
both subjects and stimulus items are treated 
as random effects. Such models are perfectly 
possible to create, but have the disadvantage 
that many hypotheses can no longer be tested 
using the standard F distribution. Instead, 
models with two random effects often neces- 
sitate the use of Quasi F ratios, which permit 
one to form analytic tests on the assumption 
of the standard linear model. Unfortunately,
the exact distribution of the Quasi F statistic
is unknown. However, when degrees of free- 
dom are appropriately adjusted, the conven- 
tional F distribution can be used to approxi- 
mate the Quasi F statistic (see Winer, 1971, 
p. 377). 
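
As a rough illustration (not part of the original analysis), the adjusted degrees of freedom can be sketched in Python. The sketch assumes the Quasi F ratio used later in this article, (MS_A + MS_B(A)C)/(MS_AC + MS_B(A)), and Satterthwaite's approximation for the degrees of freedom of a sum of independent mean squares. The function name and the mean square values are hypothetical.

```python
def quasi_f(ms_a, ms_bac, ms_ac, ms_ba, df_a, df_bac, df_ac, df_ba):
    """Return (Quasi F, approximate numerator df, approximate denominator df)."""
    numerator = ms_a + ms_bac
    denominator = ms_ac + ms_ba
    # Satterthwaite approximation: df of a sum of independent mean squares
    df_num = numerator ** 2 / (ms_a ** 2 / df_a + ms_bac ** 2 / df_bac)
    df_den = denominator ** 2 / (ms_ac ** 2 / df_ac + ms_ba ** 2 / df_ba)
    return numerator / denominator, df_num, df_den


# Hypothetical mean squares for a design with I = 4 treatments,
# J = 12 items nested within each treatment, and K = 10 subjects.
I, J, K = 4, 12, 10
f, df1, df2 = quasi_f(
    ms_a=400.0, ms_bac=4.0, ms_ac=50.0, ms_ba=95.0,
    df_a=I - 1, df_bac=I * (J - 1) * (K - 1),
    df_ac=(I - 1) * (K - 1), df_ba=I * (J - 1),
)
print(round(f, 3), round(df1, 2), round(df2, 2))
```

The resulting ratio would then be referred to a conventional F table with the two approximate (generally noninteger) degrees of freedom.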

Clark applied the Quasi F analysis to a 
large body of semantic memory research and 
demonstrated that treatment effects can be 
significant when stimulus variation is not con- 
sidered but nonsignificant when both stimuli 
and subjects are treated as random effects. 
Following the publication of Clark's article, 
the Quasi F analysis has become common, in 
fact, almost obligatory for research in seman- 
tic memory. 

Clark's arguments about stimulus variation 
and the Quasi F have, however, been vir- 
tually ignored in other areas of psychology. 
In the present article, we argue that the 
Quasi F analysis is appropriate to many 
areas of psychological research and that its 
use should not be restricted to psycholinguis- 
tic investigation of semantic memory. Fur- 
thermore, we demonstrate that the Quasi F 
is quite robust with respect to violations of 
distribution and heteroscedasticity assump- 
tions. 


Applicability of the Quasi F


The Quasi F analysis is potentially applicable
in any experiment that employs a sample
of stimuli, items, or materials drawn from
a larger possible population. Exceptions to 
this exist whenever stimulus variation is to- 
tally confounded with subject variation or in 
case study experiments of a particular item 
or stimulus set. Excluding these exceptions, 
psychologists are usually at least implicitly 
concerned with generality beyond the sample 
of materials they happen to be using. Most 
experiments would be considered uninterest- 
ing if their results were obtainable for only 
one subject or even one sample of subjects. 
Similarly, results that obtain with only one 
sample of stimulus materials are of question- 
able interest in psychology. Consequently, 
psychologists should be as concerned with 
generality over stimulus materials as they are 
with generality over subjects. 

A few examples might serve to emphasize 
the variety of situations in which a Quasi F 
analysis is potentially appropriate. Consider 
first the situation Clark used to illustrate the 
importance of item variation. Suppose an 
experimenter wanted to test a theory claim- 
ing nouns are more perceptible than verbs. He 
or she might proceed by selecting a sample of 
subjects and assessing each subject's percepti- 
bility threshold for a set of 10 nouns and a 
set of 10 verbs. The experimenter would 
then calculate the average threshold for
nouns and verbs for each subject and would
find that every subject exhibited a lower
threshold for nouns than for verbs. The 
original hypothesis that nouns are more per- 
ceptible than verbs would appear to be sup- 
ported by the data, but few psychologists 
would be convinced by this experiment. After 
all, the result might only hold for the single 
sample of 10 nouns and verbs. In fact, the 
same result could be obtained if only 1 noun 
were more perceptible than 1 particular verb. 
So one sees in this simple case that item 
variation is obviously important whenever 
items are confounded with (nested within) 
treatments. Such situations appear in many 
areas of psychology other than language re- 
search. 
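
The point about a single deviant item can be made concrete with a small Python sketch. All numbers are hypothetical: nine nouns and ten verbs are given identical thresholds, one noun is made much easier to perceive, and subjects differ only in their baseline threshold.

```python
from statistics import mean

# Hypothetical item effects (ms): 9 ordinary nouns, 1 unusually easy noun,
# and 10 ordinary verbs. No general noun-verb difference is built in.
noun_item_effects = [0.0] * 9 + [-50.0]
verb_item_effects = [0.0] * 10
subject_baselines = [200.0, 220.0, 240.0, 260.0, 280.0]  # five subjects

for baseline in subject_baselines:
    noun_mean = mean(baseline + e for e in noun_item_effects)
    verb_mean = mean(baseline + e for e in verb_item_effects)
    print(baseline, noun_mean, verb_mean, noun_mean < verb_mean)

# True if every subject shows a lower mean threshold for nouns
all_lower = all(
    mean(b + e for e in noun_item_effects) < mean(b + e for e in verb_item_effects)
    for b in subject_baselines
)
print(all_lower)
```

Every subject shows the "noun advantage" even though it is carried entirely by one item, which is why averaging over items within subjects hides the problem.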

For example, a number of psychologists 
have recently been interested in whether hu- 
man faces are perceived differently from 
other objects such as buildings. Such experiments
compare reaction time or response
thresholds to an arbitrarily chosen set of
building pictures versus a set of face pictures.
Clearly, individual stimulus pictures are
nested within treatment condition, and treatment
differences could arise simply from
differences in particular item perceptibility.
A similar situation exists for a clinical psy- 
chologist interested in the effects of alcohol 
on homosexual and heterosexual arousal. A 
typical experiment might involve a group of 
subjects given alcohol and a control group 
given soda water. Subjects might then view a 
set of slides, each slide depicting a homo- 
sexual or heterosexual act. The dependent 
variable in such an experiment might be the 
amount of time spent viewing each slide. 
Again, conclusions regarding the effects of 
alcohol on type of sexual arousal would be 
confounded by individual stimuli nested 
within the homosexual and heterosexual slide 
groups—confounded in the sense that the main 
effect of slide type or the interaction with
alcohol could arise from one slide or a small 
number of slides. 

Consider next a social psychologist in- 
terested in the effects of sex bias in advertis- 
ing. A typical experiment might involve 
groups of males and females reading and 
evaluating a set of job advertisements. The 
set of advertisements might contain three 
types: biased for females, biased for males, 
and neutral. Assuming there were several ads 
within each of these classes, one again has a 
situation in which the experimental variables 
of interest would be confounded with ma- 
terials. Thus, item variation must be included 
in the analysis to make the other variables 
interpretable. 

At this point it should be clear that the 
potential for confounding variables with item 
variation is present in many areas of psy- 
chology and is not at all restricted to psycho- 
linguistic research. In addition to the specific 
examples we have outlined, it might be noted 
that at least two other large areas of research 
should be particularly concerned with the 
potential confound between experimental 
variables and stimuli: developmental psychology
and studies that employ questionnaires
or tests with subscales. Any time a
developmental psychologist changes specific 
materials to suit the age and ability of subjects,
he or she entertains the possibility of
confounding experimental effects with spe- 
cific item differences. Similarly, the same con- 
founding is present whenever an educational 
psychologist or clinician compares people's 
performance on a pencil-and-paper test. If 
the scores that enter into the analysis are 
summed or averaged over items within sub- 
scales, then there is a potential confounding 
of treatments or groups with individual item 
variation. 

All of our examples so far have been con- 
cerned with situations in which item varia- 
tion is directly confounded with treatment 
variation. Ignoring item variation in these 
situations leads to results that might not 
generalize beyond the specific materials of 
the experiment; and of more importance, the 
treatment effects might not even generalize 
across the various stimuli within the experiment.
Unfortunately, the problem of item
variation confounding treatment effects does 
not disappear in designs in which stimulus 
materials cross all treatment conditions. Such 
experiments are even more common in psy- 
chology. For example, a social psychologist 
might be concerned with the effects of crowd- 
ing on perceived friendliness. One group of 
subjects might be jammed tightly into a 
small room and asked to rate the friendliness 
of a sample of faces drawn from high school 
yearbook pictures. A comparison group of 
subjects might observe the same pictures while 
seated in a much more spacious, less crowded 
room. In such a situation the effect of crowding
would not be directly confounded with
item variation, since both conditions would 
receive the same items. Effects of crowding 
would, however, be confounded with the potential
interaction of Treatment × Item such
that an effect of crowding might be observed 
on only a small subset of items. If the effect 
of crowding were obtained for only a few 
stimuli, it would probably not be of much 
general interest. In fact, it might well be 
argued that an effect such as crowding on 
perceived friendliness would only be of in- 
terest if the result were generalizable beyond 
the idiosyncratic sample of stimuli used in the 
particular experiment. 

Assuming that our examples are sufficient 
to illustrate the many situations in which
stimulus variation can compromise the interpretation
of experimental variables, what can
be done about this problem? The first step in 
solving the problem is to include individual 
stimuli in one’s experimental design as op- 
posed to the current practice of summing or 
averaging over individual stimuli within 
treatment conditions. The next step is to 
decide whether to treat the item variations 
as a fixed or random effect. Obviously, such 
a decision is not completely arbitrary, but it 
is beyond the scope of the current article to 
debate this issue. For our own part, we tend 
to agree with Clark (1973, 1976) that stimu- 
lus variation should be considered a random 
effect whenever stimuli are arbitrarily drawn 
from a potentially larger population and 
whenever it is desirable to generalize the 
results beyond the set of stimuli used. In 
other words, random for stimuli should mean 
about the same thing as random does for 
selecting subjects. (For other sides of this 
issue consult recent articles by Cohen, 1976; 
Keppel, 1976; Smith, 1976; and Wike & 
Church, 1976.) 

It is important to emphasize that the ex- 
perimenter's decision to treat stimulus varia- 
tion as fixed or random has very important 
consequences for interpretation of the experi- 
ment. Consider, for example, an experiment 
with I treatments, K subjects, and J items
nested within treatments. The design for 
such an experiment is summarized in Table 1 
together with the expected values of the 
mean squares assuming that items are either 
a fixed or a random effect. The important
point to note is that if item variation is a 
random effect, then the observed estimates of 
treatment variation will include a component 
due to item variation, namely, $K\sigma^2_{B(A)}$. Thus,
assuming item variation is fixed when it is in 
fact random will almost always lead to an 
inflated value for F. Moreover, the inflation 
will be more severe as the number of subjects 
is increased. Even replicating the experiment 
with new subjects and items will not help, 
since each experiment will produce an in- 
flated F value. In fact, the inflation can be 
so large that the actual probability of ob- 
taining the observed F can be 40 to 50 times 
larger than the nominal alpha level, even 
with only 10 subjects (see Forster & Dickinson,
1976). In other words, by randomly
sampling stimulus materials and then ignoring
stimulus variation or treating it as fixed, it
is very easy to obtain, inappropriately, sig-
nificant treatment effects. Conversely, if item
variation is not random with respect to treat-
ment conditions, then the Quasi F will tend
to provide an inappropriately conservative
estimate of the true treatment effect. Such
an estimate will become more conservative
as the number of items within treatments is
increased. Thus, the decision to treat item
variation as fixed or random is very impor-
tant and must be consistent with the true
state of affairs in the experiment if the re-
sulting statistic is to be meaningful. It is not
simply a question of the experimenter decid-
ing to accept more or less generality. Rather,
it is a question of choosing a statistical
model appropriate to the experiment.


Table 1
Analysis of Variance Table for Experiment Used in Monte Carlo Runs

                                                        Expected MS
Source                          df                Items fixed                                Items random
Treatments (A)                  I - 1             $J\sigma^2_{AC} + JK(\sigma^2_A)$          $\sigma^2_{B(A)C} + J\sigma^2_{AC} + K\sigma^2_{B(A)} + JK(\sigma^2_A)$
Items within treatments (B(A))  I(J - 1)          $\sigma^2_{B(A)C} + K(\sigma^2_{B(A)})$    $\sigma^2_{B(A)C} + K\sigma^2_{B(A)}$
Subjects (C)                    K - 1             $IJ\sigma^2_C$                             $\sigma^2_{B(A)C} + IJ\sigma^2_C$
A × C                           (I - 1)(K - 1)    $J\sigma^2_{AC}$                           $\sigma^2_{B(A)C} + J\sigma^2_{AC}$
B(A) × C                        I(J - 1)(K - 1)   $\sigma^2_{B(A)C}$                         $\sigma^2_{B(A)C}$
Total                           IJK - 1
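
The dependence of the inflation on the number of subjects can be illustrated with a short Python sketch. It assumes the items-random expected mean squares, that is, E(MS_AC) = σ²_B(A)C + Jσ²_AC and, under the null hypothesis, E(MS_A) = E(MS_AC) + Kσ²_B(A). The ratio of these two expectations is only a rough guide to the size of the conventional F, not its expected value, and the function name is hypothetical.

```python
def expected_f1_ratio(J, K, var_ba, var_ac, var_bac):
    """Ratio E(MS_A) / E(MS_AC) under the null hypothesis with random items."""
    ems_ac = var_bac + J * var_ac
    ems_a_null = ems_ac + K * var_ba  # extra item component K * sigma^2_B(A)
    return ems_a_null / ems_ac


# Variance components of Case 3 in Table 2, with J = 12 items per treatment
for K in (10, 20, 40):
    ratio = expected_f1_ratio(J=12, K=K, var_ba=9, var_ac=4, var_bac=4)
    print(K, round(ratio, 2))
```

Because the item component is multiplied by K, adding subjects pushes the ratio further above 1 even though no treatment effect exists.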


Robustness of the Quasi F 


A practical question remains with respect 
to the use of the Quasi F ratio. Does the F 
distribution provide an adequate approxima- 
tion of the Quasi F—one that is useful in 
realistic experimental situations? There are 
already several simulations (Davenport & 
Webster, 1973; Forster & Dickinson, 1976) 
that suggest the F is an acceptable approxi- 
mation of the Quasi F as long as the total 
degrees of freedom are sufficiently large 
(> 18). The Satterthwaite (1946) approximation
of degrees of freedom for the Quasi F
leads to a markedly conservative test only if
the stimulus and Stimulus × Subjects variance
components are quite small. Thus, the
Quasi F ratio appears to be a reasonable statistic
when the normal model holds. However,
what happens when one applies the
Quasi F in the messy world of real data? The 
ANOVA model has been useful in psychology 
largely because it is generally robust under 
violations of the normative assumptions. Psy- 
chologists commonly analyze data in which 
the cells of the design do not have equal 
variance or the dependent variables are not 
normally distributed, for example, latency 
data, percentage data, or situations in which 
subjects are either correct or incorrect (0-1).¹

The present article investigates the per- 
formance of the Quasi F for these typical 
violations of the ANOVA model. Since the
numerator and denominator of the Quasi F 
ratio are linear combinations of estimated 
mean squares for several sources in the 
analysis, one might suspect that the viola- 
tions typically found in psychological data 
may have serious effects on the accuracy of 
nominal alpha levels with the Quasi F. In 
particular, we were interested in the possibil- 
ity that the Quasi F ratio would be an ex- 
tremely conservative test statistic when errors 
are not normally distributed and when item 
variances are heterogeneous. 

Consequently, we undertook several Monte 
Carlo studies of the Quasi F under a variety 
of violations of the normal-model assumptions.
The experimental design we used is
given in Table 1. In this design 10 subjects
were administered each of 12 items from each
of four treatment groups.

¹ It should be noted that the 0-1 situation is usually
avoided by collapsing over a number of stimulus
items to obtain a measure with less restricted variation.
Use of the Quasi F makes it necessary to enter
each subject's score on each item, leading to binary
scores for individual entries.

The data used in the Monte Carlo experiments
varied in three ways: in the distribution
of data, in the heterogeneity of item
variance, and in the combination of values
for the variance parameters $\sigma^2_{B(A)}$, $\sigma^2_{C}$, $\sigma^2_{AC}$,
and $\sigma^2_{B(A)C}$. The experimental data were
drawn from distributions that were normal,
exponential, log uniform, binary, or log normal.
The combinations of item variance
within treatment groups were either homogeneous
(1,1,1,1) or heterogeneous in the
following ratios for the four treatment
groups: (1,2,2,5), (1,2,5,10), and (1,2,5,20).
Finally, the basic parameters of the
ANOVA model, $\sigma^2_{B(A)}$, $\sigma^2_{C}$, $\sigma^2_{AC}$, and $\sigma^2_{B(A)C}$,
were varied to approximate a variety of
realistic experimental situations (see Table 2).
These parameter combinations can be
divided into two basic groups: those for
which the item and subject variances are
large relative to Subject × Item and Treatment
× Subject variances (Cases 1-4 in Table 2)
and those for which subject variance
and Item × Subject variance are relatively
large (Cases 6-8 in Table 2). For Case 5,
these variances are all equal.


Table 2
Combinations of Variance Parameters Used in Monte Carlo Studies

Case   $\sigma^2_{B(A)}$   $\sigma^2_{C}$   $\sigma^2_{AC}$   $\sigma^2_{B(A)C}$
1      9                   9                4                 2
2      9                   9                2                 4
3      9                   9                4                 4
4      9                   9                2                 2
5      9                   9                9                 9
6      3                   18               1                 38
7      2                   25               1                 28
8      2                   12               1                 22


Every trial in the Monte Carlo was conducted
as follows: (a) values for the random
variables, item, subject, Subject × Item, and
error were generated using pseudorandom
number generators; (b) these components
were then added together with the treatment
effect and the overall mean to form an observed
data point; (c) this set of data was
then submitted to an ANOVA; and (d) the
probability of obtaining the observed Quasi F
value or greater for testing treatment effects
was recorded using the appropriate degrees of
freedom. At the end of 1,000 experiments,
the proportions of rejections were tabulated
for alpha levels of .05 and .01. Under the
null hypothesis, the observed proportions of
rejections should be very close to these nominal
alpha levels. In our discussion below we
follow Clark's (1973) notation so that
Quasi F = $(MS_A + MS_{B(A)C})/(MS_{AC} + MS_{B(A)})$,
$F_1 = MS_A/MS_{AC}$, and $F_2 = MS_A/MS_{B(A)}$.


Table 3
Observed Proportions of Rejections (1,000 Runs) of Quasi F in Null Case With Normal Errors

                          Variance ratio
Case   (1,1,1,1)   (1,2,2,5)   (1,2,5,10)   (1,2,5,20)   M
α = .05
1      .043        .047        .048         .068         .052
2      .051        .052        .051         .073         .058
3      .051        .054        .066         .060         .058
4      .045        .049        .057         .075         .057
5      .045        .046        .051         .048         .048
6      .028        .027        .033         .028         .029
7      .035        .028        .036         .035         .034
8      .032        .042        .031         .036         .035
α = .01
1      .014        .013        .012         .019         .015
2      .011        .013        .009         .024         .014
3      .012        .017        .014         .019         .016
4      .009        .013        .016         .026         .016
5      .011        .012        .011         .014         .012
6      .004        .005        .003         .003         .004
7      .008        .006        .005         .003         .006
8      .005        .008        .004         .006         .006

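
A single Monte Carlo trial of the kind described above might be sketched in Python as follows. This is a minimal sketch, not the program used for the reported simulations: it covers only the normal, homogeneous-variance case, it assumes the four variance components of Table 2 (here the Case 3 values), and all function names are hypothetical. Looking up the probability of the resulting Quasi F is omitted.

```python
import random
from statistics import mean


def simulate_trial(I=4, J=12, K=10, var_ba=9, var_c=9, var_ac=4, var_bac=4,
                   effects=(0.0, 0.0, 0.0, 0.0), mu=4.0, rng=None):
    """Generate y[i][j][k]: items (j) nested in treatments (i), crossed with subjects (k)."""
    rng = rng or random.Random()
    item = [[rng.gauss(0, var_ba ** 0.5) for _ in range(J)] for _ in range(I)]
    subj = [rng.gauss(0, var_c ** 0.5) for _ in range(K)]
    ac = [[rng.gauss(0, var_ac ** 0.5) for _ in range(K)] for _ in range(I)]
    return [[[mu + effects[i] + item[i][j] + subj[k] + ac[i][k]
              + rng.gauss(0, var_bac ** 0.5)  # B(A) x C term, confounded with error
              for k in range(K)] for j in range(J)] for i in range(I)]


def sums_of_squares(y):
    """Sums of squares for the balanced design of Table 1."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    grand = mean(v for a in y for b in a for v in b)
    m_i = [mean(v for b in y[i] for v in b) for i in range(I)]
    m_ij = [[mean(y[i][j]) for j in range(J)] for i in range(I)]
    m_k = [mean(y[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    m_ik = [[mean(y[i][j][k] for j in range(J)) for k in range(K)] for i in range(I)]
    ss = {
        "A": J * K * sum((m - grand) ** 2 for m in m_i),
        "B(A)": K * sum((m_ij[i][j] - m_i[i]) ** 2
                        for i in range(I) for j in range(J)),
        "C": I * J * sum((m - grand) ** 2 for m in m_k),
        "AC": J * sum((m_ik[i][k] - m_i[i] - m_k[k] + grand) ** 2
                      for i in range(I) for k in range(K)),
        "B(A)C": sum((y[i][j][k] - m_ij[i][j] - m_ik[i][k] + m_i[i]) ** 2
                     for i in range(I) for j in range(J) for k in range(K)),
    }
    ss["Total"] = sum((v - grand) ** 2 for a in y for b in a for v in b)
    return ss


def quasi_f(ss, I, J, K):
    """Quasi F = (MS_A + MS_B(A)C) / (MS_AC + MS_B(A))."""
    df = {"A": I - 1, "B(A)": I * (J - 1), "AC": (I - 1) * (K - 1),
          "B(A)C": I * (J - 1) * (K - 1)}
    ms = {source: ss[source] / df[source] for source in df}
    return (ms["A"] + ms["B(A)C"]) / (ms["AC"] + ms["B(A)"])


y = simulate_trial(rng=random.Random(1))  # null case: all treatment effects zero
ss = sums_of_squares(y)
print(round(quasi_f(ss, 4, 12, 10), 3))
```

Repeating such a trial 1,000 times and counting how often the Quasi F exceeds its critical value would give an observed rejection proportion of the kind tabulated here.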
Table 3 shows the performance of the
Quasi F under various violations of the
homogeneity of variance assumption. The
first column of rejection rates in Table 3
illustrates the rejection rate of the Quasi F in
the situation of homogeneous item variances.
In general, the Quasi F performs quite well
under the normal model but tends to be
slightly conservative. The remaining three
variance-ratio columns of Table 3 show the
performance of the Quasi F under various
violations of the homogeneity of variance
assumption up to a 20:1 inequality among treatment
variances. For Cases 1-4 in which item and
subject variances are large relative to the
interaction components, there is a trend for
heteroscedasticity to give slightly more rejections
than the nominal alpha, but for Cases
6-8 heteroscedasticity makes the statistic
slightly conservative. Overall, the Quasi F
appears quite acceptable in the face of rather
severe departures from the homogeneity of
variance assumptions.

Next, we examined the Quasi F in situations
in which the data were drawn from a
variety of common distributions. Monte Carlo
results of the Quasi F for the eight combinations
of variance parameters and five types of
distributions are presented in Table 4. Inspection
of the table reveals that the Quasi F
is conservative although not painfully so.
When item and subject variances are relatively
large (Cases 1-4), the Quasi F tends
to be very slightly conservative; it is somewhat
more conservative in Cases 6-8 in which
subject and Item × Subject variances are
relatively large.

Averaging over all cases and looking at the
mean rejection rates reveals that nonnormality
makes the Quasi F somewhat more conservative
than it is with the normal distribution.
The best performance is obtained
with the normal and log-uniform distributions,
the worst with the log-normal distribution.
The exponential and 0-1 distributions tend to
be intermediate. Among these distributions
the log normal is the closest in approximating
the shape of reaction time distributions typically
observed in psychological data; consequently,
it may be advisable to take logs of
reaction times to normalize the data. Finally,
it should be noted that although the Quasi F
performs well, on occasion the observed rejection
rate is only one tenth of the nominal
alpha. The relative discrepancy between real
and nominal alpha is of course greater when
the nominal alpha is smaller.


Table 4
Observed Proportions of Rejections (1,000 Runs) in Null Case for Quasi F for Five Error Distributions

Case   Normal   Exponential   Log uniform   Log normal   0-1
α = .05
1      .043     .042          .050          .034         .047
2      .051     .039          .048          .021         .040
3      .051     .046          .057          .035         .039
4      .045     .038          .053          .034         .035
5      .045     .037          .038          .030         .046
6      .028     .036          .031          .021         .021
7      .035     .036          .040          .019         .028
8      .032     .028          .046          .023         .023
M      .041     .038          .045          .027         .035
α = .01
1      .014     .007          .008          .003         .009
2      .011     .006          .008          .007         .005
3      .012     .007          .014          .004         .003
4      .009     .006          .012          .004         .005
5      .011     .011          .012          .007         .008
6      .004     .009          .003          .002         .002
7      .008     .004          .005          .001         .002
8      .005     .003          .005          .004         .005
M      .009     .007          .008          .004         .005

The results of our simulations make it 
clear that the Quasi F is a useful and robust 
test. Even at its worst, the test is not out- 
landishly conservative. For example, a test 
using a nominal alpha of .05 would almost 
certainly have a real alpha between .02 and 
.06. For most researchers this level of un- 
certainty about a decision is perfectly ac- 
ceptable. 
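
When reading the tabled proportions, it may help to keep in mind the sampling error of a proportion based on 1,000 runs. The following Python sketch is an added illustration, not part of the reported analysis; it assumes a simple binomial model for the number of rejections, and the function name is hypothetical.

```python
from math import sqrt


def mc_standard_error(alpha, runs):
    """Binomial standard error of an observed rejection proportion."""
    return sqrt(alpha * (1 - alpha) / runs)


for alpha in (0.05, 0.01):
    se = mc_standard_error(alpha, 1000)
    # Rough band of two standard errors around the nominal level
    print(alpha, round(se, 4), round(alpha - 2 * se, 3), round(alpha + 2 * se, 3))
```

Under this assumption, observed proportions that differ from the nominal level by less than about two standard errors are compatible with an exact test.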


Power 


Let us briefly consider the question of 
power for the Quasi F statistic as contrasted 
to the F statistic obtained by assuming stim- 
ulus variation to be a fixed effect. It is, of 
course, difficult to compare directly the power 
of these tests unless one assumes a constant
experimental circumstance. In other words,
assume a situation in which there is a real
difference underlying the treatment condi-
tions and an experiment with K subjects and
J items. The question of interest is how do
the Quasi F and the F differ in their ability 
to detect the difference. 

It is relatively easy to evaluate power for
a conventional F by calculating a noncentrality
parameter (NCP) from

    $NCP(F_1) = 2Kd^2/\sigma^2_{AC}$,

where d = difference in treatment means.
Given the NCP and the degrees of freedom
for the experiment (in our situation, 3 and
27), it is possible to look up the theoretical
probability of detecting the observed treatment
effect using tables of the noncentral F
(Tiku, 1967). The situation is not so straightforward
with the Quasi F. However, it is
possible to obtain a noncentrality parameter
by using

    $NCP(\text{Quasi } F) = 2JKd^2/[K\sigma^2_{B(A)} + J\sigma^2_{AC} + \sigma^2_{B(A)C}]$.

The degrees of freedom can then be estimated
by the observed average for the 1,000
simulated experiments. Using these estimates
one can again consult Tiku's tables to obtain
an estimate of the power of the Quasi F.
Table 5 illustrates the observed and theoretical
power of the Quasi F as contrasted
to the power of $F_1$ for two representative
conditions of the ANOVA parameters (Cases
3 and 7).


Table 5
Observed Versus Theoretical Power of the Quasi F

                       Quasi F                                   $F_1$
         Noncentrality   Theoretical   Observed      Noncentrality   Theoretical
d        parameter       p             p             parameter       p

Case 3: $\sigma^2_{B(A)} = 9$, $\sigma^2_{C} = 9$, $\sigma^2_{AC} = 4$, $\sigma^2_{B(A)C} = 4$
 .00      0              .05           .051            0             .05
 .77      1.0            .10           .109            3.0           .27
1.54      4.0            .34           .324           11.8           .76
2.46     10.2            .75           .734           30.3           .99
3.08     16.0            .92           .914           47.3           .99+
4.00     27.0            .99           .992           80.0           .99+

Case 7: $\sigma^2_{B(A)} = 2$, $\sigma^2_{C} = 25$, $\sigma^2_{AC} = 1$, $\sigma^2_{B(A)C} = 28$
 .00      0              .05           .035            0             .05
 .75      2.3            .14           .145           11.3           .27
1.23      6.0            .40           .408           30.0           .76
1.79     12.8            .80           .790           64.0           .99
2.24     20.0            .97           .937          100.0           .99+
2.91     33.8            .99           .997          169.0           .99+

Note. Difference in treatment means = d; means in the four groups were 4 - d, 4, 4, 4 + d; α = .05.
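
The noncentrality parameters can be checked with a few lines of Python. The sketch assumes the two expressions NCP(F1) = 2Kd²/σ²_AC and NCP(Quasi F) = 2JKd²/[Kσ²_B(A) + Jσ²_AC + σ²_B(A)C], together with the design used in the simulations (J = 12 items per treatment, K = 10 subjects) and the Case 3 and Case 7 variance components. Function and variable names are hypothetical.

```python
def ncp_f1(d, K, var_ac):
    """Noncentrality parameter for F1 when items are treated as fixed."""
    return 2 * K * d ** 2 / var_ac


def ncp_quasi_f(d, J, K, var_ba, var_ac, var_bac):
    """Noncentrality parameter for the Quasi F when items are random."""
    return 2 * J * K * d ** 2 / (K * var_ba + J * var_ac + var_bac)


J, K = 12, 10
case3 = dict(var_ba=9, var_ac=4, var_bac=4)
case7 = dict(var_ba=2, var_ac=1, var_bac=28)

for d in (1.54, 2.46, 3.08, 4.00):
    print(d,
          round(ncp_quasi_f(d, J, K, **case3), 1),
          round(ncp_f1(d, K, case3["var_ac"]), 1))
```

For a given d, the Quasi F noncentrality parameter is several times smaller than that of F1, which is the source of the difference in power; small discrepancies from tabled values can arise because d is reported to only two decimals.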

Two conclusions are apparent from the 
table. First, the theoretical power of the 
Quasi F is in good accord with the observed 
probability of rejection. Again the Quasi F 
is a reasonable statistic, this time in terms 
of power. The second conclusion is some- 
what more troublesome although quite ex- 
pected: The Quasi F has considerably less 
power to detect a given treatment effect than 
does $F_1$. One should remember that the
columns of Table 5 illustrate different situations.
The powers given for Quasi F are
those when item variation is in fact random;
the powers for $F_1$ are those when item variation
is in fact fixed. Inspection of the expected
mean squares in Table 1 reveals that if
items within treatments are random it will 
be more difficult to discern treatment dif- 
ferences than if items are fixed. 

Clark (1973) has suggested the use of the
statistic F'min whenever mean squares of the
Quasi F are difficult to compute, for example,
when the data contain missing observations.
For the design discussed in the present
article, F'min is as follows:

    $F'_{\min}(i, j) = MS_A/[MS_{AC} + MS_{B(A)}]$,

where i is the number of treatment levels
minus one and

    $j = [MS_{AC} + MS_{B(A)}]^2 / [MS_{AC}^2/df_{AC} + MS_{B(A)}^2/df_{B(A)}]$.

The statistic F'min is also equivalent to

    $F'_{\min}(i, j) = F_1 F_2/(F_1 + F_2)$.

Since F'min is always less than or equal to
Quasi F, its computational advantage is to be
weighed against the obvious disadvantage of
its conservatism. In the recent Monte Carlo
study of Forster and Dickinson (1976), the
observed alpha level for F'min was in fact
very close to the nominal level. However,
this study was done with a small range of
"nuisance parameter" values and assumed the
normal model. Consequently, the generality
of their results is limited. The observed alpha
level of F'min for a nominal alpha of .05 in
the conditions of our study is presented in
Tables 6 and 7. Inspection of the tables shows
that F'min is fairly well behaved for Cases 1
to 5, but is markedly conservative for Cases
7 and 8. Clark (1976) has pointed out one
case for which it is possible to compute
an exact alpha level for F'min, namely, when
there are two treatment groups. In this case,
the square root of F'min is distributed as a t
test between two means with unequal variances.
In such a situation it is of course
perfectly reasonable to use F'min.

Imhoff (1960) generalized Scheffé's two-way
mixed model to a completely crossed
three-way mixed model with two random
effects and one fixed effect and derived an
exact test based on Hotelling's T² for the
hypothesis of no fixed main effects. Again,
the test is perfectly appropriate but of limited
applicability.


Table 6
Observed Proportions of Rejections (1,000 Runs) for F'min in Null Case With Normal Errors

                          Variance ratio
Case   (1,1,1,1)   (1,2,2,5)   (1,2,5,10)   (1,2,5,20)   M
1      .043        .047        .047         .067         .051
2      .047        .048        .047         .071         .053
3      .049        .051        .064         .056         .055
4      .043        .049        .057         .073         .056
5      .042        .044        .048         .046         .045
6      .008        .009        .009         .008         .009
7      .017        .013        .013         .008         .013
8      .011        .014        .015         .016         .014

Note. α = .05.


Table 7
Observed Proportions of Rejections (1,000 Runs) in Null Case for F'min for Five Error Distributions

Case   Normal   Exponential   Log uniform   Log normal   0-1
1      .043     .041          .049          .034         .041
2      .047     .037          .046          .020         .036
3      .049     .045          .054          .033         .035
4      .043     .035          .053          .029         .031
5      .042     .032          .038          .026         .040
6      .008     .013          .007          .006         .004
7      .017     .013          .013          .009         .005
8      .011     .009          .017          .008         .009

Note. α = .05.
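
The F'min statistic discussed above can be sketched in Python. The sketch assumes F'min = MS_A/(MS_AC + MS_B(A)) with denominator degrees of freedom j obtained from the Satterthwaite form (MS_AC + MS_B(A))²/(MS_AC²/df_AC + MS_B(A)²/df_B(A)). The mean squares are hypothetical, as is the function name.

```python
def f_min(ms_a, ms_ac, ms_ba, df_ac, df_ba):
    """Return (F'min, approximate denominator df j)."""
    value = ms_a / (ms_ac + ms_ba)
    j = (ms_ac + ms_ba) ** 2 / (ms_ac ** 2 / df_ac + ms_ba ** 2 / df_ba)
    return value, j


# Hypothetical mean squares for I = 4, J = 12, K = 10
ms_a, ms_ac, ms_ba, ms_bac = 400.0, 50.0, 95.0, 4.0
value, j = f_min(ms_a, ms_ac, ms_ba, df_ac=27, df_ba=44)

# The same statistic expressed through the two conventional F ratios
f1, f2 = ms_a / ms_ac, ms_a / ms_ba
print(round(value, 3), round(f1 * f2 / (f1 + f2), 3), round(j, 2))

# Quasi F adds MS_B(A)C to the numerator, so it can never be smaller
quasi = (ms_a + ms_bac) / (ms_ac + ms_ba)
print(value <= quasi)
```

Only the two conventional F ratios and their denominator degrees of freedom are needed, which is the computational advantage referred to in the text.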


Discussion 


Our simulations have shown that the Quasi 
F ratio is a very reasonable statistic. It ap- 
pears to be slightly conservative, but is robust 
with respect to violations of the normal 
model. In short, the Quasi F is quite serviceable
in realistic experimental situations.
There is no doubt that the Quasi F is a use- 
ful test whenever inferences beyond the 
specific stimuli and subjects of an experiment 
are sought. However, some experimenters 
will be concerned about being penalized by 
the conservative nature of the test. First, it 
should be noted that the Quasi F is not 
markedly conservative with respect to the 
nominal alpha level. Second, experimenters 
who use the Quasi F should feel free to break 
the stranglehold of α = .05. Both experi-
menters and editors should be willing to
accept a result that is significant at the .06, 
the .07, or even the .10 level. It is far better 
to relax slightly the nominal level of rejec- 
tion than to persist in the misuse of a sta- 
tistic such as F, in which the nominal alpha 
has virtually no meaning. 

What about situations in which a fixed 
effect design is appropriate, for example, 
when sampling constraints become so strin- 
gent that the experimenter cannot treat the 


stimulus variation as random? Even in these 
situations the experimenter should not totally 
ignore stimulus variation. Rather, stimulus
variation should still be entered into the de- 
sign as a fixed effect (as opposed to collapsing 
over stimuli to obtain an average score for 
each subject). With this precaution it is at 
least possible to evaluate the contribution of 
stimulus variation. In a design with stimuli 
nested in treatment conditions the experi- 
menter should test for the significance of 
stimulus variation. When the same stimuli ap-
pear in all treatment conditions, the Treat-
ment × Stimulus interaction should be evalu-
ated. If it turns out that stimulus variation
or the Treatment × Stimulus variation is not
significant, then general confidence in the 
treatment effect is greatly increased. On the 
other hand, if these terms are significant, 
conclusions about the treatment effect might 
not even generalize across the sample of stim- 
uli used in the experiment. In either case the 
experiment does not generalize beyond the 
particular stimulus sample, but if the ap- 
propriate item variation term is small, the 
experimenter can at least have confidence that 
the treatment effect is not due to a few devi- 
ant items. 

Another alternative for experiments in 
which stimuli are a fixed effect is to replicate 
the experiment using a new sample of sub-
jects and new stimulus materials. Many ex-
perimenters already use this procedure. For
example, an experimenter might use two lists 
of words or alternate forms of a test. In such 
cases it is often possible to use the Quasi F 
test by treating both subjects and replica-
tions as random effects even if the experi-
menter does not want to analyze responses to
individual stimuli. Demonstrating that a 
result is reliable across replications clearly 
enhances the credibility of the finding. 

The above remarks pertain to a stimulus 
effect that must be treated as a fixed source 
of variation. As we have previously noted, 
it is much more common to want generality 
across both subjects and stimuli. In such cir- 
cumstances the Quasi F is both an appro- 
priate and a robust statistic. However, there 
is a final note of caution: Our discussion of 
the Quasi F ratio is itself limited in general- 
ity. We have examined the performance of 
only one type of Quasi F statistic. More work 
is needed to explore the class of Quasi F 
tests before psychologists proceed to use 
every imaginable form of concocted F ratio. 
In the meantime, the robustness of the Quasi 
F in our simulations is quite promising and 
suggests that more general use of such tests 
would greatly increase the reliability of the 
published data base. 
The Detection of Deception


David T. Lykken 
Department of Psychiatry 
University of Minnesota Medical School 


The polygraph (lie detector) test has an accuracy of 64% to 71% (against a 
chance expectancy of 50%) when the polygraph charts are scored blindly and 
are thus uninfluenced by clinical impressions of the subject or of the evidence 
against him. The lie test is biased against truthful subjects, at least half of 
whom may be erroneously classified as deceptive. These conclusions, based on 
two recent studies of lie test validity in real-life applications, corroborate an 
earlier critical analysis of the assumptions on which the lie detector is based. 
Since, in the field, most subjects tend to “fail” the lie test whether they are 
truthful or deceptive, the method more often detects lying than it does truthful 
responding. However, it seems probable that deceptive subjects could be taught 
to artificially augment their polygraph responses to the so-called control ques- 
tions and thus to avoid being scored as deceptive. The review by Podlesny and 
Raskin conveys the impression that the lie test is already highly accurate and
that addition of other response variables might enhance its validity even
further. This impression is erroneous and dangerously misleading.


The recent survey by Podlesny and Raskin 
(1977) conveys an impression that existing 
lie detector techniques are based on reason- 
able psychophysiological theory and are sup- 
ported by experimental findings of impressive 
validity. If current techniques already permit 
valid detection of deception in from 88% to 
96% of subjects tested (Podlesny & Raskin, 
1977, p. 787), even without the 9 or 10 new 
variables that the authors describe as “par- 
ticularly promising” (p. 797), then the fur- 
ther research they call for might be expected 
to produce a virtually infallible lie detector. 
Who could then object to the growing trend 
toward lie detector screening of employees or 
the admission of lie test evidence in courts 
(Lykken, 1974)? If the polygraphist is cor- 
rect 96% of the time even “on hardened 
criminals behind bars” (Raskin, quoted in 
Dunleavy, 1976), then allowing a fallible 
jury to deviate from the polygraphist’s judgment
(indeed, using a jury at all) can only
diminish the accuracy of verdicts based solely
on the lie test. In view of the claims that
other exponents of the lie detector have been
making over the years, it strikes one as
curious that such implications of these claims
seem never to be spelled out. If the lie test is
96% accurate, at least as Raskin performs it,
and if Podlesny and Raskin are correct in
suggesting that the few existing studies, most
of which are defective in design, have as yet
failed to exploit the potential of a host of
“promising” additional test variables, it
appears that we are dealing not only with the
most valid psychological test ever devised but
also with potentially the most important
social problem ever to issue from the
psychological laboratory.


Requests for reprints should be sent to David T.
Lykken, Department of Psychiatry, University of
Minnesota Medical School, Box 392, Mayo Memorial
Building, 420 Delaware Street, S.E., Minneapolis,
Minnesota 55455.


Theory of the Lie Detector Test 


In an earlier treatment (Lykken, 1974), I 
attempted to show that the theory of the lie 
test is so naive and implausible that one 
should demand especially strong empirical 
evidence before accepting claims of extremely 
high validity. Podlesny and Raskin (1977) 
Table 1
List of Questions Used in Raskin's Lie
Detector Test (Note 1)


1. Were you born in Hong Kong? (Yes)

2. Regarding the stabbing of Ken Chiu, do you intend
to answer truthfully each question about
that? (Yes)

3. Do you understand that I will ask only the questions
we have discussed? (Yes)

4. During the first 18 years of your life, did you
ever hurt someone? (No)ᵃ

5. Did you cut anyone with a knife on Dumfries
St. on January 23, 1976? (No)ᵇ

6. Before 1974 did you ever try to seriously hurt
someone? (No)ᵃ

7. Did you stab Ken Chiu on January 23, 1976?
(No)ᵇ

8. Is your first name William? (Yes)

9. Before age 19, did you ever lie to get out of
trouble? (No)ᵃ

10. Did you actually see Ken Chiu get stabbed?
(No)ᵇ


Note. Defendant's answers are in parentheses. If the
autonomic disturbance associated with the relevant
questions tends to be greater than that associated
with the paired control questions, the subject is
diagnosed as deceptive. Since it is assumed that an
innocent subject is more concerned by the control
than by the relevant questions, larger responses to
the former are interpreted as evidence that the
answers to the latter are truthful.

ᵃ Control question.

ᵇ Relevant question.


dismissed my analysis on the grounds that 
it "assumed that control questions are de- 
signed to be answered truthfully by the sub- 
ject and that a lack of difference in magni- 
tude of reactions to control and relevant 
questions constitutes the basis for a truthful 
result. Both of these assumptions are factu- 
ally incorrect" (p. 787). The second of these 
alleged assumptions does not appear in my 
article (Lykken, 1974), which, on the con-
trary, correctly states that “the examiner is
advised to classify tests that give intermedi-
ate [i.e., lack of difference] scores as ‘incon-
clusive’ ” (p. 730). With respect to the first
assumption, I stated accurately that “the con- 
trol question is chosen with the intention 
that it will elicit an emotional response from 
the subject, preferably a response involving 
the attitude of guilt, for example, ‘Can you 
remember ever stealing anything before you 
were 18 years old?’”; I added, “It is ex-
pected that the subject will answer it truth-
fully” (pp. 729-730). I should have also
explained that many polygraphists claim that 
they can devise control questions, not con- 
cerned with the central issue of the interroga- 
tion, that the subject will answer deceptively 
and that will derive their guilty emotional 
impact from that attempted deception. In- 
stead of somehow invalidating my analysis, 
this addendum reveals even more clearly the 
naiveté and implausibility of the theory of 
this control question lie test, as is illustrated 
below. 

Let us consider an actual lie detector test 
administered by Raskin to a criminal de- 
fendant accused of homicide by stabbing 
(Proceedings at Trial, Note 1). The ques- 
tions employed in that test are listed in 
Table 1. Questions 5, 7, and 10 are the rele-
vant questions pertaining to the incident;
Questions 4, 6, and 9 are the control ques-
tions. The scoring procedure used by Raskin
involves comparing the polygraph responses
associated with each of the three adjacent
pairs of relevant and control questions and
assigning a numerical score to each pair, for
example, −3 if the relevant question elicits
a much larger response than the control, +3
if the control response is much the larger, and
0 if there is no difference. This is done for
each of the three or four polygraph channels
employed and for each of the two or three
repetitions of the question list that may be
used. If the sum of these scores is, say, +6 or
higher, the subject is said to have been truth-
ful; if the sum is −6 or lower, he is diag-
nosed as deceptive. In the 10% or so of cases
in which the total score is near 0, the test is
considered inconclusive.
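
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example scores, and the range check of −3 to +3 are hypothetical, and the cutoff of 6 is simply the "say, +6" figure given above, not a value confirmed for any particular examiner.

```python
from typing import Iterable, Tuple


def classify_chart(pair_scores: Iterable[int], cutoff: int = 6) -> Tuple[int, str]:
    """Sum per-pair scores and apply the cutoff rule described in the text.

    Each score compares one relevant/control pair on one channel in one
    repetition: negative when the relevant response is larger, positive when
    the control response is larger, 0 when there is no difference.
    """
    scores = list(pair_scores)
    for score in scores:
        # Assumption: individual scores are integers between -3 and +3.
        if not -3 <= score <= 3:
            raise ValueError(f"pair score out of range: {score}")
    total = sum(scores)
    if total >= cutoff:
        return total, "truthful"
    if total <= -cutoff:
        return total, "deceptive"
    return total, "inconclusive"


# Hypothetical chart: 3 question pairs x 3 channels x 2 repetitions = 18 scores.
example_scores = [-2, -1, 0, -3, -1, 1,
                  0, -1, -2, -1, 0, 0,
                  -2, 1, -1, 0, -1, -1]
total, verdict = classify_chart(example_scores)
print(total, verdict)
```

Note that the verdict depends only on the sign and size of the summed relevant-versus-control differences; nothing in the rule measures deception directly.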

It should be pointed out that a polygraph 
chart is very complex and that considerable 
subjectivity may influence the polygraphist's 
evaluation of the autonomic disturbance as- 
sociated with a particular question. But let us 
assume that objective and consistent rules for 
evaluating polygraphic response amplitude 
were available and that some means were 
found to insure that they were followed faith- 
fully by polygraphists in practice. Then, re- 
ferring to Table 1, if this defendant tended to 
give larger autonomic responses to the relevant 
questions (5, 7, and 10) than he did to the 
controls (4, 6, and 9), he would be classified
deceptive. If his responses to the controls 
tended to be larger, he would be classified 
truthful. The question is whether it is reason- 
able to expect that such a test might have 
96% validity. 

Podlesny and Raskin (1977) explained, re- 
ferring to the control questions, that “the sub- 
ject is very likely to be deceptive to them or 
very concerned about them” (p. 786). I sus-
pect that this defendant's “no” answers to
Questions 4 and 9 were technically untrue be-
cause most people have “hurt someone” or
have “lie[d] to get out of trouble” prior to
the age of 20. But I do not know that these
answers were false in this case and neither
does Raskin. Indeed, it is quite possible that
this defendant thought he was telling the truth
in both instances, interpreting “hurt” to mean
something serious like the mortal stab wound
he was accused of inflicting and “trouble” to
mean something serious like being charged
with murder. In the case of the second control
question (Question 6), I think it most likely
that the subject’s answer was entirely truth-
ful. (I had never tried to “seriously hurt
someone” prior to 1974!) A moment's reflec-
tion makes it plain that no polygraphist can
reasonably claim to be able routinely to con- 
struct control questions that are somehow 
guaranteed to elicit deceptive answers from 
the subject. If Podlesny and Raskin wish to 
claim that this is an essential feature of a 
properly administered lie test and that any 
analysis of the theory of the test must assume 
that the control responses are known lies, then 
further consideration of the lie test is point- 
less because no such test could possibly be 
devised except under extraordinary circum- 
stances (viz., if the examiner happens to have 
proof of additional crimes committed by the 
subject but which the subject wishes to deny). 

Clearly the purpose of the control question 
is that “it will elicit an emotional reaction 
from the subject" (Lykken, 1974, p. 730) or 
that the subject will be “very concerned” about
the questions (Podlesny & Raskin, 1977).
The emotional reaction occurs because the
subject's answer is deceptive, because he is
truthful but is ashamed to make that admis-
sion, or merely because the question touches
on some painful or embarrassing issue. The
issue to be decided, then, is whether a larger 
autonomic response to the relevant question 
(e.g., “Did you stab Ken Chiu on January 23,
1976?”) than to the control question (e.g.,
“Before 1974 did you ever try to seriously
hurt someone?”) should plausibly be taken
as strong evidence that the subject’s answer 
to the relevant question is a lie. 

To state it more generally, does the control 
question really function as a control in the 
usual scientific sense of that term, that is, does 
the autonomic response to the control ques- 
tion provide a reasonable estimate of what 
the subject's response to the relevant ques- 
tion ought to be if he is answering truthfully? 
Alternatively, one might ask whether the con-
trol response yields a reasonable estimate of
what the relevant response should be if the 
answer to the relevant question is deceptive. 
This is the basic unavoidable assumption of 
the lie detector test, and it seems to me to be 
patently implausible. By what prescience did 
Raskin know how concerned the defendant 
was about Question 6? And yet he was obliged 
to titrate this concern with exquisite precision 
in advance of the test proper because his 
scoring assumes that the response to Question 
6 will be greater than the response to Ques- 
tion 7 if the answer to Question 7 is truthful 
but that the response to Question 7 will be 
greater than the response to Question 6 if the 
relevant answer is a lie. One can imagine 
scenarios, all perfectly plausible, that might 
have led this defendant to be extremely, mod- 
erately, or negligibly concerned about the 
three control questions listed in Table 1. As 
a general rule, one would expect most subjects 
to be more concerned about the relevant ques- 
tions than about the controls, whether they 
answer deceptively or truthfully, because it is 
the relevant questions that refer directly to 
the source of their immediate jeopardy. Thus, 
one would expect most subjects to tend to 
“fail” lie detector tests in real-life situations, 
and this bias against the truthful subject is 
just what the data confirm, as is shown below. 

On the other hand, no psychologist ought 
to rule out the possibility that some criminal 
defendants, even though guilty as charged, 
might develop habituation to, or psychody-
namic defenses against, specific references to
their crime and thus might be less responsive 
to the relevant than to the control questions 
so as to “pass” the lie test. A sophisticated 
criminal might know enough to augment his 
own reactions to the three control questions 
by flexing his toes, tensing his diaphragm, or 
biting his tongue at the appropriate moments. 
No good studies of the success of such coun- 
termeasures in real-life situations are avail- 
able. Polygraphists claim that they could not 
easily be deceived in this way, whereas, on 
the contrary, I claim that I could train guilty 
suspects to successfully “beat” the control
question lie test. 

In the above case, Raskin testified that the 
defendant had in fact responded most 
strongly to the control questions and was 
truthful. The jury disagreed, finding the de- 
fendant to be guilty (Proceedings at Trial, 
Note 1). Did an innocent suspect in this 
instance behave in accordance with the as- 
sumptions of the lie detector or did a guilty 
suspect beat the test? One must turn to sys- 
tematic validity studies to determine how the 
lie test, however implausible, actually works 
in practice. 


Accuracy of the Lie Detector in the Field 


Although Podlesny and Raskin (1977) ac- 
knowledged that “there are many problems 
inherent in laboratory investigations of [psy- 
chophysiological detection of deception]" (p. 
782), they stopped short of asserting that 
meaningful estimates of lie test accuracy in 
any field application must be obtained from 
appropriate field studies. Yet this clearly is 
the only reasonable conclusion. Giving lie 
tests to students who have enacted mock 
crimes or to prisoners who have competed for 
money prizes may be useful for other pur- 
poses but will not provide adequate predic- 
tions of what can be expected in real-life 
criminal investigation. Raskin's laboratory 
study using prison inmates (Raskin & Hare, 
1978), for example, involved a deceptive con- 
text in which genuine and realistic fear of 
failure played no role whatever (surely the 
fear that one might fail to win a $20 prize is 
qualitatively different from a criminal sus- 
pect’s fear that he may end up in prison). 
Thus, when one looks for evidence concerning
the accuracy of the lie test in its intended ap-
plication, one must confine one's attention to
real-life studies in the field. 

Secondly, since interest lies in the contribu- 
tion of the polygraph to the detection of de- 
ception rather than in the clinical judgment of 
the examiner, one must also exclude all 
studies in which the lie tests were scored 
globally, still a common practice among many 
polygraphists. With this method of scoring, 
all the examiner knows about the subject, the 
evidence against him, his demeanor during 
the examination, and the like is compounded 
in the mind of the examiner with the actual 
polygraph results by some unspecified sub- 
jective formula to produce the final judgment 
of deceptive or truthful. In the Bersh (1969) 
study, for example, the criterion of guilt or 
innocence was the majority verdict of four ex- 
perienced prosecutors based on their reading 
of the completed files of 243 criminal sus- 
pects. These judges split 2:2 on 27 cases, 
leaving 216 that could be usefully compared
to the polygraphist's previous diagnosis of de-
ceptive or truthful. The polygraphists agreed
with the criterion in nearly 88% of these
cases. But because at the time of the ex-
amination the polygraphists knew what evi-
dence was then available and were able to in-
terview and observe the suspects at some
length, one must suppose that their decisions 
about the suspects’ guilt or innocence would 
have been substantially more accurate than 
chance expectancy (i.e., 50% agreement with 
the criterion) even if they had ignored the 
polygraph results entirely. In fact, one cannot 
be certain that the polygraph itself contrib- 
uted at all to the accuracy rate that Bersh 
reported. Another part of the U.S. Army
study from which Bersh's data came (“Use 
of Polygraphs," 1975) examined the agree- 
ment between the original polygraphist’s de- 
cision and that of other polygraphists who 
read the same charts blindly. Agreement was 
low (kappa coefficients ranged from .15 to 
.51), indicating that the polygraph data could 
not have contributed much to the accuracy 
of the original examiner’s judgments. Since 
field studies using blind scoring of polygraph 
charts do not show nearly so high an ac-
curacy, Bersh's findings must be set aside as
ambiguous and almost certainly are an over- 
estimate of the validity of the polygraph test 
per se. 
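
For readers unfamiliar with the kappa coefficient mentioned above, a minimal sketch follows. It assumes the two-rater (Cohen) form of kappa, which the source does not specify, and the ten chart readings are invented purely for illustration; they are not data from the Army study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters over the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must judge the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of ten charts: D = deceptive, T = truthful.
original_examiner = ["D", "D", "D", "D", "D", "D", "T", "T", "T", "T"]
blind_reader = ["D", "D", "D", "D", "T", "T", "D", "D", "T", "T"]
print(round(cohens_kappa(original_examiner, blind_reader), 2))
```

In this invented example the readers agree on 6 of 10 charts, yet kappa is only about .17, because two readers who each call 60% of charts deceptive would agree on 52% of them by chance alone.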


Clinical Versus Actuarial Lie Detection 


Polygraphists who endorse the use of global 
judgments stress that it is the examiner, not 
the polygraph, who functions as the lie de- 
tector. Trade journals such as Polygraph or 
Journal of Polygraph Science are full of 
unsubstantiated claims concerning interview 
behaviors said to be indicative of guilt or of 
innocence (Lykken, 1978). Is it fair, 
therefore, to insist on separate assessment of 
the contribution made by the polygraph charts 
themselves to the accuracy of the examiner's 
decisions? 

The history of validity studies of projec- 
tive techniques like the Rorschach Inkblot 
Test provides a limited but useful analogy. It 
appears that certain talented individuals, ob- 
serving a subject responding to the Rorschach 
cards, are often capable of drawing clinical 
inferences of remarkable accuracy. But the 
majority of Rorschach test administrators are 
not nearly so accurate, and attempts to ob- 
jectify the cues or reasoning employed by the 
skillful few have met with limited success. 
There undoubtedly are certain police detec- 
tives and polygraph examiners who are simi- 
larly skillful in determining, by subjective 
evaluation of clinical observations, which sus- 
pect is lying and which is not. It is possible 
that more polygraphists might develop such 
skills if they were given formal training in 
psychology. But it is most doubtful that a 
trial judge would ever admit into evidence the 
clinical opinion of some self-styled “veracity 
expert,” in the form, “I have observed this 
defendant, considered his story and the rele- 
vant evidence, and in my opinion he is (or 
is not) telling the truth,” even if he were a 
fully accredited psychologist or psychiatrist. 
What business concern would employ some- 
one who claims to be able to detect lying in- 
tuitively, through observing interview behav- 
ior, and would allow him to screen prospec- 
tive employees for honesty or to determine 
which current employees are stealing from 
the company and should be fired? 


Clearly, the mystique of the lie detector, 
the reason why the polygraph test is taken 
seriously by some courts, by business, and by 
the general public and why the lie detector in- 
dustry is flourishing in this country, is wholly 
dependent on the technological or scientific 
aura of the polygraph itself. It is conceivable 
that some polygraph examiners are skillful 
clinical lie detectors, but this issue is of neg-
ligible scientific or social importance. What
is important is whether, as Podlesny and 
Raskin contend, the polygraph test is an ob- 
jective, teachable method of extraordinary 
validity. 


The Evidence 


Fortunately, subsequent to my 1974 re- 
view, two field studies appeared that provide 
estimates of lie test accuracy under real-life 
conditions, when the polygraph charts are 
scored blindly by someone other than the 
examiner who administers the test. Both au- 
thors were and are professional polygraphists 
and certainly were not hoping for unfavorable 
results. Horvath (1977) had 10 trained poly- 
graphists independently score charts taken 
from the files of a large police department. 
There were 28 suspects who had subsequently 
been cleared by the confession of another per- 
son; 28 others had themselves confessed some 
time after the original testing. The 10 ex- 
perienced polygraphists agreed with each 
other about 89% of the time, indicating pre- 
sumably that they followed similar rules of 
chart interpretation. But the validity of their 
scoring was not nearly as impressive as their 
interjudge agreement; the 560 blind scorings 
were correct only 64% of the time. My 
analysis of the lie test suggested that it should 
discriminate against the truthful subject or 
at least against those subjects with sense 
enough to realize that the relevant questions 
are more important to their fates and more 
threatening than the (essentially irrelevant) 
control questions. This expectation was 
strongly confirmed by the Horvath study in 
which the known liars were correctly scored 
as deceptive about 77% of the time (against 
a chance expectancy of 50%), whereas the 
known truthful suspects were incorrectly 
Table 2
Summary of Available Data on Accuracy of Control Question Lie Test Using Blind Scoring

                              Horvath (1977)   Barland and Raskin (Note 2)
Item                          Verified         Reported     Correctedᵃ
Number guilty                 28               40           40
Number innocent               28               11           40
Percent guilty                50%              78%          50%
Percent deceptive             63%              88%          76%
Percent correct (hit rate)    64%              86%          71%
False negative rate           31%              17%          5%
False positive rate           39%              13%          36%
Guilty called truthful        23%              3%           3%
Innocent called deceptive     49%              55%          55%


Note. Observe the high proportion of innocent suspects misclassified as deceptive and the associated high rate
of false positive classification.

ᵃ To provide meaningful estimates of accuracy and error rates, the data of Barland and Raskin had to be
corrected for the high base rate (78%) of criterion-guilty subjects. This was done by assuming Raskin would
have made the same proportion of errors (55%) if 40 innocent suspects had been tested as he made on the 11
innocent suspects who were tested, thus yielding a standardized base rate of 50% criterion-guilty subjects.


scored as deceptive half of the time, giving 
a false positive rate of 39%. 
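
The difference between the proportion of innocent suspects called deceptive (49%) and the false positive rate (39%) is easy to overlook: the first is taken over innocent suspects, the second over all suspects scored as deceptive. A minimal sketch, assuming equal numbers of guilty and innocent subjects as in Horvath's design, derives the Horvath figures from the two classification rates. The function and variable names are illustrative only.

```python
def predictive_rates(p_deceptive_given_guilty: float,
                     p_deceptive_given_innocent: float,
                     base_rate_guilty: float = 0.5) -> dict:
    """Derive overall hit and error rates from per-group classification rates."""
    if not 0.0 < base_rate_guilty < 1.0:
        raise ValueError("base rate must lie strictly between 0 and 1")
    guilty = base_rate_guilty
    innocent = 1.0 - guilty
    guilty_called_deceptive = guilty * p_deceptive_given_guilty
    innocent_called_deceptive = innocent * p_deceptive_given_innocent
    called_deceptive = guilty_called_deceptive + innocent_called_deceptive
    called_truthful = 1.0 - called_deceptive
    if called_deceptive == 0.0 or called_truthful == 0.0:
        raise ValueError("error rates are undefined when one verdict is never given")
    return {
        "percent_deceptive": called_deceptive,
        # Correct calls: guilty called deceptive plus innocent called truthful.
        "hit_rate": guilty_called_deceptive + innocent * (1.0 - p_deceptive_given_innocent),
        # Share of "deceptive" verdicts that fall on innocent subjects.
        "false_positive_rate": innocent_called_deceptive / called_deceptive,
        # Share of "truthful" verdicts that fall on guilty subjects.
        "false_negative_rate": guilty * (1.0 - p_deceptive_given_guilty) / called_truthful,
    }


# Horvath (1977): 77% of liars and 49% of truthful suspects scored deceptive.
horvath = predictive_rates(0.77, 0.49)
for name, value in horvath.items():
    print(f"{name}: {value:.0%}")
```

With these inputs the sketch gives 63% deceptive, a 64% hit rate, a 39% false positive rate, and a 31% false negative rate, matching the Horvath column of Table 2.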

The second recent study (Barland & Ras- 
kin, Note 2) employed Bersh’s method of 
using a panel of lawyers or judges to deter- 
mine, from all evidence excluding the lie test 
results, which suspects were guilty or inno- 
cent. Barland administered the tests, and the 
charts were then scored independently by 
Raskin. A majority of the criterion judges 
agreed on 64 of the 92 cases tested, but 13 
(20%) of these 64 tests were classified in- 
conclusive by Raskin. On the remaining 51 
tests, Raskin’s scoring agreed with the cri- 
terion on 44 of them, a hit rate of about 86%, 
which the authors reported as their estimate 
of field accuracy. However, 40, or 78%, of
these same cases were guilty by the criterion, 
which means that one might have achieved a 
hit rate of 78% on this sample just by calling 
everyone deceptive (Raskin in fact scored 
88% as deceptive; see Table 2). Clearly, non- 
arbitrary accuracy estimates can only be ob- 
tained either by equalizing the numbers of 
guilty and innocent suspects and assuming 
Raskin would have been correct in the same 
proportion of 40 cases as he was in the actual 
11 cases (see right-hand column of Table 2) 
or by considering the fate of the guilty and 
innocent subjects separately. Raskin scored 
39 of the 47 guilty suspects as deceptive, 1
as truthful, and the remaining 7 as inconclu-
sive. But only 5 of the 17 innocent suspects
were correctly scored as truthful; 6 were
called deceptive and 6 inconclusive. Had this
study been designed like Horvath's (1977),
with half the subjects guilty and half in-
nocent, then, excluding inconclusives, one
might expect about 71% hits overall and a
false positive rate of 36%, very similar to the
39% false positives in the Horvath study. Al-
though Raskin correctly diagnosed all but 1 of
the guilty subjects as deceptive (not counting
the 7 inconclusives), he did this at the expense
of calling 55% of the innocent suspects de-
ceptive also.

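
The base rate correction can be checked directly from the conclusive counts given above (39 and 1 for the guilty, 6 and 5 for the innocent). The sketch below is a hedged illustration, with helper names of my own choosing: it computes the reported hit rate, the hit rate obtainable by calling every subject deceptive, and the figures that result when the innocent group is rescaled to 40 cases with the same error proportion.

```python
def hit_rate(guilty_deceptive: float, guilty_truthful: float,
             innocent_deceptive: float, innocent_truthful: float) -> float:
    """Proportion of all conclusive verdicts that agree with the criterion."""
    total = guilty_deceptive + guilty_truthful + innocent_deceptive + innocent_truthful
    return (guilty_deceptive + innocent_truthful) / total


def rescale(count_a: float, count_b: float, target_total: float) -> tuple:
    """Scale two counts to a new group size while keeping their proportion."""
    factor = target_total / (count_a + count_b)
    return count_a * factor, count_b * factor


# Conclusive verdicts in Barland and Raskin (Note 2), as given in the text.
guilty_deceptive, guilty_truthful = 39, 1
innocent_deceptive, innocent_truthful = 6, 5
n_guilty = guilty_deceptive + guilty_truthful
n_total = n_guilty + innocent_deceptive + innocent_truthful

reported = hit_rate(guilty_deceptive, guilty_truthful,
                    innocent_deceptive, innocent_truthful)
# Hit rate from ignoring the charts and calling every subject deceptive.
call_everyone_deceptive = n_guilty / n_total

# Standardize to 40 innocent subjects with the same proportion of errors.
innocent_deceptive_std, innocent_truthful_std = rescale(
    innocent_deceptive, innocent_truthful, n_guilty)
standardized = hit_rate(guilty_deceptive, guilty_truthful,
                        innocent_deceptive_std, innocent_truthful_std)
false_positive_rate = innocent_deceptive_std / (guilty_deceptive + innocent_deceptive_std)

print(f"reported hit rate: {reported:.0%}")
print(f"call everyone deceptive: {call_everyone_deceptive:.0%}")
print(f"standardized hit rate: {standardized:.0%}")
print(f"standardized false positive rate: {false_positive_rate:.0%}")
```

The reported 86% therefore exceeds the do-nothing baseline of 78% by only a few points, and it falls to about 71% once the two groups are made equal in size.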
These two studies constitute the only evi-
dence available concerning the accuracy of
the control question lie test administered
under real-life conditions and scored to ex-
clude the influence of clinical judgment (or
prejudice) and thus to provide some idea of
the accuracy of the polygraph test itself. And
the two studies agree quite well, showing an
accuracy of from 64% to 71%, against a
chance expectancy of 50%, and showing that
of those who fail the test, 36% to 39% will
be false positive, truthful subjects (assuming
that half the subjects tested are innocent).
Raskin failed a higher proportion of his sub-
jects than Horvath’s polygraphists did: 76%
versus 63% if one again assumes that both
studies used equal numbers of guilty and in-
nocent subjects. Therefore, Raskin called a
higher proportion of both the guilty and the 
innocent subjects deceptive. Podlesny and 
Raskin (1977) cited the Barland and Raskin 
(Note 2) study, but they did not mention the 
actual results; the Horvath (1977) study was 
not even cited. Instead, my arguments (Lyk- 
ken, 1974) predicting that the lie test should 
show a high rate of false positives in real-life 
applications are supposedly refuted by a re- 
ferral to the results of two mock crime labora- 
tory studies by Raskin and his colleagues 
(Podlesny & Raskin, 1977, p. 787). 


Conclusions 


Thus one sees that the control question lie 
test is not 88%, 90%, or 96% accurate in 
real-life applications, but rather is in the 
neighborhood of 64% to 71% accurate when 
standardized for a chance expectancy of 50%. 
The actual false positive expectancy is not 
8%, 4%, or 2%, but is more on the order of 
36% to 39%. A skillful examiner who is willing
to call as many as three fourths of all sub- 
jects deceptive can detect most liars (assum- 
ing the subjects are not equally skilled at 
beating the test), but he will at the same 
time call most of the truthful subjects decep- 
tive also. An interesting research question not 
mentioned by Podlesny and Raskin is whether 
many deceptive subjects could be trained 
to beat the special form of lie test that 
they advocate. Any intelligent criminal could 
easily be taught to identify the three control 
questions and instructed to augment his au- 
tonomic reactions to these questions in a 
variety of covert ways. In fact, I venture an- 
other prediction; let me train guilty suspects 


in Barland and Raskin's next field study, and 
I predict their false negative rate may ap- 
proach what their false positive rate is 
right now. 
Truth and Deception: A Reply to Lykken


David C. Raskin
University of Utah

John A. Podlesny
Western Reserve Psychiatric Habilitation Center
Northfield, Ohio


In response to Lykken's critique, scientific evidence is presented that shows that
control question tests of deception have an accuracy of approximately 90% in
the field situation and are highly effective with both innocent and guilty
subjects. Lykken's erroneous representation of the theory of such tests is corrected,
and his selective and misleading presentation of the scientific data is rectified.
The proper interpretation and application of control question tests in the
criminal justice context are described, and the tests are shown to be highly beneficial
to innocent defendants, the judicial process, and society in general.


In a critique of our recent article (Pod- 
lesny & Raskin, 1977), Lykken (1979) at- 
tempted to discredit our theoretical analyses 
and conclusions by using intuitive and spec- 
ulative arguments and by selective and mis- 
leading descriptions of the existing data and 
literature. Lykken also made bold claims 
about his ability to train people to “beat” 
the control question test, and he presented a 
misleading description of a polygraph test 
conducted by Raskin in a criminal case. This 
article attempts to correct those errors with a 
careful examination of the theory, the scien- 
tific data and literature, and the applications 
of control question tests for truth and de- 
ception. 


Theory of Control Question Tests 


In spite of 5 years of contact with the literature
and with concepts of control question tests (see
Raskin, 1978), Lykken still does not understand the
simple, basic theory. The theory holds that, following
a detailed pretest interview, a guilty subject will show
relatively stronger autonomic responses to the relevant
questions and that an innocent subject will show
relatively stronger responses to the control questions,
which deal with acts of the same general nature as those
covered by the relevant questions. The control question
is a stronger stimulus for the innocent subject because
he knows he is truthful to the relevant questions; he has
been led to believe that the control questions are also
very important in assessing his veracity; the manner of
explaining the control questions to him and their wording
have elicited a no answer; and he is either deceptive in
his answers, very concerned about his answers, or unsure
of his truthfulness because of the vagueness of the
questions and problems in recalling the events. His
concern about being diagnosed as deceptive produces
autonomic reactions to the controls. There is no attempt
to “titrate concern with exquisite precision in advance
of the test proper” (Lykken, 1979, p. 49). Control
questions are emphasized to all subjects during the
pretest interview and immediately following each chart
(Podlesny & Raskin, 1978; Raskin & Hare, 1978).

Lykken was simply wrong when he stated that “the theory
of the test must assume that the control responses are
known lies” (p. 49) and that the purpose of the control
question is to provide an estimate of the subject's
autonomic response to a relevant question answered
truthfully. We have never stated that the control
question should “function as a control in the usual
scientific sense of that term” (Lykken, 1979, p. 49).
This statement describes the function of noncritical
items in a guilty knowledge test, and it seems to
indicate that Lykken does not understand the basic
difference between guilty knowledge tests and control
question tests. The actual purpose of the control
question is to provide a stimulus that will produce a
stronger autonomic reaction than the relevant question
when the subject is innocent, thereby providing a
positive identification of innocent subjects.


Empirical Issues 


Our 1977 review (Podlesny & Raskin, 
1977) dealt almost exclusively with labora- 
tory research and made suggestions to maxi- 
mize its generalizability to field applications, 
but Lykken arbitrarily dismissed the utility of 
laboratory experiments in estimating the 
accuracy of field detection of deception. This 
position betrays a profound lack of under- 
standing of the scientific method and the 
value of controlled experimentation and di- 
versity of evidence (Hempel, 1966). As we 
pointed out (Podlesny & Raskin, 1977), the 
best strategy “is to employ laboratory re- 
search that simulates field-deceptive contexts 
as closely as possible, along with field valida- 
tion" (p. 784). We have used prisoners, 
criminal suspects, very realistic mock crimes 
(so realistic that some subjects decline to 
participate in the mock crime), substantial 
motivation, and potential loss of reward or 
punishment. In laboratory experiments with 
subjects recruited from the community by 
newspaper ads, with prison inmates, and with 
diagnosed psychopaths, we have consistently 
obtained accuracy rates above 90% (Pod- 
lesny & Raskin, 1978; Raskin & Hare, 1978; 
Rovner, Raskin, & Kircher, Note 1), and 
such findings are very useful in the scientific 
enterprise of estimating accuracy in real-life
situations.


Table 1
Percentage of Correct Decisions in Five Studies With Blind Interpretations
of Polygraph Charts Obtained From Verified Guilty and Innocent Subjects

Study                          Guilty    Innocent

Horvath & Reid (1971)           75a        83a
                                89b        94b
Hunter & Ash (1973)             88         86
Slowik & Buckley (1975)         85         93
Wicklander & Hunter (1975)      95         93
Raskin (Note 3)                 93c        69c
                               100d        95d

Combined results                90         89

a Decisions were made by intern examiners.
b Decisions were made by experienced examiners.
c Evaluation was nonnumerical.
d Evaluation was numerical.


Lykken was correct in emphasizing the need 
for validation studies with criminal suspects 
using blind evaluation of polygraph charts. 
Unfortunately, he provided misleading in- 
terpretations of the two studies that he se- 
lected, and he failed to mention five published 
studies that meet his criteria but provide 
strong evidence against his position. Lykken 
also failed to mention that in the Horvath 
(1977) study the original examiner was 
100% correct and the cases were all verified 
by confessions. It has been pointed out (Ras- 
kin, 1978) that the unusually low level of 
accuracy attained by Horvath's blind evalu- 
ators was very likely due to their lack of 
formal training in systematic chart interpre- 
tation and their heavy emphasis on overt 
behavior symptoms rather than on systematic
chart interpretation. Therefore, the Horvath 
study is of little value in assessing the ac- 
curacy of decisions based on systematic 
chart interpretation. 

Lykken (1979) was incorrect when he 
stated that the Horvath (1977) and Barland 
and Raskin (Note 2) studies “constitute the 
only evidence available concerning the accu- 
racy of the control question lie test admin- 
istered under real-life conditions and scored 
to exclude the influence of clinical judgment 
(or prejudice) and thus to provide some idea 
of the accuracy of the polygraph test itself” 
(p. 52). There are five other published
studies that meet Lykken's criteria of blind
interpretation of confirmed polygraph charts 
from criminal suspects. The findings of these 
studies are presented in Table 1. The mean 
accuracy rates of 90% correct on guilty sus- 
pects and 89% correct on innocent suspects 
were based on a total of 1,204 independent 
decisions obtained by blind interpretation of 
polygraph charts by 55 different polygraph 
examiners. The data from Horvath and Reid 
(1971) show that experienced examiners are 
more accurate, and the Raskin (Note 3) data 
clearly demonstrate that the use of a rela- 
tively objective, systematic method of quanti- 
fied chart interpretation yields significantly 
higher accuracy rates, which approach 100%. 

The reliability of the numerical scoring 
system is extremely high. Using the numeri- 
cal system with blind chart interpretation, we 
obtained a mean correlation of .86 for the 15 
pairings of six independent evaluators (Bar- 
land & Raskin, 1975), a .91 correlation be- 
tween numerical scores and 99% agreement 
with the examiner’s original decisions on 102 
criminal suspects (Barland & Raskin, Note 
2), a .97 correlation between numerical 
scores and 100% agreement with the original 
examiner’s decisions in a laboratory study 
(Podlesny & Raskin, 1978), and 95% accu- 
racy and 100% agreement with decisions 
made 2 years before in a laboratory study of 
criminals and psychopaths (Raskin & Hare, 
1978). 

The extensive and consistent findings just 
described demonstrate the very high reliabil- 
ity and validity of blind chart interpretation 
when it is performed by competent exami- 
ners who have been adequately trained in 
chart interpretation and who do not make 
decisions based on the questionable pro- 
cedure of observing behavior symptoms. The 
latter procedure has been shown to be in- 
effective in assessing truth and deception 
(Podlesny & Raskin, 1978; Raskin, Barland, 
& Podlesny, Note 4), and it is not surprising 
that examiners trained to rely on behavior 
symptoms instead of polygraph charts pro- 
duce results hardly better than chance (Hor- 
vath, 1977). 

In addition to the Horvath (1977) study, 
Lykken placed great weight on the high false
positive rate in the Barland and Raskin
(Note 2) study. In this study, as in field 
studies of lie detection generally, it was nec- 
essary to substitute criteria of guilt or inno- 
cence in place of factual knowledge. The two 
major criteria were decisions of a panel of 
legal experts based on case information 
(with all references to the polygraph tests 
deleted) and judicial outcomes. Those criteria failed
to provide assessment of accuracy equivalent to that
available in laboratory studies or confirmed criminal
cases. Raskin (1978) has stated that the panel criterion
is open to serious challenge because the information
provided in the Barland and Raskin study was generally
inadequate, agreement between the court decisions and
the panel was less than perfect, and inherent bias may
have existed toward judgments of innocence based on the
tradition of the assumption of innocence in the absence
of extremely strong evidence to the contrary.
Furthermore, the number of criterion-innocent subjects
was very small. As a result the 95% confidence interval
for the false positives was 11%-59%, whereas the larger
sample size for guilty subjects yielded a 95% confidence
interval of 0%-16% for false negatives. Therefore we
consider these data and those of Horvath (1977) to be of
relatively low value in contrast to those presented in
Table 1 and the other data described earlier that were
ignored by Lykken.
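
The dependence of such interval widths on sample size can be illustrated
with a short calculation. The sketch below uses the Wilson score interval,
which is an assumption made here because the interval method behind the
figures quoted above is not stated, and the counts it is given are
hypothetical rather than data from the Barland and Raskin study.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval for an error proportion errors/n."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical counts: the same error proportion in a small and a large group.
    for errors, n in [(5, 10), (50, 100)]:
        low, high = wilson_interval(errors, n)
        print(f"{errors}/{n}: {low:.2f} to {high:.2f}")
```

With the same observed proportion, the smaller group yields the wider
interval, which is the pattern the paragraph above describes for the
criterion-innocent and criterion-guilty subjects.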


Issues in the Application of Control 
Question Tests 


Lykken (1979) stated that a sophisticated 
criminal might be able to augment his reac- 
tions to the control questions and “pass” the 
test. The only published study with such 
procedures used the guilty knowledge test 
(Lykken, 1960), which is more susceptible to 
false negatives than the control question test 
(Podlesny & Raskin, 1977) because the 
guilty knowledge test employs only skin re- 
sistance measures, and a truthful outcome 
does not require larger responses to the noncritical
items, as does the control question test. The subjects
were medical students, staff psychiatrists, and
psychologists who were given detailed instructions
about the test
structure, a strategy and methods to beat the 
test, and biofeedback training to control 
their skin resistance responses. Even with 
minimal consequences for being detected, the 
accuracy was 100%. Lykken's (1960) failure 
to train sophisticated subjects with little at 
stake to be able to beat the simpler guilty 
knowledge test raises extreme doubt concern- 
ing Lykken's statement, “I claim that I 
could train guilty suspects to successfully 
‘beat’ the control question lie test" (p. 
50). Rovner et al. (Note 1) are presently 
engaged in an extensive laboratory study to 
assess the effects of detailed information and 
practice on the accuracy of control question 
tests. 

Although we are opposed for a variety of 
scientific and ethical reasons to the use of 
polygraph tests in employment situations, 
Lykken's opposition to the use of polygraph 
evidence in court is based on his lack of 
understanding of the control question tech- 
nique, his highly selective presentation of 
the scientific evidence, his misinterpretations 
of those data that he selected for discussion, 
and his gross misunderstanding of the crimi- 
nal justice system. The issues surrounding 
court use of polygraph evidence involve the 
level of confidence that can be placed in a 
truthful or deceptive outcome, the way in 
which such outcomes are used in the crimi- 
nal justice process, and the impact of such 
evidence on juries. 

Figure 1. Skin resistance and blood pressure responses of homicide
defendant William Wong to a control question (6) and a relevant question (7).
The vertical marks indicate the beginning and end of the questions, and the
minus sign indicates a no answer at that time.

We agree that the data indicate that false
positives are more likely than false negatives,
even though the rates of both types of errors
are low. Even if Lykken were correct concerning
the rate of false positives, for practical purposes
the confidence in a truthful outcome is higher than
that in a deceptive outcome, since a truthful result
is more likely to be correct than is a deceptive
result. The use of such findings coincides with our
judicial and moral standards for acquittal and
conviction. Because criminal guilt must be demonstrated
beyond a reasonable doubt, considerable evidence is
required for conviction, and a deceptive polygraph
result is far from sufficient. In the absence of other
strong evidence of guilt, no competent or ethical
prosecutor could or would try a case on the basis
of a deceptive polygraph test. However, in
the absence of overwhelming evidence to the 
contrary, the high degree of accuracy of a 
truthful polygraph result should be sufficient 
to cast the reasonable doubt required for dis- 
missal or acquittal. It has become common 
practice for law enforcement agencies and 
prosecutors to dismiss charges in such situa- 
tions. Given the very low accuracy of some 
types of evidence, such as eyewitness testi- 
mony, that are commonly used against de- 
fendants and the great weight accorded to 
this evidence by prosecutors and juries 
(Buckhout, 1974), it makes good sense to 
provide an opportunity for innocent suspects 
and defendants to clear themselves by means 
of a properly conducted polygraph test. 
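
The claim above that a truthful outcome deserves more confidence than a
deceptive one can be checked with Bayes' rule. The sketch below is
illustrative only: the 90% and 89% rates are the combined results in
Table 1 and the 37% false positive rate falls in the range Lykken
proposed, but the 50% base rate of guilt and the 95% detection rate in
the second scenario are assumptions introduced here.

```python
def predictive_values(sensitivity, specificity, base_rate_guilty):
    """Return (P(guilty | deceptive result), P(innocent | truthful result))."""
    p_guilty = base_rate_guilty
    p_innocent = 1.0 - p_guilty
    true_pos = sensitivity * p_guilty            # guilty, called deceptive
    false_pos = (1.0 - specificity) * p_innocent  # innocent, called deceptive
    true_neg = specificity * p_innocent          # innocent, called truthful
    false_neg = (1.0 - sensitivity) * p_guilty   # guilty, called truthful
    deceptive_correct = true_pos / (true_pos + false_pos)
    truthful_correct = true_neg / (true_neg + false_neg)
    return deceptive_correct, truthful_correct


if __name__ == "__main__":
    scenarios = {
        "Table 1 combined rates": (0.90, 0.89),
        "high false positive rate (assumed 95% detection)": (0.95, 0.63),
    }
    for label, (sens, spec) in scenarios.items():
        dec, tru = predictive_values(sens, spec, base_rate_guilty=0.5)
        print(f"{label}: deceptive correct {dec:.2f}, truthful correct {tru:.2f}")
```

Under these assumptions the truthful outcome is the more trustworthy of
the two in both scenarios; a different base rate of guilt would change
the numbers and could change the ordering.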

Lykken (1979) may have provided a misleading
description of the William Wong case
(Proceedings at Trial, Note 5) by failing
to describe its outcome. Wong was accused of
homicide on the basis of highly questionable
eyewitness accounts. Wong was administered
a polygraph test by Sergeant Smith of the
Vancouver, Canada police department and
was retested prior to his trial by Raskin. He
was found to be truthful by both examiners.
Using standard techniques, Raskin employed
typical control questions, including the question,
“Before 1974 did you ever try to seriously hurt
someone?” (Question 6). Lykken speculated that Wong
was entirely truthful and unconcerned when he answered
that question, and he implied that Wong showed a
stronger autonomic response to the question, “Did you
stab Ken Chiu on January 23, 1976?” (Question 7). On
the contrary, Wong was more concerned about Question
6, and Figure 1 shows his substantially larger
electrodermal and blood pressure responses
to the control question (Question 6)! Wong
obtained a clearly truthful score of +9, and 
these results were presented by Raskin to 
the jury as part of Wong’s defense against the 
murder charge. It should also be mentioned 
that in the same court hearing, Lykken un- 
successfully opposed presenting to the jury 
the results of Raskin’s and Sergeant Smith’s 
polygraph tests in the defense of innocent 
homicide defendant William Wong. Lykken’s 
position at the trial was in direct conflict with
his previously published position that "judi- 
cious use of the polygraph in the criminal 
investigation context not only can improve 
the efficiency of police work but could also 
serve as a bulwark to protect the innocent 
from false prosecution” (Lykken, 1974, p. 
738). 

Lykken (1979) also claimed that the use of 
polygraph evidence in court would overwhelm 
the jury and might even be used to replace 
the jury system. His speculations are naive 
with regard to the judicial process and be- 
tray a lack of knowledge of the evidence 
concerning the impact of the testimony of 
polygraph experts on jury deliberations. As 
Tarlow (1975) pointed out, polygraph evi- 
dence is simply an aid to the jury in its 
complicated task of assessing the credibility 
of witnesses. As such, if the polygraph evi- 
dence has probative value, the jury is merely 
asked to consider it along with the other 
evidence in the case and to accord it what- 
ever weight the jury finds appropriate. We 
have not suggested that polygraph tests 
should replace the jury system. In fact, the 
available evidence (Tarlow, 1975) indicates 
that juries are very cautious and that they 
tend to be “underwhelmed” by polygraph
testimony.


Conclusion 


The results of many scientific studies in 
laboratory and field settings as well as our 
published report to the U.S. Department of 
Justice (Raskin et al., Note 4) indicate that 
the accuracy and reliability of control ques- 
tion tests can be very high. On the basis of 
the present evidence, it is reasonable to con- 
clude that the results of control question 
polygraph examinations conducted by compe- 
tent and ethical examiners can have impor- 
tant and beneficial effects for the criminal 
justice process and for our society in general.


The principal objective of this study was to discover which distances are most
important in determining the recovery performance of a nonmetric multidimen- 
sional scaling algorithm. Using Monte Carlo methods we show that the large 
distances are critical to satisfactory performance, whereas the small and the 
medium distances play a much less crucial role. This finding has been reliably 
demonstrated across a variety of conditions, although only for a single combi- 
nation of dimensionality and number of points. It turns out that certain paral- 
lels exist between this work and previous results obtained using cyclic and other 
incomplete designs. Finally, on the basis of these results we make some recom- 
mendations to experimenters regarding data collection procedures; these repre- 
sent a simple alternative to the methods advocated by Spence and Domoney. 


Good nonmetric multidimensional scaling 
algorithms have been available for more than 
a decade (Kruskal, 1964), and the large 
number of published applications attests to 
their popularity and usefulness. These pro- 
cedures are capable of constructing a config- 
uration of points in a metric space using no 
more than the ordinal properties of the data 
in a matrix of dissimilarities—thus the char- 
acterization “nonmetric.” Despite the fact 
that a large body of experience and knowl- 
edge has been accumulated regarding the 
behavior of nonmetric algorithms, there have 
been few systematic attempts to discover 
which characteristics of the input data are 
essential for successful construction of the 
configuration. It is important to know the 
answer to this question, since in many situa- 
tions it may not be feasible, or desirable, to 
collect all possible pairwise judgments of 
dissimilarity from a subject. The most obvious
situation is one in which the number of
stimuli is large and the resulting number of 
potential paired comparisons becomes too 
onerous a burden for even the most dedicated 
subject. Consequently, if some incomplete 
fraction of the data is to be obtained, it is 
essential that information be gathered re- 
garding the distances that are most influen- 
tial in determining the solution. In this 
article we use Monte Carlo techniques to 
discover which distances are most important 
in determining the nature of the final config- 
uration of points and on the basis of our 
results suggest some possible data collection 
procedures for the experimenter who wishes 
to use a large number of stimuli. 


Method 


The design of the study is straightforward 
and is summarized in Figure 1. A two-dimen- 
sional configuration with 31 points was gen- 
erated randomly within the unit circle. Dis- 
similarities were simulated by computing 
error-perturbed interpoint distances according
to two different models: multidimensional
Thurstone Case V (e.g., Ramsay, 1969),

\[
\delta_{ij} = \left[ \sum_{a=1}^{2} \left( x_{ia}^{*} - x_{ja}^{*} \right)^{2} \right]^{1/2},
\quad \text{with} \quad x_{ia}^{*} = x_{ia} + N(0, \sigma_{R}^{2}); \tag{1}
\]

and Wagenaar and Padmos (1971),

\[
\delta_{ij} = \left[ \sum_{a=1}^{2} \left( x_{ia} - x_{ja} \right)^{2} \right]^{1/2} \cdot N(1, \sigma_{WP}^{2}), \tag{2}
\]

that is, the true distance is multiplied by a
random normal deviate with a mean of one.
Although not invented by Ramsay (1969), 
for convenience we shall refer to the first 
model as the Ramsay model: This (noncentral
χ²) model produces an error distance
whose variance increases as a function of
increasing true distance, although the effect is
actually quite small. For the Wagenaar-Padmos
model, however, the effect is much more
dramatic with the standard deviation of the 
error distance linearly related to the true 
distance. This error model is not unlike the 
lognormal error model that Ramsay (1977) 
has suggested is the most plausible for many 
psychological situations—the major difference 
being the skewness of the lognormal. The 
distributions are illustrated in Figure 2. The 
standard deviations in panel a and panel b 
for the Ramsay model are actually different 
but not by much. However, the standard 
deviation for panel a in the Wagenaar and 
Padmos model is larger than the standard 
deviation in panel b by exactly the same 
proportion as the ratio of the two true dis- 
tances. These error distributions model what 
we feel to be the reasonable extremes that 
might be encountered with real data. 
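
The two error models can be sketched in a few lines of code. This is a
minimal illustration, not the procedure actually used in the study: the
function names are hypothetical, uniform rejection sampling is only one
way to generate points "randomly within the unit circle," and drawing
fresh coordinate noise for every pair in the Ramsay model is an
assumption.

```python
import math
import random


def random_configuration(n_points=31, rng=random):
    """Draw points uniformly inside the unit circle by rejection sampling."""
    points = []
    while len(points) < n_points:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            points.append((x, y))
    return points


def ramsay_dissimilarity(p, q, sigma_r, rng=random):
    """Model 1: add N(0, sigma_r^2) noise to every coordinate, then take the distance."""
    squared = 0.0
    for a, b in zip(p, q):
        squared += ((a + rng.gauss(0.0, sigma_r)) - (b + rng.gauss(0.0, sigma_r))) ** 2
    return math.sqrt(squared)


def wagenaar_padmos_dissimilarity(p, q, sigma_wp, rng=random):
    """Model 2: multiply the true distance by a N(1, sigma_wp^2) deviate."""
    true_distance = math.hypot(p[0] - q[0], p[1] - q[1])
    return true_distance * rng.gauss(1.0, sigma_wp)


def simulate(points, model, sigma, rng=random):
    """Return {(i, j): dissimilarity} for every pair i < j."""
    return {
        (i, j): model(points[i], points[j], sigma, rng)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    }


if __name__ == "__main__":
    rng = random.Random(1)
    config = random_configuration(31, rng)
    r15 = simulate(config, ramsay_dissimilarity, 0.15, rng)
    wp25 = simulate(config, wagenaar_padmos_dissimilarity, 0.25, rng)
    print(len(r15), len(wp25))
```

With 31 points each call to `simulate` produces the 465 pairwise
dissimilarities that enter the scaling program.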


Figure 1. The design of the study (basic design, 10 replications). (R15 and
R30 refer to the Ramsay model, with σ_R = .15 and σ_R = .30, respectively.
WP25 and WP50 refer to the Wagenaar and Padmos model, with σ_WP = .25 and
σ_WP = .50, respectively. REPS = replications; TORSCA = computer program
used.)

Figure 2. The two error models. (Panels a and b are shown for the Ramsay
error model and for the Wagenaar-Padmos error model.)


In the sequel, R15 and R30 refer to the 
Ramsay model with σ_R = .15 and σ_R = .30,
respectively. These represent medium and
high error levels and are the same as those
used by Spence and Domoney (1974). For
the Wagenaar and Padmos model, WP25 and
WP50 refer to situations where σ_WP = .25
and σ_WP = .50, respectively. These two levels
were chosen such that the variance of an 
average distance would be approximately 
equal in both error models—the appropriate 
relation was determined algebraically. As will 
be seen from the results, comparable recov- 
eries were obtained with the two models. 

The dissimilarity matrices were analyzed
using TORSCA-9 (Young, Note 1) in five
ways: with no deletion, using a maximum
efficiency cyclic design¹ with one-third deletion,
and after deletion of the small, the medium,
and the large distances (a 2/3 fraction
being retained in each case). Thus, for example,
when the small distances were deleted,
the 155 smallest of the 465 dissimilarities
were discarded prior to analysis.
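
The size-based deletions can be sketched as follows. This is a minimal
illustration under stated assumptions: the function name `delete_third`
is hypothetical, and the pairs are ranked on whatever dissimilarity
values are passed in, since the text does not say whether the ranking
used the true or the error-perturbed distances.

```python
def delete_third(dissimilarities, which):
    """Drop the smallest, middle, or largest third of the pairs and return the rest.

    dissimilarities maps a pair identifier to its dissimilarity value.
    """
    ordered = sorted(dissimilarities, key=dissimilarities.get)
    third = len(ordered) // 3
    if which == "small":
        dropped = ordered[:third]
    elif which == "medium":
        start = (len(ordered) - third) // 2
        dropped = ordered[start:start + third]
    elif which == "large":
        dropped = ordered[len(ordered) - third:]
    else:
        raise ValueError("which must be 'small', 'medium', or 'large'")
    dropped = set(dropped)
    return {pair: d for pair, d in dissimilarities.items() if pair not in dropped}


if __name__ == "__main__":
    # Toy input: 465 pairs labelled 0..464 whose dissimilarity equals the label.
    toy = {k: float(k) for k in range(465)}
    for which in ("small", "medium", "large"):
        print(which, len(delete_third(toy, which)))
```

For 465 dissimilarities each option discards 155 pairs and retains 310,
matching the two-thirds fraction described above.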


Results 


The results for 10 replications are shown
in Table 1 and Figure 3. With no deletion, we
obtain the same pattern as in many previous
Monte Carlo studies and note further that
R15 and WP25 were about equivalent in
their effects, as are R30 and WP50. This
holds for both the root mean square (rms)
correlation between true and recovered distances
and for the mean absolute error; the
latter is simply the average discrepancy between
the generated and the recovered distances.
Thus, the results seem to be independent
of the error process involved.
Consequently, since the two error models
used are quite dissimilar, it is plausible that
comparable results would be obtained using
other error distributions. It should also be
observed that the recovery of the distances
has been assessed rather than the configuration
itself. In practical terms this makes
little difference, since many previous Monte
Carlo studies have shown that when the
distances are well recovered, so is the
configuration and vice versa. For example, Spence
(in press) found a correlation of .967 between
recovery of the distances and recovery of the
configuration.

¹ A cyclic design is a particular kind of partially
balanced incomplete block design. It is not central
to the substance of this paper that the reader
understand how such designs are constructed and used
in the multidimensional scaling context. A full and
detailed description is given by Spence and Domoney
(1974), and it is shown that high efficiency cyclic
designs perform very well.

Table 1
Mean Recovery Measures

                 Root mean square correlations       Mean absolute error
Error              S       M       L       A         S      M      L      A

No deletion
  Zero           1.000   1.000   1.000   1.000      .00    .00    .00    .00
  R15             .867    .786    .918    .978      .07    .07    .06    .07
  WP25            .901    .791    .902    .980      .06    .06    .06    .06
  R30             .572    .522    .712    .899      .13    .13    .14    .13
  WP50            .659    .542    .728    .919      .13    .13    .14    .13
Cyclic deletion
  Zero            .991    .982    .993    .999      .02    .02    .02    .02
  R15             .798    .718    .872    .965      .09    .09    .08    .09
  WP25            .830    .689    .852    .964      .08    .09    .09    .09
  R30             .368    .327    .496    .751      .19    .22    .30    .24
  WP50            .371    .326    .483    .735      .19  [illegible]  .33    .25
Small deletion
  Zero            .976    .975    .996    .997      .03    .02    .02    .02
  R15             .792    .705    .892    .964      .10    .09    .08    .09
  WP25            .765    .685    .825    .943      .11    .10    .11    .11
  R30             .312    .268    .550    .704      .22    .24    .36    .27
  WP50            .524    .431    .639    .833      .17    .17    .25    .20
Medium deletion
  Zero            .994    .986    .999    .999      .02    .02    .01    .01
  R15             .771    .688    .857    .959      .10    .10    .09    .09
  WP25            .823    .678    .825    .959      .09    .10    .09    .10
  R30             .463    .421    .555    .837      .17    .19    .21    .19
  WP50            .409    .375    .533    .800      .19    .21    .24    .21
Large deletion
  Zero            .839    .443   -.229    .505      .14    .31    .78    .41
  R15             .600    .355   -.091    .616      .15    .27    .57    .33
  WP25            .713    .355    .067    .609      .13    .26    .61    .33
  R30             .224    .182    .095    .397      .19    .41    .78    .46
  WP50            .335    .112   -.102    .296      .20    .48    .93    .54

Note. S, M, L, and A represent the recovery statistics for small, medium, large, and all distances, respectively.
R15 and R30 refer to the Ramsay model, with σ_R = .15 and σ_R = .30, respectively. WP25 and WP50 refer
to the Wagenaar and Padmos model, with σ_WP = .25 and σ_WP = .50, respectively.

Figure 3. The recovery correlations as a function of zero (0), cyclic (C), small (S), medium
(M), and large (L) deletions. (Medium and high error levels were used with both error models,
as well as a no-error condition; RMS = root mean square. R15 and R30 refer to the Ramsay
model, with σ_R = .15 and σ_R = .30, respectively. WP25 and WP50 refer to the Wagenaar and
Padmos model, with σ_WP = .25 and σ_WP = .50, respectively.)

In addition to assessing the recovery of
the total set of distances, we computed recovery
statistics for the small, the medium,
and the large distances. These correlations 
should be interpreted with some care be- 
cause they are based on subsets of distances 
with different variances. If allowance is made
for these different variances, the correlations 
for the small, medium, and large subsets are 
very close in almost all instances. The one 
striking exception will be discussed presently. 
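
The two per-replication recovery measures used in this section, the
correlation between true and recovered distances and the mean absolute
error, can be sketched as below. The function names are hypothetical,
and the sketch does not reproduce how the correlations from the 10
replications were combined into the tabled rms values, which the text
does not spell out.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(true_distances, recovered_distances):
    """Average absolute discrepancy between generated and recovered distances."""
    if len(true_distances) != len(recovered_distances) or not true_distances:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(t - r) for t, r in zip(true_distances, recovered_distances))
    return total / len(true_distances)


if __name__ == "__main__":
    # Toy values only; real input would be the 465 true and recovered distances.
    true_d = [0.2, 0.5, 0.9, 1.4]
    recovered_d = [0.25, 0.45, 1.0, 1.3]
    print(pearson(true_d, recovered_d), mean_absolute_error(true_d, recovered_d))
```

Applying the same functions to only the small, medium, or large
distances gives subset statistics of the kind reported in the S, M, and
L columns, with the caveat noted above that the subsets differ in
variance.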

Cyclic deletion produced satisfactory results
similar to those obtained by Spence
and Domoney (1974) and Spence (in press)
and warrants no further discussion. Deleting
the small and medium distances produced
recovery statistics very similar to the
cyclic-design condition, with perhaps slightly better
recovery in the high error conditions. However,
when the large distances were deleted,
the recovery deteriorated dramatically: Even
in the zero-error condition the overall recovery
was quite clearly unacceptable, and in no
condition did the recovery correlation exceed
.7. Tschudi (1972) has suggested that .7 is
the minimum acceptable correlation; solutions
with recovery correlations smaller than
this bear very little resemblance to the
generated configurations. The analysis by subsets
of distances shows that the large distances
are the worst recovered, with some
correlations negative and the best around
zero. The medium and smaller distances are
not as well recovered as in other deletion
conditions. Obviously, a nonmetric scaling algorithm
performs very poorly when it is denied
information relating to the large distances.

In this study we cannot consider the incomplete
fractions that were scaled to be
experimental designs in the usual sense, since
the missing dissimilarities are not determined
a priori but are a function of the actual
distances. In other words, the situation is not
exactly the same as if the pattern of missing
distances had been determined independent
of the data, as for example, if a cyclic design
had been used. Nevertheless, the end result
is similar in that an incomplete matrix is
scaled. Consequently, it does not seem
unreasonable to refer to the pattern of retained
dissimilarities as a pseudodesign. The analysis
of variance (ANOVA) efficiency cannot be
calculated as simply as in the case of cyclic
designs (see Spence, in press; Spence & Domoney,
1974); however, it is easy to calculate
the mean resistance of the pseudodesigns.
This is done by considering the
graph of the design to be an electrical network,
with the edges of the graph corresponding
to conductors of unit resistance.
Johnson and Van Dyk (Note 2) have shown
that mean resistance and ANOVA efficiency
are formally identical, with resistance
inversely proportional to efficiency. We have
calculated the mean resistances for all conditions
in this study, and the results are shown
in Figure 4, where the recovery means are
averaged over all error conditions. For the
no-deletion and cyclic conditions, the mean
resistance is, of course, a constant, whereas
for the small-, medium-, and large-deletion
conditions the values of mean resistance are
the averages over the 10 replications. It can
readily be seen that as mean resistance
increases, the recovery correlation deteriorates.
This is precisely analogous to the results of
Spence (in press), where it was found that
decreases in the efficiency of cyclic designs
were accompanied by deteriorating recovery.
It is interesting to note that deleting the
large distances produces a pseudodesign with
the highest mean resistance. Deletion of
small or medium distances produces
pseudodesigns with resistances not much higher
than that of the maximum efficiency cyclic
design. Furthermore, these pseudodesigns
perform as well as the cyclic design in terms
of recovery.

Figure 4. The recovery correlations as a function of
the resistance of the pseudodesigns. (RMS = root
mean square.)

It is reassuring to observe that the
relationship between recovery and resistance for
the pseudodesigns is identical to that found
for orthodox experimental designs.
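
The electrical-network calculation described above can be sketched as
follows. The sketch assumes that "mean resistance" means the effective
resistance between two points averaged over all pairs of points, with
every retained dissimilarity treated as a unit resistor; the function
names are hypothetical, and this is not the computation of Johnson and
Van Dyk (Note 2) itself.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design graph is not connected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_resistance(n_points, edges):
    """Average effective resistance over all pairs, each edge a unit resistor."""
    if n_points < 2:
        raise ValueError("need at least two points")
    laplacian = [[0.0] * n_points for _ in range(n_points)]
    for i, j in edges:
        laplacian[i][i] += 1.0
        laplacian[j][j] += 1.0
        laplacian[i][j] -= 1.0
        laplacian[j][i] -= 1.0
    # Ground the last point: invert the Laplacian with its row and column removed.
    m = n_points - 1
    inv = invert([row[:m] for row in laplacian[:m]])
    total = 0.0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            if j == m:
                total += inv[i][i]
            else:
                total += inv[i][i] + inv[j][j] - 2.0 * inv[i][j]
    return total / (n_points * (n_points - 1) / 2)


if __name__ == "__main__":
    # Complete design on four points versus a chain of three points.
    complete = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    print(mean_resistance(4, complete))
    print(mean_resistance(3, [(0, 1), (1, 2)]))
```

A complete design gives the lowest mean resistance for a given number of
points, and removing edges can only raise it, which is consistent with
the ordering of the deletion conditions reported above.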


Discussion and Recommendations 


Interpretation of the above results is subject
to certain qualifications. First, the program
that we used provides an excellent
starting configuration (see Spence, 1972),
and consequently it is not known whether all
scaling programs would produce comparable
results. However, programs that utilize similar
methods to compute the initial configuration
(e.g., KYST, ALSCAL, SSA-I) should
yield good results. Second, in order to conserve
computer time, we did not systematically
investigate configurations in higher
dimensionalities, nor with larger numbers of
points. Increases in the values of these parameters
would have made the study extremely
costly. (We did examine a few individual
cases with larger numbers of points
and found the same pattern of results.) Thus
extrapolation of our results is not without
risk, but we are confident that with large
numbers of points there would be no change
in the conclusions (cf. Spence & Domoney,
1974).
In practical terms, if an experimenter is to
construct an incomplete design with the small-
or medium-dissimilarity comparisons eliminated,
some prior information as to the relative
sizes of the dissimilarities is necessary.
Such information may be obtained very easily;
we suggest a number of relatively obvious
methods. No doubt the reader can
think of others.

1. The experimenter may rank or judge
the complete set of dissimilarities. Even if
this takes considerable time, the effort will
usually not be deemed to be unacceptable,
especially in the context of the total time
and effort required to plan and set up the
average experiment.

2. Pretests with actual subjects may be
employed; here there is no compelling necessity
for a single subject to perform the whole
task, since only approximate information is
required. Several subjects may each judge a
subset of the pairs.

3. Pretesting with one or more subjects 
using the methods of sorting may be used. 
Subjects will be required to group the stim- 
uli in such a fashion as to maximize within- 
group similarity and minimize between-group 
similarity; the number of groups (k) may be
fixed in advance or left to the subject but
should in either case be small. One will sub-
sequently choose not to collect dissimilarity
judgments for the within-group pairs. Un-
fortunately, if k is much larger than three or
four, this method will not yield a sufficiently 
large number of pairs to be discarded. The 
situation may be improved, to some extent, 
by using several judges, since their groupings 
will not, in general, be identical. 

Independent of the strategy employed to 
decide which dissimilarities are to be col- 
lected, it is imperative that a sufficient num- 
ber of judgments be obtained from the sub- 
ject. Too few data values will not permit suc- 
cessful recovery. In this connection the results 
of Spence and Domoney (1974) and Spence 
(in press) should be heeded. Their results 
suggest that the minimum adequate fraction, 
F, be calculated according to the following 
formula: 


$F = 6m/(n - 1)$,


where m is the number of dimensions in which
the scaling is done, and n is the number of
points. (See Spence, in press, for the rationale.)
For example, if 30 points are to be
scaled in three dimensions, at least 62% of
the possible pairwise judgments should be
collected. As n increases, this fraction becomes
smaller; however, if high dimensional
solutions are required, the necessary fraction 
will be larger. 
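
The rule of thumb above is simple to compute. The following is a minimal sketch; the function names `min_fraction` and `min_pairs` are hypothetical, and capping the fraction at 1 is an added assumption (a design cannot contain more than all of the pairs). Because the number of possible pairs is n(n - 1)/2, the implied minimum number of judgments simplifies to 3mn.

```python
def min_fraction(n_points: int, n_dims: int) -> float:
    """Minimum adequate fraction of pairs, F = 6m / (n - 1)."""
    if n_points < 2 or n_dims < 1:
        raise ValueError("need at least 2 points and 1 dimension")
    fraction = 6 * n_dims / (n_points - 1)
    # Assumption: a fraction above 1 simply means the complete design is needed.
    return min(fraction, 1.0)


def min_pairs(n_points: int, n_dims: int) -> int:
    """Minimum number of pairwise judgments implied by F.

    F * n(n - 1)/2 = 3mn, so integer arithmetic suffices.
    """
    if n_points < 2 or n_dims < 1:
        raise ValueError("need at least 2 points and 1 dimension")
    total_pairs = n_points * (n_points - 1) // 2
    return min(3 * n_dims * n_points, total_pairs)


if __name__ == "__main__":
    # The example from the text: 30 points scaled in three dimensions.
    print(round(min_fraction(30, 3), 2), min_pairs(30, 3))
```

For the 30-point, three-dimensional example this gives a fraction of about .62, that is, 270 of the 435 possible pairs.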

In conclusion, we would like to make a com- 
ment on a piece of folklore familiar to all 
multidimensional scalers. This is the caution 
against interpreting, or attaching significance 
to, the relative positions of points that are 
close together in the recovered space. In 
Table 1, it is seen that in the no-deletion con- 
dition the mean absolute error does not vary 
much over the small, medium, and large dis- 
Figure 5. An illustration of the problem of attach-
ing significance to the relative positions of points 
that are close together. 


tances. This means that relative error in the 
smaller distances is much greater than in the 
large distances. This effect is also seen in the 
cyclic, small, and medium-deletion conditions,
although not in the large-deletion condition.
We have tried to illustrate the conse-
quences of this phenomenon in Figure 5. As- 
suming that the error in a recovered distance 
is a function of the uncertainty in locating 
the points, it seems reasonable, given the re- 
sults in Table 1, to assume that this uncer- 
tainty is about equal for all points, thus pro- 
ducing a constant mean absolute error in the 
distances. Some possible recovered locations 
are indicated in Figure 5 by the open and 
solid circles for both a small and a large true 
distance. These are in the same relative posi- 
tions in the regions of uncertainty. It is clear 
that one is on much surer ground when con- 
sidering the relative location of points that 
are far apart. 
Toward a General Model of Small Group Productivity


Samuel Shiflett 
A general model of small group performance is proposed that is designed to
encompass a wide variety of models of group performance extant in the
psychological literature. These existing models are shown to be special cases
of the more general model proposed here, and in fact, this model is developed
from a foundation of these more restricted models. In developing the general
model, distinctions are made between single and multiple resources and between
unique and redundant resources. The relationship between problem-solving tasks
and decision-making tasks is demonstrated by means of the model. Some examples
of the model’s usefulness for the understanding of group processes, leadership,
and social decision schemes are presented.


Research on small group performance has 
been moving steadily toward a stance in which 
performance, actual or potential, is charac- 
terized by some sort of mathematical model 
involving various situational and personal 
characteristics as input parameters. One result 
of this theoretical trend has been a proliferation 
of fairly simple models that are rather restricted 
in their application because of various limiting 
assumptions frequently involved in the de- 
velopment of the models. The purpose of this 
article is to demonstrate that a number of 
existing models of group performance can be 
thought of as special cases of a single, more 
general model. The development of the model 
begins below with the presentation of several 
simple models extant in the literature and 
then proceeds to the more complex models, 
all the while demonstrating that each of these 
currently existing models can be treated as a 
special case of a single, more general model. 
In the process, several issues relating to the 
creation and use of mathematical models are 
touched on. The article concludes with a 
discussion and illustration of how a mathe- 
matical model of this sort can guide the 
development of so-called content-oriented
theories. Finally, the possible application of 
the proposed model to such topics as leader- 
ship, organizational structure, and group 
decision making is illustrated. 

To begin with, three general classes of 
variables are proposed: resources, transformers,
and outputs. Outputs are defined as any 
products that can be considered as an outcome 
of group interaction and include objective 
measures of group performance but might well 
include more subjective measures such as job 
satisfaction. Resources constitute “all the
relevant knowledge, abilities, skills, or tools
possessed by the individual(s) who is attempting
to perform a task” (Steiner, 1966, p. 274).
Resources are the raw materials that are 
essential for the creation of the product and 
without which the product could not exist. 
Transformers constitute all the variables that 
have an impact on resources and determine 
the manner in which they are incorporated 
into and related to the output variables. 
Transformers include such variables as situa- 
tional and task constraints, role systems, and 
certain personal characteristics that may affect 
the way personal task-relevant resources are
utilized in the output. An input variable can be 
a resource in some circumstances and a trans- 
former in others, depending on the nature of 
the group product and the function or purpose 
of the resource. For example, oratorical ability 
acts as a resource when the group product is a 
theatrical piece, but it is a transformer when 
the thespiantics are used during group inter-
action to influence the manner in which re- 
sources are incorporated into a group product 
not directly requiring or reflecting any theatri- 
cal resources. 

This particular categorization of variables 
was chosen because it was both simple and 
adequate for the development of the general 
model. The system is not even particularly 
original; it simply groups variables within a 
slightly different scheme than other investi- 
gators have suggested. For examples of slight 
variations on this scheme, see Hackman and 
Morris (1975), McGrath and Altman (1966), 
Naylor and Dickinson (1969), and Steiner 
(1972). 

To summarize, input and output variables 
have been categorized in a manner that allows 
the model to be stated, in its simplest form, as
P = f(T, R), where P represents the group
output or product, T stands for transformer
variables, and R represents resource variables. 

In 1966, Ivan Steiner published an important 
theoretical article in which he outlined five 
basic models of small group performance, 
relating his models to others already existing 
in the literature. He subsequently wrote a 
book (Steiner, 1972) on small group produc- 
tivity, using that article as the organizing 
foundation for a more elaborate consideration 
of group processes. The first four of Steiner's 
models involve a single type of resource; 
the fifth model represents situations requiring 
more than one resource for the group product. 
The single-resource models are discussed 
together as a first step in the elaboration of the 
general model. The multiresource model is 
then considered as a somewhat more complex 
level of the general model. 


Single-Resource Models 


Steiner's (1966) additive model describes 
situations in which the task demands require 
every member of a group to perform the same 
function in a manner that causes the members’
resources to enter the potential group product 
in a simple, additive manner. Under these 
conditions, average potential group produc- 
tivity will vary as a positive, linear function 
of group size. The disjunctive model describes a 
situation in which the potential productivity 


of a group is determined entirely by the 
resources of its most competent member. The 
conjunctive model describes tasks in which 
potential productivity is restricted to a level 
that is established by the group's least com- 
petent member. The compensatory model de- 
scribes a situation in which biases or errors in 
individual resources or products, such as 
judgments, are normally distributed and thus 
tend to cancel themselves or compensate among 
themselves and the group product is essentially 
an average of all the members’ resources. 

It is possible to demonstrate that all of 
Steiner’s models can be treated as special cases 
of a more general model, which can be written 
as 


$P = \sum_{i=1}^{n} T_i R_i$, (1)


where P represents (actual) group productivity,
$R_i$ is the task-relevant resource of the
ith person, and $T_i$ is a weight representing the
sum total of all constraints operating on the
utilization of resources. In the case of Steiner’s
additive model for predicting potential productivity,
T has the same value for all individuals
in the group and can be written


$P = nT\bar{R}$, (2)


where P represents potential productivity, $\bar{R}$
represents the average ability of each group
member, and n is the group size. It should be
noted that the formula for actual productivity 
(P), strictly following Steiner's model, would be 


P = potential productivity − motivation and
coordination losses.


However, in the present case, motivation and 
coordination losses are considered to be a part 
of T acting on R in a multiplicative manner. 
Thus, T can be thought of as a function of task 
constraints, motivation losses, coordination 
losses, and in fact, any other variable that 
impinges on group resources in such a way 
as to change potential productivity from the 
ideal case expressed in Equation 1, when 
T = 1.

With the additive model now in hand, it is a 
simple matter to show that both the disjunc- 
tive and conjunctive models are also special 
cases of the general model, in which group 
members are thought of as constituting an 
ordered set of individuals. To show this,
Equation 1 can be rewritten to reflect the 
rank ordering of members from least capable 
to most capable, as follows: 


$P = T_L R_L + T_{L+1} R_{L+1} + \cdots + T_{M-1} R_{M-1} + T_M R_M$. (3)


Here, L represents the least capable member 
and M represents the most capable member. 
When task and situational constraints are such 
that only the best member's performance 
determines productivity, that is, if task con- 
straints are disjunctive, then $T_M > 0$, all other
T = 0, and the disjunctive model is charac-
terized by the equation,


$P = T_M R_M$. (4)

Note that $T_M$ is not necessarily equal to 1,
since other constraints can still moderate the
extent to which resources become products.
However, $T_M$ is the only transformation weight
greater than 0. As was pointed out by Steiner
(1966) and Davis (1969), this model is essen- 
tially the same as Lorge and Solomon's (1955) 
“Model A," Thomas and Fink’s (1961) 
“rational model,” and the model proposed by 
Taylor (1955). Also closely related is Steiner 
and Rajaratnam's (1961) method for predicting 
group competence levels. 

In a similar fashion, when task and other 
situational constraints are conjunctive in 
nature, $T_L > 0$, and all other T = 0, resulting
in the conjunctive model,


$P = T_L R_L$. (5)

The compensatory model can similarly be 
expressed in terms of the general Equation 1 
if $T_i$ takes the value $C_i/n$, where n is the group
size and $C_i$ represents the impact of all other
constraining variables, including the distri- 
bution of errors. In fact, any averaging model 
is a transformation of the general additive 
model in which T has been appropriately 
adjusted for the group size. 
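
The way the four single-resource models fall out of Equation 1 can be made concrete with a short sketch. The function names below are hypothetical, the default weight of 1.0 corresponds to the ideal case with no losses, and the resource scores are invented purely for illustration.

```python
def group_product(weights, resources):
    """General model (Equation 1): P = sum over members of T_i * R_i."""
    if len(weights) != len(resources):
        raise ValueError("one weight is required per member")
    return sum(t * r for t, r in zip(weights, resources))


def additive_weights(resources, t=1.0):
    """Additive model: every member receives the same weight T."""
    return [t] * len(resources)


def disjunctive_weights(resources, t=1.0):
    """Disjunctive model: only the most able member has a nonzero weight."""
    best = max(range(len(resources)), key=lambda i: resources[i])
    return [t if i == best else 0.0 for i in range(len(resources))]


def conjunctive_weights(resources, t=1.0):
    """Conjunctive model: only the least able member has a nonzero weight."""
    worst = min(range(len(resources)), key=lambda i: resources[i])
    return [t if i == worst else 0.0 for i in range(len(resources))]


def compensatory_weights(resources, c=1.0):
    """Compensatory (averaging) model: T_i = C_i / n, here with equal C_i."""
    n = len(resources)
    return [c / n] * n


if __name__ == "__main__":
    scores = [2.0, 5.0, 3.0]  # illustrative member resources
    for make in (additive_weights, disjunctive_weights,
                 conjunctive_weights, compensatory_weights):
        print(make.__name__, group_product(make(scores), scores))
```

Only the weight vector changes from one model to the next; the combination rule itself stays fixed, which is the sense in which each model is a special case of the general one.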

Steiner (1972) subsequently introduced the 
concept of a discretionary task to describe
conditions in which the group members can 
combine their individual resources in any 
manner they wish. This is another way of 
saying that all four models described above 
are special cases of the general model except 
that, for Steiner, a discretionary task implies
that group members themselves are able to
determine what values T takes instead of the
values being imposed by situational charac- 
teristics not under group control. Similar 
points have been made by Hackman and 
Morris (1975) and Shiflett (1972). 


Multiple-Resource Models 


Steiner’s (1966) final category of task types, 
the complementary tasks, are designed to deal 
with cases in which a single individual per- 
forms only a part of the total task while other 
group members, possessing different kinds of 
resources, perform the remaining parts. Steiner 
further subdivided complementary tasks into 
those with unshared resources and those with
partially shared resources. Steiner originally 
argued that to be complementary the task 
must be divisible into subtasks and that the 
resources required for the various subtasks 
must differ in a qualitative manner. Subse- 
quently, Steiner (1972) emphasized the divisi- 
bility requirement by referring to these types 
of tasks as divisible tasks and stressed the idea 
of appropriately matching resources to the 
requirements of the various subtasks. Laughlin 
and Branch (1972) suggested that although 
this strategy can be referred to as a division of 
labor, it can also be reasonably seen as a 
division of resources. Thus, in order to optimize 
or maximize group performance, the division 
of resources must be coordinated with the 
partitioning of the task, whether accomplished 
by the group itself or by a superordinate 
organization. 

Under these conditions the distinction be- 
tween shared and unshared resources dis- 
appears, since by appropriate assignment of 
individuals to subtasks, resources may be 
redistributed so that they are not shared on 
any subtasks even though they are shared 
within the group as a whole. For this reason, 
the unshared resources model can be treated 
as a special case of the more general shared 
resources model. Similarly, it is possible to 
demonstrate that all the single-resource models 
are special cases of the unshared resources 
model, which, as we have just shown, is a 
special case of the shared resources model. 
Figure 1. Venn diagrams illustrating various distributions of resources within a group.


To illustrate these propositions, set theory 
operations can be utilized to describe a series 
of identities. The general model, expressed in 
Equation 1, can now be redefined as the 
union (∪) of all members’ resources available
for the group product:


$P = T_1 R_1 \cup T_2 R_2 \cup \cdots \cup T_{n-1} R_{n-1} \cup T_n R_n$. (6)


$T_i$ acts as a constraint on $R_i$, changing it from
the total amount potentially available from
the group member into the amount actually
available as a result of various task and situa-
tional conditions. Proceeding as before, we can
now redefine the four basic models already 
described. 
The additive case exists when 


$T_1 R_1 \cup T_2 R_2 \cup \cdots \cup T_n R_n = T_1 R_1 + T_2 R_2 + \cdots + T_n R_n = \sum_{i=1}^{n} T_i R_i$. (7)


This situation can be illustrated by the use of 
Venn diagrams, as in Figure 1a, and is the case 
defined by Equation 1. That is, in terms of 
Figure 1a, the additive case exists when 
$A \cup B \cup C = A + B + C$.


The disjunctive case exists when 
$T_1 R_1 \cup T_2 R_2 \cup \cdots \cup T_n R_n = T_M R_M$. (8)


Here, $T_M > 0$, and all other T = 0. The
conjunctive case exists when


$T_1 R_1 \cup T_2 R_2 \cup \cdots \cup T_n R_n = T_L R_L$. (9)


In this case, $T_L > 0$, and all other T = 0.

It should be evident that in keeping with 
the assertion that all three models are special 
cases of the more general model, the left side 
of the three expressions does not change, for it 
is the general model. 

The disjunctive and conjunctive cases can 
occur in several different ways, which are 
illustrated in Figures 1b, 1c, and 1d. In Figure 
1b, the resources of the least able member (B) 
are completely encompassed by those of the 
most able member (A). This overlapping of
resources is sometimes referred to as redun- 
dancy. In this situation, if no other constraints 
exist, then Member B is unable to add any- 
thing more to the product than what Member 
A already has the potential to contribute. In 
other words, the disjunctive case exists and 
$A \cup B = A$. On the other hand, if some
constraint exists so that the appropriate com-
bination rule is $A \cup B = B$, then the conjunc-
tive case exists. This situation is illustrated in 
Figure 1c, where the dashed line symbolizes a 
constraint preventing the use of A's superior 
resources. 

In Figure 1d, resources are not shared, in 
contrast with the shared resources situation 
of Figures 1b and 1c. The dashed line again 
represents external constraints that operate in 
a manner that prevents one member from 
utilizing his or her resources or in which 
productivity is totally dependent on one 
member. If productivity is determined by the 
most able member, then this particular situa- 
tion is disjunctive, but if productivity is totally 
dependent on the least able member, then the 
situation is conjunctive. It should also be 
evident that the additive model (Equations 
1 and 7 and Figure 1a) is equivalent to the 
unshared resources model, whether or not the 
resources are qualitatively similar. 

Compensatory situations can exist in a
variety of ways, but it can be said that a 
compensatory model is appropriate when the 
various constraints on resources operate in 
such a way that the following identity is true: 


$R_1 \cup R_2 \cup \cdots \cup R_{n-1} \cup R_n = \frac{1}{n} \sum_{i=1}^{n} R_i$. (10)


The Model in Terms of Matrix Operations 


Resources within a group can be thought of 
as being arrayed in a resource matrix, R, in 
which rows are defined by group members (m), 
and columns are defined by a discrete, mutually 
exclusive form of resource (a), 


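$$
R = \begin{array}{c|cccc}
 & a_1 & a_2 & \cdots & a_n \\ \hline
m_1 & r_{11} & r_{12} & \cdots & r_{1n} \\
m_2 & r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
m_m & r_{m1} & r_{m2} & \cdots & r_{mn}
\end{array}
$$ (11)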


Entries consist of the amount of each resource 
(r) provided by each member. In a single- 
resource situation the matrix would be a 
column vector. In all cases, resources are 
assumed to be directly relevant and necessary 
for creating the group product, whatever it is 
defined to be. Thus the nature of this matrix 
will vary widely depending on the nature of 
the group product. Often, each column will 
represent a qualitatively different resource. 

These different resources must be combined 
in some fashion to finally represent a group 
product. This combination rule usually in- 
volves rescaling the resources for proper 
reflection in the product, but it also requires 
specification of a particular rule. Although this 
task is formidable, several approaches to the 
problem have been suggested. Job analysis 
(Zedeck & Blood, 1974) and synthetic validity 
(Guion, 1965) represent approaches to break- 
ing products (or their antecedent tasks) into 
the subsets of skills and resources required to 
accomplish the task. The entries can vary in 
nature from interval measures of skills and 
abilities to probabilities ranging from 0 to 1 
of a certain response being in an individual's 
behavioral repertoire. 

Implicitly or explicitly, all theories of group 
performance specify one or more rules for the 
manipulation of the resource matrix. The rules 
are specified in a transformation matrix, T. 
In the additive model, T is a vector in which 
all entries take on the same value; R is also a 
single vector, indicating the amount of the 
resource provided by each member. The con- 
junctive and disjunctive models specify a T 
vector containing all entries equal to 0 except 
for the one entry specific to either the most or 
least able member. 
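
A small sketch may help fix ideas about the resource and transformation matrices. The text does not specify how T and R are combined in the multiple-resource case, so the element-by-element weighting used here is an assumption made for illustration only; the function name and the numbers are likewise hypothetical.

```python
def weighted_column_totals(T, R):
    """For each resource (column j), return the sum over members of T[i][j] * R[i][j].

    Assumption: T has the same shape as R and weights each entry separately.
    """
    if not R or len(T) != len(R) or any(len(t) != len(r) for t, r in zip(T, R)):
        raise ValueError("T and R must be non-empty and have the same shape")
    n_members, n_resources = len(R), len(R[0])
    return [sum(T[i][j] * R[i][j] for i in range(n_members))
            for j in range(n_resources)]


if __name__ == "__main__":
    # Rows are members, columns are two distinct resources (illustrative values).
    R = [[4, 0],
         [1, 3],
         [2, 2]]
    # Additive rule: every entry receives the same weight.
    T_additive = [[1, 1], [1, 1], [1, 1]]
    # Disjunctive rule applied per resource: only the best member on each column counts.
    T_disjunctive = [[1, 0], [0, 1], [0, 0]]
    print(weighted_column_totals(T_additive, R))
    print(weighted_column_totals(T_disjunctive, R))
```

With a single column, the computation reduces to the weighted sum of Equation 1.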

As the product and the model become more 
complex, so do the matrices representing the 
model. In particular, the transformation
matrix may be the final representation of a
whole series of matrix operations representing 
some model of group process. An example of 
this explicit use of matrix operations to 
represent group processes is structural role 
theory (Oeser & Harary, 1962, 1964; Oeser & 
O'Brien, 1967). The basic elements in the 
model consist of persons, positions, tasks, and 
the various directional relationships among 
these elements. These elements are then com- 
bined in a series of mathematical operations
that result in a specific index describing some 
aspect of the situation. The particular opera- 
tions are a function of the index of interest, 
but the elements are always those listed above. 
Elaborating on the basic model, O'Brien and 
his associates have developed indexes of 
collaboration requirements, cooperation re- 
quirements, certain task characteristics, and 
potential leader influence (O'Brien, 1968, 
1969a, 1969b; O’Brien, Biglan, & Penna, 1972;
O'Brien & Owens, 1969; Witz & O'Brien, 
1971). Basically, then, structural role theory 
can be thought of as one approach to specifying 
the transformation matrix, T. 


Unique Versus Redundant Resources 


We have already noted the fact that re- 
sources can be distributed within a group and 
applied to the group product in a variety of 
ways. Continuing the use of set notation, 
resources are characterized below as either 
duplicated (redundant) or unique. If two 
individuals’ resources are characterized by the
Venn diagram illustrated in Figure 1e, then the
following sets can be defined: A = Member
A's total resources; B = Member B's total
resources; $U_A$ = A's unique resources, those
not possessed by B; $U_B$ = B's unique resources;
and D = redundant or overlapping
resources possessed by both A and B. From
these definitions the following sets can
be further identified: $A = (U_A \cup D)$;
$B = (U_B \cup D)$; $U = (U_A \cup U_B)$.
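
These definitions map directly onto ordinary set operations. The sketch below is illustrative only: the function name is hypothetical, and the resource items are arbitrary labels standing in for discrete units of a resource.

```python
def partition_resources(a, b):
    """Split two members' resource sets into unique and redundant parts.

    Returns (unique_a, unique_b, redundant): items only A has,
    items only B has, and items both have.
    """
    a, b = set(a), set(b)
    return a - b, b - a, a & b


if __name__ == "__main__":
    member_a = {"item1", "item2", "item3", "item4"}
    member_b = {"item3", "item4", "item5"}
    unique_a, unique_b, redundant = partition_resources(member_a, member_b)

    # Each member's total resources are the union of unique and redundant parts.
    print(member_a == unique_a | redundant)
    print(member_b == unique_b | redundant)
    # The pooled resources contain fewer items than the two totals added together.
    print(len(member_a | member_b), len(member_a) + len(member_b))
```

The last line shows why shared resources matter: the union holds five items, whereas simply adding the two members' totals would suggest seven.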

As a matter of fact, a total set of resources 
can be characterized in any number of ways, 
depending on the requirements of the particular 
model or theory being espoused. For example, 
some models of performance have emphasized 
the amount of unique resources available, 
sometimes referred to as the pooling of re- 
sources, as in the case of Kelley and Thibaut 
(1969) and that of the research by Laughlin 
and his associates (Laughlin & Branch, 1972; 
Laughlin, Branch, & Johnson, 1969; Laughlin 
& Johnson, 1966). Other models have empha- 
sized the impact of the amount of redundancy, 
or overlap, on performance, for example, 
Zajonc and Smoke (1959) and Shiflett (1972, 


1973, 1976). 


The number of sets of overlapping data 
increases rapidly as group size increases. In 
groups of size two, there is one set of shared 
resources. In groups of size three, there are 
four sets of overlapping resources, including 
the set that is common to all three members. 
This situation is illustrated in Figure 1f, and 
the sets are labeled g, h, i, and j. When N = 4,
common resource sets = 11, and when N = 5,
common resource sets = 26. Obviously, this
approach to the subdivision of resources 
rapidly becomes unwieldy as group size 
increases, but it does illustrate the fact that 
the set of available resources can be broken 
into subsets, using any number of different 
criteria. And though the number of sets can 
quickly burgeon into an unmanageable size 
as group size increases, in practice this is 
unlikely to represent a major problem, since 
simplifying procedures and assumptions can 
be applied that substantially reduce the 
number of sets that would have to be dealt 
with. For example, it seems likely that many 
of the subsets of shared resources will be 
trivially small or nearly identical to other 
subsets, so resources could be regrouped into 
more manageable and possibly more meaning- 
ful sets. 
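
The growth in the number of shared sets can be checked by enumeration. Each set of common resources corresponds to a subgroup of two or more members, so a group of N members has 2^N - N - 1 of them. The sketch below, with hypothetical function names, lists the subgroups and confirms the closed-form count.

```python
from itertools import combinations


def shared_resource_subgroups(members):
    """List every subgroup of two or more members; each defines one shared set."""
    members = list(members)
    return [subgroup
            for size in range(2, len(members) + 1)
            for subgroup in combinations(members, size)]


def count_shared_sets(n_members: int) -> int:
    """Closed form: all subsets minus the empty set and the N single-member sets."""
    if n_members < 0:
        raise ValueError("group size cannot be negative")
    return 2 ** n_members - n_members - 1


if __name__ == "__main__":
    for n in range(2, 8):
        enumerated = len(shared_resource_subgroups(range(n)))
        print(n, enumerated, count_shared_sets(n))
```

The count is 1, 4, 11, and 26 for groups of two, three, four, and five members, and it roughly doubles with each added member.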

The distinction between unique resources 
and shared resources provides several interest- 
ing implications for the manner in which 
individual resources will be utilized in the 
group product when the resources consist of 
sets of discrete units of the resource and when 
the task is subdivisible. For example, it should 
now be apparent that the fact that two or more 
individuals possess the same resource does not 
increase the total set of available resources 
but does increase the probability of that re- 
source being used. Recognizing that some 
resources are shared provides one explanation 
for the frequent finding that groups are less 
productive than would be expected by a
knowledge of each individual's resource score.
In other words, this distinction permits the rec-
ognition of the fact that $(A \cup B) < (A + B)$.

Although the two sets of variables, unique 
and redundant, are potentially independent of 
each other, in most cases they probably are 
not. They must always be considered together 
in looking at group performance, even if one 
or the other turns out to be an empty set. 
Failure to consider both sets adequately can
lead to inadvertent confounding of one with 
the other. For example, in the series reported 
by Laughlin and his associates (Laughlin & 
Branch, 1972; Laughlin et al., 1969; Laughlin 
& Johnson, 1966), subjects of varying degrees 
of ability (resources) were composed into 2-, 
3-, or 4-person groups in factorial combinations 
of member ability. The basic purpose of the 
studies was to test Steiner’s (1966) complemen- 
tary model, which in this context asserts that 
groups with members having nonredundant 
(unique) resources will do better than groups 
with all members having the same (redundant) 
resources, assuming that all resources are 
relevant to the task product. 

The presence and amount of unique re- 
sources in a group were inferred by Laughlin 
and his associates by looking at the difference 
in ability levels of the group members. Thus, 
a member of low ability was predicted to be 
unable to contribute any new resources to 
those of a member of high ability, and produc- 
tivity would be determined entirely by the 
most able member’s resources, that is, using 
the present notation, $P = U_M + D$. In the
case of equal ability, however, it was assumed
that both provided some unique resources (i.e.,
$U_A > 0$ and $U_B > 0$) in addition to a core of
overlapping resources, D. Thus, in this
particular case, P was assumed to equal
$U_A + U_B + D$ and was predicted to be greater
than when either member worked alone or 
with a member of lesser ability. 

The data appeared to support these predic- 
tions strongly. However, there are several 
problems in the design and interpretation of 
these studies. In the case of members of 
unequal ability, we notice that although the 
complementary model does indeed imply the 
prediction that $P = U_M + D$, so does the
simpler disjunctive model. As was illustrated
at the beginning of this section, any member’s
abilities can be thought of as containing the
subsets U and D. Thus the most able member’s
abilities, $R_M$, are composed of unique resources,
$U_M$, as well as those shared by other
members, D, or $R_M = U_M + D$. By simple
substitution we see that the prediction of the
complementary model, $P = U_M + D$, is identical
to that of the disjunctive model, $P = R_M$.


A more important problem in the Laughlin 
series consists in the definition of unique and 
redundant resources. The amount of unique 
or pooled resources is clearly varying syste- 
matically as a function of variations in the 
ability level of group members, as was asserted 
by the researchers; however, the amount of 
redundancy is also varying systematically, and 
the precise relationship of one set to the other 
cannot be determined from the data. Thus, it 
is not possible to state unequivocally that 
uniqueness of resources determined the results, 
since redundancy could equally have been the 
cause. Or more likely, some combination of 
the two sets explains the results. In order to 
partition these effects, empirical measurements 
of both redundancy and uniqueness of re- 
sources must be made. 

The difficulty in assessing redundancy in 
most small group research can be illustrated 
by looking at two individuals who each score 
50 on a 100-item vocabulary test. How much 
redundancy is represented in the two scores 
of 50? If each person answered 50 items 
correctly we could argue that there is 100% 
redundancy. However, this condition would 
hold only if both members correctly answered 
exactly the same items. If all the items are of 
equal difficulty, then redundancy could range 
from 0% to 100% in the case of 50 of 100 
items. It should be obvious, then, that it is 
not possible to infer directly the amount of 
redundancy from ability levels alone, although 
we would expect that as one member’s ability 
becomes increasingly greater than another's, 
AN BA. 
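
The range of possible redundancy for two test scores can be worked out directly. In the sketch below the function names are hypothetical, and expressing redundancy as a proportion of the lower score is an assumption made for illustration; the text itself does not fix a denominator.

```python
def overlap_bounds(score_a: int, score_b: int, n_items: int):
    """Smallest and largest possible number of items both members answered correctly."""
    if not (0 <= score_a <= n_items and 0 <= score_b <= n_items):
        raise ValueError("scores must lie between 0 and the number of items")
    smallest = max(0, score_a + score_b - n_items)  # forced overlap, if any
    largest = min(score_a, score_b)                 # one set contained in the other
    return smallest, largest


def redundancy_bounds(score_a: int, score_b: int, n_items: int):
    """Overlap bounds as a proportion of the lower score (assumed convention)."""
    smallest, largest = overlap_bounds(score_a, score_b, n_items)
    lower_score = min(score_a, score_b)
    if lower_score == 0:
        return 0.0, 0.0
    return smallest / lower_score, largest / lower_score


if __name__ == "__main__":
    # Two members who each score 50 on a 100-item test.
    print(overlap_bounds(50, 50, 100), redundancy_bounds(50, 50, 100))
    # A much more able member paired with a less able one.
    print(overlap_bounds(90, 30, 100), redundancy_bounds(90, 30, 100))
```

For two scores of 50 the overlap can be anything from 0 to 50 items, so ability levels alone leave redundancy undetermined; with scores of 90 and 30, at least 20 of the weaker member's 30 items must be shared.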

Much of the decision-making research avoids 
this problem because it is possible to precisely 
specify the redundancy term by causing it to 
be a characteristic of the environment (the T 
matrix) rather than the resource matrix (Slovic 
& Lichtenstein, 1971). Shiflett (1972, 1973, 
1976) was also able to manipulate redundancy 
by arbitrarily dividing the task in several 
ways, thus making redundancy a function of 
T instead of R. However, he was not any 
more able to estimate the actual amount of 
redundancy present in the shared-labor work 
strategy than were Laughlin and his associates 
in their shared-labor task. 

The general model, written to reflect the 
distinction between shared and unique re-
sources, is


$$P = \sum_{i=1}^{n} T_i U_i + \sum_{i=1}^{n} T_i^{*} D, \qquad (12)$$

where U_i represents the unique resources of
Individual i, T_i is the transform weight reflect-
ing the utilization of those resources in the
group product P, and T_i^* is the transform
weight applied to redundant resources, D. T
and T^* may differ depending on the particular
constraints operating in a group. It might
seem that T^* should be constant across all
individuals, since D is defined to be constant, but again,
particular constraints operating in a group 
situation may cause this not to be so. That is, 
some people's common resources are more 
likely to be used than are those of others. 
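
As a minimal numerical sketch of the general model, the group product can be computed as the weighted sum of each member's unique resources plus the weighted contribution of the shared (redundant) resources. The function name `group_product` and all numerical values are hypothetical; the only structure taken from the text is that each member has a unique-resource term U, a weight T on it, and a separate weight T* on the common pool D.

```python
def group_product(unique, t, t_star, d):
    """Compute P as sum_i T_i * U_i plus sum_i T*_i * D.

    unique -- unique resources U_i, one value per member
    t      -- transform weights T_i applied to the unique resources
    t_star -- transform weights T*_i applied to the redundant resources
    d      -- the redundant (shared) resource pool D, a single value
    """
    if not (len(unique) == len(t) == len(t_star)):
        raise ValueError("unique, t, and t_star need one entry per member")
    unique_part = sum(ti * ui for ti, ui in zip(t, unique))
    shared_part = sum(ts * d for ts in t_star)
    return unique_part + shared_part


if __name__ == "__main__":
    # Hypothetical three-member group; the weights on D differ across members,
    # reflecting the point that some members' common resources are used more.
    p = group_product(unique=[10, 5, 0],
                      t=[1.0, 0.5, 1.0],
                      t_star=[0.5, 0.25, 0.25],
                      d=20)
    print(p)  # 32.5
```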


Discussion 


To illustrate how the proposed model might
be used, two examples of translation from
mathematical abstractions to psychological
realities are presented below. The first illus-
tration comes from the area of leadership; the
second considers some current approaches to
decision making.


Leadership 


In terms of the present model, leadership 
may be defined as a resource recognition or a 
T-facilitating function. That is, leaders must 
recognize the existence of resources available 
to them through their group members, and 
they must be able to do so to the extent that 
they can locate those resources with enough 
precision that they can then take appropriate 
actions to facilitate their effective inclusion 
in the group product. Facilitative behaviors 
may include a wide variety of actions, from 
restructuring an organization in order to better 
utilize the existing distribution of resources 
to soothing angry feelings and increasing 
the motivation and morale of the group so 
that the members are willing to apply their 
resources to the task. Thus a leader's ap- 
propriate or most effective behavior can vary 
drastically, depending on the situational re-
quirements, but in every case the behavior
should have the same end result: to maximize
P by making appropriate adjustments to the T
weights over which she or he has some con-
scious control.

It would seem that the approach to leader- 
ship suggested here fits rather nicely with the 
ideas recently proposed by Graen and his 
associates (Dansereau, Cashman, & Graen, 
1973; Graen, 1976). They suggested that the 
appropriate unit of analysis for examining 
leadership processes is the “vertical dyad,” 
consisting of the superior and only one sub- 
ordinate of the several in a work unit. In this 
approach the vertical dyad reflects the role 
relationship linking each group member and 
his or her superior. To the extent that this 
process link affects an individual’s contribution 
to the group product it is represented in the 
model by T. Each vertical dyad can be
represented by a single number reflecting 
“influence,” and the set of all vertical dyads 
in a work unit can then be written as a vector 
T, containing a series of ts, one for each group 
member or vertical dyad. It will be recalled 
that we pointed out that the T matrix is the 
culmination of what might be a whole series 
of preliminary matrix operations reflecting 
various aspects of the situation. What has 
traditionally been called leadership style is then 
simply a description of the various components 
of the summary transformation index that 
the leader actually influences in his or her 
attempts to increase performance. 

The effectiveness of any leadership style is a
function of the context within which the style 
is exercised. In some situations it would be 
more appropriate to emphasize organizational 
factors, whereas in others it would be more 
appropriate to influence motivational factors. 
Obviously, a good leader is one who emphasizes 
the right components at the right time. A 
flexible leader is one who can change emphasis 
or style as situational requirements change. 
This approach also allows us to see a common 
ground uniting Graen’s approach with Likert's 
(1958) idea that leadership is an adaptive 
process and Naylor and Dickinson's (1969) 
idea that a group's communication structure, 
which includes leader communication, is
dependent on task structure and work
structure.

Vroom and Yetton (1973) proposed a model 
of leadership as a decision-making process that 
also appears to be related to the approach being 
proposed here. They suggested that a leader
has a series of discrete, alternative behavioral 
processes that can be used in a problem-solving 
situation. The alternatives can range from 
behaviors that could be labeled “autocratic” 
through those called "participative." The 
leader must make a series of sequential 
decisions in evaluating each situation in order 
to determine the most appropriate behavior 
for that situation. Thus, their model stresses 
the need for behavioral differences on the part 
of the leader over situations. This same 
implication is a keynote of the vertical-dyad
approach of Graen and his colleagues and is a 
direct implication of the present model. 

It is noteworthy that the effect of a partici- 
pative style of leadership is effectively to 
assign an equal weight to all members' re- 
sources, whereas the autocratic style is 
characterized at its extreme as one with a 
matrix of group members’ weights equal to 0 
and a leader weight equal to 1. 
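
The contrast between the two extreme styles can be sketched as two weight vectors applied to the same resources. This is only an illustration: the function names, the resource values, and the choice to normalize the participative weights so that they sum to 1 are assumptions made here, not part of the original formulation.

```python
def participative_weights(n_members):
    """Equal weight for every member (normalized to sum to 1 by assumption)."""
    return [1.0 / n_members] * n_members


def autocratic_weights(n_members, leader_index=0):
    """Weight of 1 for the leader and 0 for every other member."""
    weights = [0.0] * n_members
    weights[leader_index] = 1.0
    return weights


def weighted_product(resources, weights):
    """Group product as the weighted sum of member resources."""
    if len(resources) != len(weights):
        raise ValueError("one weight is needed per member")
    return sum(w * r for w, r in zip(weights, resources))


if __name__ == "__main__":
    resources = [8, 4, 6, 2]  # hypothetical values; member 0 is the leader
    print(weighted_product(resources, participative_weights(4)))  # 5.0
    print(weighted_product(resources, autocratic_weights(4)))     # 8.0
```

Under these assumed values the autocratic pattern yields the larger product only because the leader happens to hold the most resources; with a less able leader the ordering would reverse.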

Vroom and his associates noted that maxi- 
mizing group efficiency or productivity is not 
always the only criterion for determining a 
behavioral style. Other considerations, such as 
humanistic concerns or quality of interaction 
or a pragmatic desire to develop certain social 
characteristics of the members might also 
influence the particular leader style. The latter 
decision base was termed Model B by Vroom 
and Yetton (1973), and the decision model
involving only group productivity was termed 
Model A. Using the rationale of the present 
model it appears that Model A is in fact a 
subset or special case of the more general 
Model B. By defining group productivity, P, 
as the set of all possible outcomes and products 
resulting from group processes, then P is a 
set containing not only traditionally defined 
group performance criteria but also group-level 
perceptions such as morale or group atmo- 
sphere and individual-level perceptions, atti- 
tudes, and characteristics such as job satis- 
faction and feelings of acceptance, worthwhile 
accomplishment, or personal growth. To each 
of these elements in the product set a weight 
reflecting something like importance or rele-
vance can be attached, forming a vector, I,
with elements equal in number to those
contained in the P vector. These importance 
or relevance weights contained in the I vector 


are frequently, but not always, determined by 
some authority external to the group or by 
the leader of the group. Thus the president 
of a production firm may determine that the 
only relevant product from an assembly line 
is the physical product being assembled and 
nothing else. In this case, a relevance weight 
of 1 is attached to the group product and 
weights of 0 are attached to all other non- 
material group outcomes, resulting in an 
extreme example of Model A. Obviously, 
there is no reasonable case in which all po- 
tential outcomes can be simultaneously con- 
sidered, but any case in which one or more 
nonmaterial outcomes are given weights 
greater than 0 can be considered to be a 
variation of Model B. Variants of Model B 
can be defined by the appropriate importance 
vector. Thus sensitivity, therapy, or team- 
building groups would usually be characterized 
by a vector containing weights of 0 for material 
group products but large weights for personal 
or interpersonal characteristics deemed im- 
portant to the group. 
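
The relation between Model A and Model B can be sketched by applying an importance vector to a product vector. The outcome labels, numerical values, and function names below are hypothetical; the sketch assumes only that each outcome in P receives one importance weight and that Model A is the case in which the material product alone is weighted.

```python
def valued_outcome(products, importance):
    """Weight each group outcome by its importance and sum the result."""
    if len(products) != len(importance):
        raise ValueError("one importance weight is needed per outcome")
    return sum(i * p for i, p in zip(importance, products))


def is_model_a(importance, material_index=0):
    """True when only the material group product carries a nonzero weight."""
    others = [w for k, w in enumerate(importance) if k != material_index]
    return importance[material_index] > 0 and all(w == 0 for w in others)


if __name__ == "__main__":
    # Hypothetical outcomes: units assembled, morale rating, job satisfaction.
    products = [120.0, 3.5, 4.0]
    assembly_line = [1, 0, 0]       # extreme Model A
    therapy_group = [0, 0.5, 0.5]   # a Model B variant
    print(valued_outcome(products, assembly_line), is_model_a(assembly_line))
    print(valued_outcome(products, therapy_group), is_model_a(therapy_group))
```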


Decision-Making Tasks 


In a previous section it was suggested that 
one useful way of conceptualizing the resource 
matrix was in terms of its redundancy. A 
summary vector of redundancy could consist 
of a row of entries, each of which indicates 
how many members have a particular resource. 
Thus the vector would contain as many 
entries as there are defined resources. Using 
this concept it is possible to demonstrate a 
close relationship between problem-solving or 
general performance tasks and decision-making 
tasks. I am suggesting that many decision 
tasks require as their basic matrix the re- 
dundancy vector, whereas problem-solving 
tasks require the summary resource matrix or 
some combination of the resource and re- 
dundancy matrices. The reason for this 
distinction is found in the distinctly different 
way in which resources are incorporated into 
the group product in the two types of tasks. 
Resources (or response alternatives) in a 
decision task are mutually exclusive, and the 
use of one response alternative precludes the 
use of the remaining response alternatives. 
That is, if my response alternatives are yes and 
no, I cannot use both responses at the same
time. If I try to, I really have not made a 
decision. In a problem-solving task, on the 
other hand, resources can be combinatorial 
so that the use of one resource does not pre- 
clude the use of additional resources but 
actually supplements or combines with other 
resources to reflect the final product. 

Pursuing the special characteristics of 
decision tasks, we find that there is typically 
a clearly defined set of response alternatives, 
which is frequently quite limited in scope, 
often consisting of only two or three alter- 
natives. Because of this fact and because the 
response alternatives are mutually exclusive, 
it is convenient to think of each response 
alternative as a qualitatively distinct resource. 
Thus, a two-alternative decision task is defined 
as requiring two resources, a three-alternative 
decision task has three resources, and so forth. 
Under these circumstances, the concept of 
redundancy takes on special importance. By 
definition, each member has only one resource 
(response alternative). Thus it is not possible 
for subjects to have multiple resources. Because 
response alternatives are always mutually 
exclusive within a group member, the R 
matrix has the characteristic that each subject 
vector contains entries consisting of r - 1 0s
and a single 1, for example,


$$R = \begin{array}{c|ccc} & A_1 & A_2 & A_3 \\ \hline S_1 & 1 & 0 & 0 \\ S_2 & 0 & 0 & 1 \\ S_3 & 0 & 1 & 0 \\ S_4 & 1 & 0 & 0 \end{array}$$


This characteristic simply is reflecting the fact 
that when one response is made, the other 
cannot occur. The redundancy matrix, D, is 
then the simple sum of resources across 
subjects; using the same example as above, 


$$D = \begin{array}{ccc} A_1 & A_2 & A_3 \\ \hline 2 & 1 & 1 \end{array}$$


Because of the way decisions are typically 
made, it is the redundancy vector that is the 
primary resource matrix of interest in this 
type of task. Transformation rules (T matrix) 
thus operate on the D matrix of redundancy 
rather than on the original resource matrix, R. 
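
The step from the resource matrix to the redundancy vector is a column sum, which can be sketched as follows. The function name and the four-member, three-alternative matrix are illustrative assumptions; the check that every member holds exactly one alternative follows the text's statement that alternatives are mutually exclusive within a member.

```python
def redundancy_vector(r_matrix):
    """Sum a members-by-alternatives 0/1 matrix down its columns.

    Each row must contain a single 1 (the member's chosen alternative)
    and 0s elsewhere.
    """
    n_alternatives = len(r_matrix[0])
    for row in r_matrix:
        if len(row) != n_alternatives:
            raise ValueError("all rows need the same number of alternatives")
        if sorted(row) != [0] * (n_alternatives - 1) + [1]:
            raise ValueError("each member must hold exactly one alternative")
    # zip(*r_matrix) iterates over columns, i.e., over response alternatives.
    return [sum(column) for column in zip(*r_matrix)]


if __name__ == "__main__":
    # Hypothetical group: rows are members, columns are alternatives A1..A3.
    r = [
        [1, 0, 0],
        [0, 0, 1],
        [0, 1, 0],
        [1, 0, 0],
    ]
    print(redundancy_vector(r))  # [2, 1, 1]
```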

Recently, Davis (1973), in an important 
theoretical article, proposed a model for 


describing the nature of the social processes 
leading to a group decision that is a perfect 
illustration of this conceptualization of re-
sources. His theory assumes that, as suggested
above, response alternatives defined by the
decision task are mutually exclusive and
exhaustive. An individual probability distri-
bution is assumed to exist that characterizes
the population from which a decision-making
group is formed. Randomly formed groups
from such a population display an internal
distribution of preferences that Davis called a
“distinguishable distribution” and that is
identical to the redundancy vector in the
present model. To quote the succinct summary
statement by Davis, Kerr, Sussman, and 
Rissman (1974), "In this system, response 
alternatives, but not particular people, are 
considered distinguishable” (p. 250). In the 
Davis model the population’s individual 


dundancy vector) is operated on by a “social 
decision scheme” matrix to yield an estimate
of the probability of a group deciding for each
alternative. The social decision scheme matrix
is a mathematical statement describing how
each possible group-preference distribution is 
resolved through social interaction, and as 
such, is equivalent to the presently defined 
transformation matrix, T. 

Davis's (1973) approach can be seen as a
special treatment of unique and redundant
resources in which the social decision scheme
determines or influences the relative impor-
tance of the unique and redundant resources.
In this scheme, a shared resource is any
resource that has more than one adherent
in the group, and a unique resource is one that
has only one proponent. Thus, with respect
to the set of response alternatives, A_1, A_2, ..., A_n,
and occupancy numbers (redundancy), r_1,
r_2, ..., r_n, it can be said that when r_a = 1 for
any A_a, that set is a nonredundant or unique
set. And when r_a > 1 for any A_a, then the set
is a redundant set.

The impact of redundancy on the final
outcome will be determined by the social
decision scheme operating on the resources.
Some of the social decision schemes have been
given psychologically descriptive titles by
Davis, examples of which are “majority,
equiprobability, plurality, proportionality, and
truth-wins.” These matrices can represent
very formally presented rules that are agreed
on by the group, or more subtle and un-
articulated influences such as cultural mores,
or even task characteristics.
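
How a decision scheme operates on redundancy vectors can be sketched in a few lines. This is not Davis's own procedure but a minimal illustration under stated assumptions: group members are sampled independently from the population's preference distribution, each possible redundancy vector receives its multinomial probability, and a scheme maps each vector to a row of decision probabilities. The function names and the tie-handling rule in `majority_scheme` (equal probability among supported alternatives when no majority exists) are choices made here for illustration.

```python
from itertools import combinations_with_replacement
from math import factorial


def distributions(group_size, n_alternatives):
    """List every redundancy vector (r_1, ..., r_n) whose entries sum to group_size."""
    result = []
    for combo in combinations_with_replacement(range(n_alternatives), group_size):
        result.append(tuple(combo.count(a) for a in range(n_alternatives)))
    return result


def multinomial_probability(counts, p):
    """Probability that a randomly formed group shows this redundancy vector."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    probability = float(coefficient)
    for c, p_a in zip(counts, p):
        probability *= p_a ** c
    return probability


def majority_scheme(counts):
    """Strict majority wins; otherwise supported alternatives are equiprobable."""
    size = sum(counts)
    for a, c in enumerate(counts):
        if 2 * c > size:
            return [1.0 if k == a else 0.0 for k in range(len(counts))]
    supported = [k for k, c in enumerate(counts) if c > 0]
    return [1.0 / len(supported) if k in supported else 0.0
            for k in range(len(counts))]


def proportionality_scheme(counts):
    """Each alternative is chosen in proportion to its number of adherents."""
    size = sum(counts)
    return [c / size for c in counts]


def group_decision_probabilities(p, group_size, scheme):
    """Weight each scheme row by the probability of its redundancy vector."""
    outcome = [0.0] * len(p)
    for counts in distributions(group_size, len(p)):
        weight = multinomial_probability(counts, p)
        for k, row_value in enumerate(scheme(counts)):
            outcome[k] += weight * row_value
    return outcome


if __name__ == "__main__":
    p = [0.6, 0.4]  # hypothetical individual preference distribution
    print(group_decision_probabilities(p, 3, majority_scheme))
    print(group_decision_probabilities(p, 3, proportionality_scheme))
```

With these assumed values, a majority rule in three-person groups amplifies the more popular alternative (about .65 versus .60 for individuals), whereas proportionality simply reproduces the individual distribution.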


Conclusion 


A number of factors have been shown to 
influence the relative contribution of resources 
(R) to the group product (P). In addition, 
many factors potentially enter into the system 
of formulas for estimating T. In fact, the
system is usually so overdetermined that 
unique solutions seldom exist. Any number of 
different combinations of group-task or trans- 
formational factors can result in essentially 
the same T matrix. Thus a host of influences 
might affect a group, both internally and 
externally, and yet the group can still make 
adjustments so that little change in actual 
productivity occurs. However, many im- 
portant constraints on the resource-product 
relationship are often not amenable to change. 
Under these conditions many factors used for 
estimating T have the status of constants and
the system rapidly moves from its over- 
determined status to one in which there are 
few solutions. Any attempts to improve or 
change group performance by affecting the T 
matrix are severely handicapped under these 
conditions. A simple example illustrating this 
situation would be a routinized assembly line 
in which all operations are highly determined 
and possible variations in operating procedures 
are severely limited. 

Of course, in some cases neither the T nor 
the R matrix can be substantially changed 
without incurring great costs. The assembly 
line again serves as an example. Once workers 
have been adequately trained, the resource 
matrix has reached a maximum; there is no
way to increase resources within the constraints 
of the assembly line. To the extent that chang-
ing the assembly line itself (i.e., the T matrix)
involves extensive retooling of expensive 
machinery, neither matrix is likely to be 
changed unless other considerations that are 
unrelated to the R or T matrix, such as social 
or humanitarian concerns, are brought to bear 


on the decision to change the situation. 


The extent to which the present model will 
adequately characterize a group's resource- 
transform-product system is partially de- 
pendent on the time frame within which the 
model is to characterize the group. In general, 
the longer the time interval, the more oppor- 
tunity there is for unexpected perturbations 
in the environment to occur and thus for the 
model to be inadequate. For example, the 
amount and distribution of resources within 
the group do not have to remain constant. 
Thus the amount of resources available within 
the group can change drastically over a very 
short period simply as a result of the trans- 
formation processes within the group. A group 
can start with each member bringing unique 
resources to the task at hand; that is, task- 
relevant redundancy equals 0. But as a result 
of group interaction, not only has a product 
been created that presumably reflects the 
various resources but the group members may 
have distributed duplicate resource sets among 
themselves. Task-relevant redundancy now 
approaches 100% as a result of learning. When 
this occurs, the resource matrix changes 
substantially, and this in turn may affect the 
adequacy of the existing T matrix. 

As the group interacts over time, interim 
products occur both in terms of the current 
status of the formal group product and in terms 
of interpersonal relations and individual 
perceptions. These interim outputs are fre- 
quently continuous in nature, but by arbi- 
trarily dividing the group interval into small 
time units, a series of interim products can be 
identified as the group progresses toward its 
final goal. These interim product states act as 
inputs to the next time frame, influencing the 
manner in which member resources can be 
incorporated into the group product. If, for 
example, performance on a subtask changes 
the nature of the resources available for the 
remaining portion of a task, the final group 
product can also be expected to differ from 
what might be expected from a knowledge of 
the resources available at the beginning of the 
task interval. The Lorge and Solomon (1955) 
Model B is an early attempt to incorporate 
this fact of life into a mathematical model. 
Restle and Davis (1962) and Davis and Restle 
(1963) inferred group-performance charac- 
teristics over time as a function of the number 
of distinct, sequential subtasks within a group
task. Recently, Davis (1973) presented an 
interesting analysis of the potential effect of 
sequential decision making on interim resource 
matrices as a decision-making task moves up 
an organizational hierarchy. 

The model proposed here attempts to 
demonstrate that a number of existing models 
of group performance can be seen as existing 
within a single, unified conceptual framework 
that still retains a fair degree of simplicity. In 
fact, given the present framework, it is possible 
to see that most of the models in the literature 
are really only special cases of this model, in 
which a specific pattern of T weights is
postulated to have a specific effect on group
resources (R). This approach has often taken
the form of comparing two or more T patterns
(e.g., conjunctive vs. disjunctive, or equi- 
probability vs. majority rule vs. truth-wins). 
In some cases the R matrix is the primary 
focus. Only rarely does one come across 
careful consideration of both the T matrix 
and the R matrix in the same study. Un- 
fortunately, these cases are restricted almost 
entirely to the decision-making literature (e.g.,
Davis, 1973; Slovic & Lichtenstein, 1971). 

It should be kept in mind that the present 
model can be thought of as a representational 
framework, incorporating a variety of smaller 
models. As such, in its current form it has only 
limited capability for generating precisely 
formulated predictions. In other words, the 
model does not represent a new theory of 
group productivity, nor does it solve the very 
real problems of assessing resource and trans- 
formation variables or of determining the best 
fit between the T and R matrices. Research 
along all of these fronts is in progress and 
appears to be making slow but steady advances. 
Hackman and Morris (1975) presented an 
excellent review and discussion of many of 
these practical problems. I hope that the 
approach presented here, by illustrating the 
need to consider the transformation and re- 
source variables simultaneously and by illus- 
trating the underlying similarities among a 
variety of existing models, will provide a 
general framework within which these other 
issues can be more clearly defined and resolved. 

Research incorporating both transformation 
and resource variables is growing and has, I 
believe, an exciting future. Questions about, 


for example, the ideal T weights for a given 
resource set are now being asked, as well as 
broader questions regarding the effects of 
various patterns of T weights where charac-
teristics of the R matrix are not entirely known.
Since ideal weighting is often impossible
because of lack of information or other syste-
matic distortions in weighting due to group
processes (Steiner, 1972), research is necessary 
to determine ways of minimizing bias or to 
determine the effects of various conditions on 
probable weightings. One method of attacking 
this problem has been to compare the effective- 
ness of various formally defined weighting 
schemes on group judgments. The research 
program of Davis (1973) and his associates 
illustrates this approach. Other recent research 
has demonstrated that an averaging or equal- 
weighting model outpredicts differential- 
weighting models in a variety of conditions 
and may be especially effective when the R 
matrix is not known (Einhorn, Hogarth, & 
Klempner, 1977). A different but related
approach would be to determine the optimal
distribution of resources for a fixed T matrix.
The approach would be applicable to situations
in which there is little freedom to vary the
surrounding environment.

All of these approaches, with their different
methods, terms, and focuses, are attempts to
capture a general process by which individual
resources are transformed into a group product
through group processes. The present frame-
work represents an early step toward bringing
together these diverse models into a unified
approach to the study of small groups, by
concentrating on the similarities among the
models. It is hoped that in doing so, future


meaningful patterns and that the current
state of affairs in the study of group dynamics,
in which theoretical integration is almost
nonexistent (Shaw, 1976), will begin to move
toward a state of genuine integration.

Comparative Effectiveness of Paraprofessional
and Professional Helpers 


Joseph A. Durlak 
Southern Illinois University at Carbondale 


Forty-two studies comparing the effectiveness of professional and paraprofes- 
sional helpers are reviewed with respect to outcome and adequacy of design. 
Although studies have been limited to examining helpers functioning in narrowly 
defined clinical roles with specific client populations, findings have been con- 
sistent and provocative. Paraprofessionals achieve clinical outcomes equal to or 
significantly better than those obtained by professionals. In terms of measurable 
outcome, professionals may not possess demonstrably superior clinical skills 
when compared with paraprofessionals. Moreover, professional mental health 
education, training, and experience do not appear to be necessary prerequisites 
for an effective helping person. The strongest support for paraprofessionals has 
come from programs directed at the modification of college students' and adults' 
specific target problems and, to a lesser extent, from group and individual ther- 
apy programs for non-middle-class adults. Unfortunately, there is little informa- 
tion on the factors that account for paraprofessionals' effectiveness. Future 
studies need to define, isolate, and evaluate the primary treatment ingredients 
of paraprofessional helping programs in an attempt to determine the nature of 


the paraprofessional's therapeutic influence. 


Evaluations of research involving parapro- 
fessional therapists have been highly positive. 
Reviewers concentrating on the use of college 
students (Gruver, 1971), community volun- 
teers (Siegel, 1973), and parents (Berkowitz 
& Graziano, 1972; Johnson & Katz, 1973; 
O'Dell, 1974) or discussing paraprofessional 
therapy outcome research in general (Karls- 
ruher, 1974) have agreed that relatively un- 
trained workers can make a significant service 
contribution within the mental health field.
Conspicuously ignored, however, has been the 
research comparing the relative effectiveness 
of paraprofessionals and professionals. This 
article attempts to review and evaluate this 
comparative research with respect to outcome 
and adequacy of design. 

A broad range of helping roles is considered. 
Studies selected for review include individual 
and group psychotherapy, crisis counseling,
behavior modification, social and vocational 
rehabilitation programs, and academic-adjust- 
ment and mental-health-related services. Ana- 
logue or otherwise simulated therapeutic in- 
teractions and sensitivity or T-group experi- 
ences are omitted. 

In keeping with traditional distinctions and 
definitions, individuals who have received 
postbaccalaureate, formal clinical training in 
professional programs of psychology, psychi- 
atry, social work, and psychiatric nursing are 
considered to be professionals. Those who 
have not received this training are parapro-
fessionals.

The present review is divided into three 
sections. First, design criteria used to assess 
the methodology of comparative studies are 
described. Second, the characteristics and out- 
come of research studies are summarized, and 
the conclusions that present findings support 
are discussed. Third, experimental problems
and limitations in current research are re-
viewed and suggestions for future work are
offered.

Methodological Considerations


Luborsky, Singer, and Luborsky (1975) re- 
cently used 13 methodological criteria to eval- 
uate the quality of research comparing al- 
ternative therapeutic approaches conducted 
by professionals. These criteria were modified 
for application to paraprofessional-profes-
sional comparative research. Two of the cri-
teria used by Luborsky et al. regarding the
need for experienced and mutually competent 
therapists were omitted, and two criteria, re- 
quiring multi-outcome and follow-up measures 
of client change, were added. The methodo- 
logical criteria stressed such requirements as 
adequate sample size, equivalency of client 
groups, controls for expectancy effects and 
concurrent treatments, independent assess-
ment of outcome, multi-outcome assessment
of change, and at least one follow-up measure 
of client progress. For further details see 
Luborsky et al. (1975). 

Forty-two studies comparing the effective- 
ness of professional and paraprofessional help- 
ers were located. The experimental quality of 
each study was evaluated according to a 5- 
point index by applying the 13 design criteria. 
The grading system was not intended to be 
exact, but only to approximate the relative 
research sophistication of selected studies. As 
a partial check of the evaluation system, a
second judge was asked to apply the criteria 
to five randomly selected studies. Interrater 
agreement was 92%. 
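
The article does not state how interrater agreement was computed. One common possibility, simple percent agreement over paired judgments, is sketched below. The function name and the illustrative data (65 paired judgments, i.e., 5 studies by 13 criteria, of which 60 match) are assumptions chosen only to show the arithmetic; they are not the actual ratings.

```python
def percent_agreement(judge_a, judge_b):
    """Percentage of paired judgments on which two judges agree."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must rate the same nonempty set of items")
    matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return 100.0 * matches / len(judge_a)


if __name__ == "__main__":
    # Hypothetical ratings: 1 = criterion met, 0 = criterion not met.
    first_judge = [1] * 65
    second_judge = [1] * 60 + [0] * 5
    print(round(percent_agreement(first_judge, second_judge)))  # 92
```

Percent agreement of this kind does not correct for agreement expected by chance, so it should be read as a rough check, consistent with the stated intent of the grading system.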


Research Findings and Conclusions 


Table 1 summarizes the characteristics, out- 
come, and experimental quality of the forty- 
two comparative studies. These studies repre- 
sent a diversity of paraprofessional helpers, 
clinical settings, client populations, service 
programs, and target problems. Experienced 
psychologists, psychiatrists, and social work- 
ers typically constituted the professional ther- 
apist group. Only 10 studies exclusively used 
advanced clinical trainees as professional ther- 
apists (Elliott & Denney, 1975; Getz, Fujita, 
& Allen, 1975; Jensen, 1961; Kazdin, 1975; 
Levitz & Stunkard, 1974; Lindstrom, Balch, 
& Reese, 1976; Moleski & Tosi, 1976; Penick, 


Filion, Fox, & Stunkard, 1971; Ryan, Krall, 
& Hodges, 1976; Wolff, 1969). 

The studies are grouped in Table 1 accord- 
ing to five major categories of helping services. 
The 19 studies in Group 1 involved individual 
and group psychotherapy or counseling pri- 
marily for moderately to severely disturbed 
adults; the four studies in Group 2 dealt with 
academic counseling for college students; the 
three studies in Group 3 involved crisis inter- 
vention for adults; and the 13 studies in 
Group 4 dealt with specific target problems 
of college students (n = 5), adults (n = 6),
and children (n = 2), such as obesity, stutter-
ing, insomnia, test and speech anxiety, and
enuresis. Three studies fell into an other cate- 
gory (Group 5); these included Wolff’s 
(1969) interpersonal training groups for nor- 
mal college students, Schortinghuis and Froh- 
man's (1974) cognitive tutoring program for 
handicapped preschool children, and Lamb 
and Clack's (1974) orientation program to a 
campus counseling center to increase use of 
this mental health resource among college stu- 
dents. 

Thirty-five comparative studies have used 
multi-outcome measures of client change, and 
27 studies have collected follow-up data on at 
least one measure. Representative outcome cri- 
teria have included performance on standard- 
ized psychological tests and various psycho- 
metric instruments (Lick & Heffler, 1977; 
Schortinghuis & Frohman, 1974), clients’ self- 
reported change and satisfaction with services 
(Getz et al., 1975; Lamb & Clack, 1974),
clinical ratings offered by independent judges 
(O’Brien, Hamm, Ray, Pierce, Luborsky, & 
Mintz, 1972), information from significant 
others (Miles, McLean, & Maurice, 1976; 
Wolff, 1969), academic or work performance 
(Mosher, Menn, & Matthews, 1975; Zunker 
& Brown, 1966), behavior ratings (Appleby, 
1963; Ellsworth, 1968), analysis of therapist- 
offered empathy, warmth, and genuineness 
(Knickerbocker & McGee, 1973; Truax, 
1967), performance in role-playing or in vivo 
situations (Fremouw & Harmatz, 1975; Mo- 
leski & Tosi, 1976), therapist improvement 
ratings (Karlsruher, 1976), supervisor evalua- 
tions (Covner, 1969; Magoon & Golann, 
1966), criteria specific to treatment goals, 

Table 1
Characteristics, Outcome, and Experimental Quality of Comparative Studies of Paraprofessional and Professional Helpers

| Study | Experimental quality | Paraprofessional helpers | Client and helper sample size* | Results significantly favoring |
|---|---|---|---|---|
| Group 1: Individual or group psychotherapy or counseling | | | | |
| Ellsworth (1968) | A | Psychiatric aides | 327 psychiatric inpatients (?, ?) | Paraprofessionals |
| Jensen (1961) | B | Nurses and attendants | 75 psychiatric inpatients’ (2, 3) | Neither group |
| Karlsruher (1976) | B | College students | 60 school children (20, 6) | Neither group |
| Miles, McLean, & Maurice (1976) | B | Medical students | 120 psychiatric inpatients (60, 27) | Neither group |
| O'Brien, Hamm, Ray, Pierce, Luborsky, & Mintz (1972) | B | Medical students | 86 psychiatric outpatients (4, 12) | Neither group |
| Truax (1967) | B | Adult women | Over 300 vocational rehabilitation clients (4, 4) | Paraprofessionals |
| Truax & Lister (1970) | B | Adult women | 168 vocational rehabilitation clients (4, 4) | Paraprofessionals |
| Weinman, Kleiner, Yu, & Tillson (1974) | B | Community volunteers | 179 psychiatric outpatients (?, ?) | Neither group |
| Anker & Walsh (1961) | C | Occupational therapist | 56 psychiatric inpatients (1, 1) | Paraprofessionals |
| Appleby (1963) | C | Psychiatric aides | 53 psychiatric inpatients? (2, 2) | Neither group |
| Colarelli & Siegel (1966) | C | Psychiatric aides | 477 psychiatric inpatients (8, ?) | Neither group |
| Cole, Oetting, & Miskimins (1969) | C | Adult women | 22 adolescent delinquents (2, 2) | Neither group |
| Engelkes & Roberts (1970) | C | Adult counselors | 1,502 vocational rehabilitation clients (142, 67) | Neither group |
| Mosher, Menn, & Matthews (1975) | C | Adult counselors | 44 psychiatric inpatients (6, ?) | Paraprofessionals |
| Poser (1966) | C | College students | 295 psychiatric inpatients (11, 15) | Paraprofessionals |
| Sheldon (1964) | C | General physicians and nurses | 83 psychiatric outpatients (2, 2) | Professionals better than physicians but equal to nurses |
| Mendel & Rapport (1963) | D | Psychiatric aides | 166 psychiatric outpatients (?, ?) | Neither group |
| Covner (1969) | E | Community volunteers | Alcoholics* (?, ?) | Neither group |
| Magoon & Golann (1966) | E | Adult women | Psychiatric outpatients? (8, 2) | Neither group |
| Group 2: Academic counseling or advising for college students | | | | |
| Zunker & Brown (1966) | A | College students | 320 college students (8, 4) | Paraprofessionals |
| Brown & Myers (1975) | C | College students | 303 college students (?, 2) | Neither group |
| Zultowski & Catron (1976) | C | College students | 188 college students (10, 2) | Neither group |
| Murray (1972) | C | College students | 166 college students (20, 9) | Neither group |
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Table 1 (continued) 
BC MN o L^" ——————————— 
Experi- 
mental Paraprofessional Client and helper Results significantly 
Study quality helpers sample size* favoring 
Group 3: Crisis intervention for adults 
Knickerbocker & B Community volunteers 92 adults and adoles- Paraprofessionals 
McGee (1973) cents in crisis (65, 27) 
DeVol (1976) E Adult counselors 45 adults in crisis (4,5) Neither group 
Getz, Fujita, & Allen E Community volunteers 104 adults in crisis Neither group 
(1975) (2,2) 
Group 4: Interventions directed at specific target problems 
Kazdin (1975) A College students 54 unassertive adults Neither group 
and college students 
(2, 58 
Lick & Heffler (1977) A College student 40 adult insomniacs^ Neither group 
(1, 1) 
Moleski & Tosi (1976) A Speech pathologist 20 adult stutterers® Neither group 
(1, 1) 
Elliott & Denney B College students 45 overweight college Neither group 
(1975) students (3, 1) 
Levenberg & Wagner B Public health officer 54 adult smokers (1, 1) Neither group 
(1976) 
Levitz & Stunkard B Community volunteers 234 overweight adults" ^ Professionals 
(1974) (8, 4) 
Lindstrom, Balch, & B College students 68 overweight college Neither group 
Reese (1976) students^ (4, 1) 
Penick, Filion, Fox, & B Adult volunteers 32 overweight adults Neither group 
Stunkard (1971) Q, 1) м А 
Russell & Wise (1976) B College students 42 speech-anxious Neither group 
college students” 
8,3 3 
Ryan, Krall, & Hodges B College students 72 test-anxious college Neither group 
(1976) students (1, 2) А 
Werry & Cohrssen G Parents 70 enuretic children Paraprofessionals 
1965 (22, 4) . 
pé vn & Mandell D Parents 87 enuretic children Paraprofessionals 
(1966) (56, 4) А | 
Fremouw & Harmatz D College students 30 speech-anxious Neither group 
(1975) college students" 


(11, 1) 
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Group 5: Other interventions 


1,192 college students 
(?, 2) 

Community volunteers 37 handicapped 
children (4, 3) 

88 college students" 
(4, 4) 


College students Paraprofessionals 


Lamb & Clack (1974) B 
Schortinghuis & С Paraprofessionals 
Frohman (1974) 


Wolff (1969) Neither group 


College students 


criteria were mainly satisfied ; B, that one or two criteria were deficient; Çi 


Note. A indicates that the desi h wo cr i 
at D, that five were deficient ; and E, that deficiencies were present in more 


that three or four were deficient; 


than five criteria. 5 \ : : 
^ Figures in parentheses are the number of paraprofessional and professional helpers, respectively; a 


indicates that the exact number of helpers was not specified. 


b Includes no-treatment ог attention-placebo control groups. Я 
e Five therapists participated but a breakdown according to helper groups was not provided. 


Client sample size was not indicated. 
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such as weight loss or frequency of target be- 
haviors (De Leon & Mandell, 1966; Levitz & 
Stunkard, 1974), and in hospital studies, such 
measures as length of hospital stay, ward 
transfers, discharge or readmission rates, and 
posthospital social and community adjust- 
ment (Jensen, 1961; Poser, 1966; Weinman, 
Kleiner, Yu, & Tillson, 1974). 

Summaries of four studies involving psy- 
chiatric inpatients, children, adult insomniacs, 
and college students as clients are described 
below to provide the reader with some con- 
crete examples of comparative research. 

In a well-designed study, Zunker and Brown 
(1966) compared the effectiveness of four 
professional and eight student counselors who 
provided academic adjustment counseling to 
matched groups of 80 college freshmen. All 
counselors received 50 hours of identical 
training, used identical counseling materials, 
and followed an identical counseling sequence. 
Outcome criteria were assessments of study 
skills and study problems, academic grades, 
retention of information offered in counseling, 
and a counselor evaluation questionnaire. Stu- 
dent counselors were as effective as profes- 
sionals on the first measure above but achieved 
significantly better results than did the pro- 
fessionals on each of the last four measures. 

Lick and Heffler (1977) randomly assigned 
40 adult insomniacs to four conditions: (a) 
therapist-administered progressive relaxation, 
(b) therapist-administered relaxation plus a 
taped relaxation program for client use at 
home, (c) placebo control, and (d) no-treat- 
ment control. A college undergraduate and a 
clinical psychologist saw clients in each treat- 
ment group. There were no outcome differ- 
ences between the professional and parapro- 
fessional therapists, as measured by client 
self-reported change and Minnesota Multi- 
phasic Personality Inventory scores. The re- 
laxation treatments did not differ significantly 
from each other, but both were significantly 
more effective than placebo and no-treatment 
controls. 

In a child therapy study, 60 maladapting 
fifth and sixth graders were randomly assigned 
to a no-treatment control condition, treatment 
conducted by professional therapists (experi-
enced school guidance counselors and psy-
chology graduate students), or treatment by
two groups of untrained college undergrad-
uates who did or did not receive clinical super-
vision (Karlsruher, 1976). Treated children
received 10 sessions of individual client-cen- 
tered therapy. The California Test of Person- 
ality completed by the child, therapist im- 
provement ratings, and a teacher-completed 
behavior checklist were used as outcome cri- 
teria. Children seen by supervised parapro- 
fessionals improved the most, closely followed 
by children treated by professionals. Control 
children demonstrated more improvement than 
children seen by unsupervised paraprofes- 
sionals. Unexpectedly, ratings of therapist- 
offered empathy, warmth, and genuineness
were negatively related to positive client 
change in the three treatment groups. 

The progress of two matched samples of 60
psychiatric inpatients, one seen by medical
student therapists and the other by psychiatric
residents and fully trained psychiatrists, was
evaluated by Miles et al. (1976). Patients
reported on changes in
symptoms at the time of hospital discharge 
and again at 3-month follow-up. Independent 
ratings of improvement were also obtained 
from patients’ significant others and family 
physicians at follow-up. From 40% to 88% 
of the patients demonstrated improvement on 
the outcome criteria at the different time 
points of evaluation, and there were no sig-
nificant outcome differences between patients
treated by the medical students and those
treated by psychiatrists and residents.

Overall, outcome results in comparative 
studies have favored paraprofessionals. In one 
study, professionals were more effective than 
one group of paraprofessional helpers but 
were equal in effectiveness to a second para- 
professional comparison group (Sheldon, 
1964). In only one study were professionals 
significantly more effective than all parapro- 
fessionals with whom they were compared
(Levitz & Stunkard, 1974). In terms of mea- 
surable outcome, there were no significant 
differences among helpers in 28 investigations, 
but paraprofessionals were significantly more 
effective than professionals in 12 studies. The 
central finding from these comparative studies 
is that the clinical outcomes that paraprofes- 
sionals achieve are equal to or significantly
better than those obtained by professionals. 

The provocative conclusion from these com-
parative investigations is that professionals do 
not possess demonstrably superior therapeutic 
skills, compared with paraprofessionals. More- 
over, professional mental health education, 
training, and experience are not necessary pre- 
requisites for an effective helping person. 

In four studies (Anker & Walsh, 1961; De 
Leon & Mandell, 1966; Penick et al., 1971; 
Werry & Cohrssen, 1965) paraprofessionals 
have provided different treatment services 
from those provided by professionals, so it is
not possible to assess therapist and treatment 
effects separately. Results could be due as 
much to the alternative treatment as to the 
use of paraprofessionals. Although these ther- 
apist-treatment confounds are a serious meth- 
odological drawback, the results of these four 
studies are heuristic. Paraprofessionals were 
found to be significantly more effective than 
professionals in three of these four studies 
(Anker & Walsh, 1961; De Leon & Mandell, 
1966; Werry & Cohrssen, 1965) and such data 
challenge professionals to look more closely 
at the nature and efficacy of traditional men- 
tal health practices. 

On closer scrutiny, results offer stronger 
support for paraprofessionals in some program 
areas than in others. The strongest support for 
paraprofessional effectiveness has come from 
the Group 4 studies, those directed at the 
modification of specific target problems. This 
group of studies contains the only one report- 
ing results significantly favoring professionals 
(Levitz & Stunkard, 1974), but the other 12 
studies showed no outcome differences among 
helper groups, and 10 of the 13 Group 4 
studies were of A or B experimental quality 
(see Table 1 above). 

The studies involving individual or group 
psychotherapy or counseling (Group 1) con- 
tained the largest number of studies favoring 
paraprofessionals over professionals (7), but 
only 3 of these 7 and 8 of the total group of 
19 studies were relatively well controlled (of 
A or B quality). The lack of more rigorous
research evaluation in Group 1 studies is un-
fortunate. These studies involved moderately
to severely disturbed clients, and one can
argue that such therapeutic programs are 
clinically more demanding than interventions 
attempting to modify discrete problem be- 
haviors. 

Academic counseling for college students, 
crisis intervention for adults, and the other in- 
terventions (Groups 2, 3, and 5, respectively) 
have not yet been evaluated carefully enough
to support any strong conclusions regarding paraprofes-
sionals’ clinical skills compared with those of 
professionals in these areas. 

Current evidence offers reasonable, initial 
support for paraprofessionals’ clinical effec- 
tiveness in comparative studies based on the 
following three considerations: 

1. On the whole, the experimental quality of 
the studies in Table 1 approached or exceeded 
that observed in reviews of outcome research 
in other clinical areas (Bednar & Lawlis, 
1971; Cash, 1973; Gurman, 1973; Kelley, 
Smits, Leventhal, & Rhodes, 1970; Luborsky, 
Chandler, Auerbach, Cohen, & Bachrach, 
1971). Although some comparative reports 
were of little evaluative worth, others were 
relatively better controlled investigations on 
which at least tentative conclusions can be 
based. 

2. Results are consistent regardless of the 
research sophistication of the study. Con- 
vergent evidence obtained by independent in- 
vestigators using different design strategies 
and methods of evaluation lends strength to 
obtained findings. 

3. Finally, several studies contained biases 
against paraprofessional treatment (Colarelli 
& Siegel, 1966; Fremouw & Harmatz, 1975; 
Jensen, 1961; Mendel & Rapport, 1963; 
Poser, 1966; Truax, 1967; Truax & Lister, 
1970). Biases notwithstanding, positive find- 
ings of paraprofessionals’ effectiveness sug- 
gest that a genuine clinical phenomenon is 
being observed. 

Nevertheless, several reservations must be 
offered in evaluating the current status of 
comparative research. These issues are best 
discussed in reference to current experimental 
inadequacies and limitations and point the 
way toward future research. 


Experimental Problems and Limitations


Design Criteria


The greatest deficiencies in design criteria
were the failure to assess the effects of ad-
ditional, concurrent treatment (22 of 42
studies), the failure to obtain multi-outcome
(7 studies) or follow-up measures of client
change (16 studies), and the reliance on
subjective estimates of treatment outcome
(9 studies).

The criterion regarding the possibility of 
differential treatment expectations in the two 
helper-and-client groups was the most dif- 
ficult to score. Although several authors noted 
biases against paraprofessional treatment (see 
above), a reverse bias was seldom noted. The 
initiation of a new treatment program, even 
if conducted by paraprofessionals, might
create more positive expectations among staff 
and patients, in comparison with routine pro- 
fessional treatment. However, in most studies, 
it was difficult to determine whether this phe- 
nomenon was present. 

The 13 design criteria adapted from Lubor- 
sky et al. (1975) are offered as a model to im- 
prove the methodology of future comparative 
research, with particular emphasis given to 
the most common deficiencies just mentioned. 
In addition to these design features, investi- 
gators should be aware of at least four other 
issues. 


Patient Factors 


The strongest evidence for paraprofession- 
als' effectiveness derives from studies of col- 
lege students and adults with specific target 
problems (Group 4 studies) and moderately 
to severely disturbed non-middle-class adults 
(Group 1 studies). Work with adolescents is 
represented by a single study (Cole, Oetting, 
& Miskimins, 1969). Interventions with 
younger children concentrating on specific tar- 
get behaviors have not been well controlled 
(De Leon & Mandell, 1966; Schortinghuis & 
Frohman, 1974; Werry & Cohrssen, 1965) 
and there has only been one study focusing 
on children's general adjustment difficulties 
(Karlsruher, 1976). No well-controlled com- 
parative studies offering individual or group 
therapy have been conducted with clients, in-
cluding college students, from the middle or
upper socioeconomic classes. Since treatment 
effects might vary as a function of population 
characteristics, more comparative studies are 
needed dealing with the diverse problems of 
children, adolescents, college students, and 
adults not requiring psychiatric hospitaliza-
tion or residential care.


Therapist Factors 


Future work should examine the extent to 
which three therapist factors have contributed 
to current research findings: the use of un-
matched therapist groups, the failure to study
individual clinical performance, and the use
of small therapist samples.

1. Paraprofessionals and professionals are
frequently unmatched helper groups. In addi-
tion to formal clinical training and experience,
paraprofessionals often differ from profes-
sionals in being younger, more often female,
and from a lower socioeconomic class. The
effects of these therapist variables in psycho- 
therapy research are not clear (Meltzoff & 
Kornreich, 1970) but merit future study, 
especially in relation to therapy process di- 
mensions and patient characteristics that may 
interact with outcome. 

2. Individual clinical performance is seldom 
studied. It is possible that an emphasis on 
group performance obscures significant indi- 
vidual variability in helping effectiveness. By
identifying certain workers who are particu-
larly effective or ineffective within each helper
group, the study of individual therapists’
functioning might clarify the finding that on
the average, paraprofessionals are able to do
as well as professionals.

3. Finally, therapist samples in comparative 
studies are usually very small, yielding no
indication of the representativeness of findings.
Larger samples of paraprofessionals should
be studied to provide information on the rela-
tive numbers and characteristics of personnel 
who can perform various clinical roles and 
functions adequately. 


Selection, Training, and Supervision


Few studies have operationally defined
their selection, training, and supervisory pro-
cedures, and fewer still have studied these
program dimensions. Two studies compar- 
ing the presence versus absence of training 
and supervision obtained conflicting results. 
Karlsruher (1976) found that unsupervised 
and untrained college students were ineffec- 
tive therapists for maladapting elementary 
school children, whereas untrained but super- 
vised students and experienced professionals 
achieved equally successful results. In con- 
trast, Lindstrom et al. (1976) reported that 
untrained and unsupervised college under- 
graduates were as effective as trained and 
supervised undergraduates and a professional 
therapist in helping college students lose 
weight. 

Paraprofessionals have been selected on the 
basis of the results of psychological testing or 
a personal interview (Brown & Myers, 1975; 
Covner, 1969; Karlsruher, 1976; Russell & 
Wise, 1976), have been chosen because they 
were the working staff on clinical units or 
wards (Anker & Walsh, 1961; DeVol, 1976; 
Jensen, 1961; Miles et al., 1976; Moleski & 
Tosi, 1976), and have essentially been self-
selected in the sense that available volunteers 
were accepted following apparently perfunc- 
tory screening procedures (Fremouw & Har- 
matz, 1975; Levitz & Stunkard, 1974; Murry, 
1972; Poser, 1966). Most investigators, how- 
ever, have not reported their selection process 
and criteria. 

Some paraprofessionals have received no 
clinical training except a brief program orien- 
tation (Karlsruher, 1976; Lamb & Clack, 
1974; Levenberg & Wagner, 1976; Poser, 
1966; Weinman et al., 1974), some have re- 
ceived brief (up to 15 hours) training (Brown 
& Myers, 1975; Covner, 1969; Fremouw & 
Harmatz, 1975; Lindstrom et al., 1976; Levitz 
& Stunkard, 1974; Murry, 1972; Russell & 
Wise, 1976; Zultowski & Catron, 1976), and 
a few have participated in intensive programs 
approximating the training offered to profes- 
sionals (Colarelli & Siegel, 1966; Magoon & 
Golann, 1966; Zunker & Brown, 1966). In 
most cases, only global descriptions of train- 
ing programs have been presented. Both in- 
dividual and group supervision have been 
used, but no one has detailed the supervisory 
process beyond clinical clichés. 


Important parameters of selection, training, 
and supervision undoubtedly relate to pro- 
gram settings, treatment activities, goals, and 
client populations served. Judicious selection, 
training, and supervision might well account 
for paraprofessional effectiveness in compara- 
tive studies. Unfortunately, it is impossible to 
abstract the necessary details from compara- 
tive studies in order to analyze how parapro- 
fessionals can be effectively selected, trained, 
and supervised. More systematic research 
should be devoted to these important program 
features. 


The Process of Paraprofessional Intervention 


Probably the most serious weakness in com- 
parative research lies in the failure to examine 
the factors that account for paraprofessionals' 
effectiveness. Investigators have failed to re- 
late specific intervention techniques to spe- 
cific client changes. The nature of paraprofes- 
sional treatment is frequently left unclarified 
or is defined in global, undifferentiated terms. 
The nature of the paraprofessional’s therapeu- 
tic influence is therefore undetermined. 

Theories abound but evidence is meager 
about why nonprofessionals are effective 
helpers. For example, it is not known if para- 
professionals capitalize on powerful, nonspe- 
cific (placebo) therapeutic influences, use 
natural helping skills that perhaps reside in 
their interpersonal style, or adopt interven- 
tion techniques previously identified as effec- 
tive in studies of professional therapists. 

Paraprofessionals’ higher interest and en- 
thusiasm may make them as or more effective 
than professionals, but it seems unreasonable 
that these factors would work uniformly in 
all studies. Moreover, the perceived prestige 
and expertise often attributed to professionals 
by clients would probably minimize differ- 
ential therapeutic effects accruing to parapro- 
fessionals as a result of their enthusiasm and 
interest. Two studies found that paraprofes- 
sionals offered significantly higher levels of 
empathy, warmth, or genuineness to their 
clients than professionals did (Knickerbocker 
& McGee, 1973; Truax, 1967). These dimen- 
sions could account for paraprofessionals’ ef- 
fectiveness in these investigations and in 
other programs as well. However, this issue is
complex. Karlsruher (1976) reported that 
therapist-offered empathy, warmth, and gen- 
uineness were negatively related to positive 
changes occurring in child clients. Further- 
more, there is controversy regarding the con- 
struct validity of ratings of empathy and 
other client-centered variables (Avery, D'Au- 
gelli, & Danish, 1976; Chinsky & Rappaport, 
1970). 

Paraprofessional effectiveness in some stud- 
ies may be due to the development of care- 
fully standardized and systematic treatment 
programs (Elliott & Denney, 1975; Kazdin, 
1975; Levenberg & Wagner, 1976; Lick & 
Heffler, 1977; Lindstrom et al., 1976; Russell 
& Wise, 1976). In these programs, treatment 
has consisted of a programmed series of ac- 
tivities. Presumably, the more intervention 
procedures that can be clearly described and 
sequentially ordered in a helping program, 
the easier it will be for less trained personnel 
to administer them successfully. Paraprofes- 
sionals may feel more comfortable and hold 
higher expectations than professionals when 
using standardized clinical procedures, and 
these factors could contribute to paraprofes- 
sionals’ clinical effectiveness. Therefore, para- 
professional clinical success may be closely 
related to professionals’ abilities to define, 
order, and structure effective sequences of 
helping activities when training or supervising 
paraprofessionals. Although systematic treat- 
ment programs have been well controlled and 
have provided the strongest evidence of para- 
professionals’ helping skills (Group 4 studies), 
paraprofessionals have exercised a wide lati- 
tude of clinical responsibility in most com- 
parative studies and have not followed stan- 
dardized and predetermined therapeutic pro- 
cedures. 

In summary, it is frustrating to admit that 
we do not know exactly why paraprofessionals 
with relatively little clinical experience and 
training can achieve results equal to or better 
than those obtained by professionals. Future 
research should attempt to define, isolate, and 
evaluate the primary treatment ingredients of 
paraprofessional helping programs. Research 
is needed in which the behavioral dimensions 
of nonprofessional intervention are described
and assessed in relation to overall treatment
effectiveness and specific client change. 

The experimental limitations of compara-
tive research must be discussed in relation to
two important issues regarding the use of
paraprofessionals within the mental health
field. The first of these concerns the establish-
ment of paraprofessional associate of arts
programs and other new career training. The
second deals with claims for the value of the
indigenous therapist in reaching and helping 
various client populations. 


Paraprofessional Mental Health Manpower 


The intent of many undergraduate degree 
programs is to produce mental health gen-
eralists, that is, workers with a variety of
basic helping skills who can eventually carry
out many of the functions traditionally per-
formed by professionals. Currently, over
4,000 graduates are being produced annually
by approximately 170 paraprofessional col-
lege-level training programs (Young, True,
& Packard, 1976). Present comparative evi-
dence, however, does not support the ability
of paraprofessionals to function effectively in
full-time service positions.

Comparative research has studied workers’
performance in very limited clinical roles with
specific client groups. Paraprofessionals have
primarily offered one form of treatment to
homogeneous client populations in short-term
experimental programs. In addition, research
has neglected to assess paraprofessionals' and
professionals' relative abilities to perform
other primary clinical skills normally re-
quired in permanent service positions, such
as intake interviewing, diagnostic assessment
and evaluations, consultative services, and
data collection for research purposes.

Therefore, comparative data supporting the
value and wisdom of assimilating paraprofes-
sionally trained workers as full-time staff in
the human services are not currently avail-
able and can be obtained only in more com-
prehensive clinical studies. Systematic infor-
mation is needed about the long-term effec-
tiveness of paraprofessionals, employing a
variety of intervention techniques with di-
verse client populations. We also need infor-
mation on the relative abilities of paraprofes-
sionals and professionals to perform other
primary clinical skills, ranging from intake 
interviewing to consultation. Pilot investiga- 
tions examining workers’ relative skills in 
these areas have only recently begun 
(Gingerich, Feldman, & Wodarski, 1976; 
Sloop & Quarrick, 1974). 


Indigenous Therapists 


Comparative research has offered only par- 
tial support for the value of indigenous thera- 
pists, that is, workers similar to clients in 
background, life-style, and general personal 
and demographic characteristics. The only 
controlled evidence for the effectiveness of 
indigenous therapists has involved college 
students (Brown & Myers, 1975; Elliott & 
Denney, 1975; Fremouw & Harmatz, 1975; 
Karlsruher, 1976; Lamb & Clack, 1974; 
Lindstrom et al., 1976; Murry, 1972; Rus-
sell & Wise, 1976; Ryan et al., 1976; Wolff,
1969; Zultowski & Catron, 1976; Zunker & 
Brown, 1966). 

However, the indigenous therapist has often 
been cited as the treatment agent of choice 
for low-income and minority group clients 
who have not received a fair distribution of 
services from professionals in the past. It is 
believed that indigenous therapists can estab- 
lish rapport and identification with previ- 
ously underserved populations, which makes 
them more effective than professionals work- 
ing with the same groups. Unfortunately, 
there are no experimental data to support 
this assertion. In fact, data from two ana- 
logue and attitude studies indicate that Mexi- 
can-American clients perceive white profes- 
sionals to be as trustworthy, understanding, 
and helpful as indigenous therapists (Acosta 
& Sheehan, 1976; Andrade & Burstein, 1973). 
The comparative effectiveness of indigenous 
and professional helpers working with non- 
college populations awaits empirical docu- 
mentation. 


Summary 


Findings from 42 studies comparing the 
helping effectiveness of paraprofessionals and 
professionals are consistent and provocative. 
The clinical outcomes paraprofessionals 
achieve are equal to or significantly better
than those obtained by professionals. These 
data suggest that professionals do not neces- 
sarily possess demonstrably superior clinical 
skills, in terms of measurable outcome, when 
compared with paraprofessionals. Moreover, 
professional mental health education, training, 
and experience are not necessary prerequi- 
sites for an effective helping person. The 
strongest support for paraprofessionals has 
come from programs directed at the modifica- 
tion of adults’ and college students’ specific 
target problems and, to a lesser extent, from 
group and individual therapy programs for 
non-middle-class adults.

Although the above findings and conclu- 
sions must be offered tentatively due to de- 
ficiencies and limitations in the methodology 
of many studies, the consistency of positive 
findings supports the potential value of para- 
professional helpers. A set of 13 design cri- 
teria adapted from Luborsky et al. (1975) is 
offered as a guide to improve the experi- 
mental rigor of future research. Future in-
vestigators should pay particular attention to 
a number of unresolved issues: 

1. What are the primary treatment in- 
gredients in paraprofessional helping pro- 
grams? What specific behavioral dimensions 
of paraprofessional intervention are associ- 
ated with treatment effectiveness? 

2. How variable is individual clinical effec- 
tiveness within paraprofessional helper 
groups? Is a minority or a majority of help- 
ers responsible for the positive findings that 
have been obtained? Do individual differences 
in clinical success relate to client, treatment, 
or therapist characteristics? 

3. What is the breadth of paraprofession-
als’ clinical competence? Are paraprofession- 
als effective on a long-term basis when work- 
ing with diverse client populations and 
providing more than one form of treatment 
or service? Can paraprofessionals adequately 
perform those therapy-related skills that 
would be required of them in full-time ser- 
vice positions such as clinical interviewing 
and diagnostic assessments? 

4. Are indigenous therapists the treatment 
agent of choice for any client populations?
If so, what factors account for this finding?
For example, in comparison with professionals,
is the supposition correct that indigenous
helpers are more accepted by and more em- 
pathic toward clients similar to themselves? 
Furthermore, are these variables associated 
with therapeutic outcome? 

5. Finally, what are the most effective
means of selecting, training, and supervising 
paraprofessionals? Generally, attempts in 
these directions have been nonspecific in 
comparative research, due to the limited 
number of therapists studied. However, if 
future work continues to confirm paraprofes- 
sionals' treatment effectiveness in compara- 
tive situations, then the most judicious use 
of paraprofessionals will, in large part, de- 
pend on determining the most successful 
techniques for recruiting, training, and su- 
pervising these new workers. 

This review has used a broad definition of 
helping that encompasses a wide range of 
settings, helpers, clients, treatments, and 
criterion variables. This generic assessment of 
paraprofessional functioning seems appropri- 
ate given the current lack of experimentally 
rigorous process and outcome research. A 
more limited review of specific treatment 
situations would not have afforded any greater 
scrutiny or insight into paraprofessionals’ 
abilities. However, as future investigators
achieve stricter control of therapist, treat- 
ment, and client characteristics, paraprofes- 
sionals’ functioning under different circum- 
stances can be assessed more accurately and 
reliably in order to challenge or qualify the 
general conclusions offered here. 

Current findings are not offered as a 
polemic against professional treatment but as 
a stimulus to investigate the processes that 
facilitate behavior change. Data indicate that 
paraprofessionals can make an important con- 
tribution as helping agents, but the factors 
accounting for this phenomenon are not 
understood. We are presently recruiting and 
employing thousands of paraprofessionals in 
the human services without an adequate 
understanding of why such personnel are ef- 
fective helpers. It would be a mistake to con- 
tinue using paraprofessionals without more 
closely examining their skills, deficiencies, and 
limitations. 
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and the current status of the research 
five issues of the learned helplessness 
(c) generalization, (d) individual differ- 
theory is seen as inadequate to account 
etiology and generalization. A revised 


individual's attributions of noncontingent failure experiences predict the degree 


and parameters of learned helplessness. 


Seligman and Maier (1967) and Overmier 
and Seligman (1967) used the term learned 
helplessness to describe an interference with 
escape-avoidance behaviors produced in dogs 
by prior inescapable shock. Since these early 
studies, research bearing on learned helpless- 
ness has proliferated. Early work investigat- 
ing the parameters of the phenomenon used 
dogs as subjects (Overmier, 1968; Seligman 
& Groves, 1970; Seligman, Maier, & Geer, 
1968), and more recent studies have reported 
the occurrence of learned helplessness in cats 
(Thomas & Balter, in press), fish (Padilla, 
Padilla, Ketterer, & Giacalone, 1970), and
rats (Braud, Wepman, & Russo, 1969;
Maier, Seligman, & Solomon, 1969; Maier &
Testa, 1975). 

The dominant researcher and theorist in 
this area has been Martin E. P. Seligman, 
who has written extensively on the nature, 
etiology, and significance of the learned help- 
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lessness phenomenon (Seligman, 1973, 1974, 
1975). Seligman's model¹ has broadened the
scope of learned helplessness from animal 
behavior to a wide variety of human behav- 
iors, including reactive depression, stomach 
ulcers, voodoo deaths, and child development 
(Seligman, 1975). Since the publication of 
the early animal work and the major exposi- 
tion of Seligman’s theory (Seligman, 1975), a 
substantial body of research has investigated 
the parameters of the learned helplessness 
phenomenon with human subjects. Recent 
reviews (Maier & Seligman, 1976) and cri- 
tiques (Levis, 1976) of learned helplessness 
research and theory have focused primarily 
on animal research. In view of the consider- 
able interest in this area of research and the 


1 Seligman has recently presented a revised model 
of learned helplessness (Seligman, Note 1; Abram- 
son, Seligman, & Teasdale, 1978). Since this revised 
model was presented during the preparation of the 
final draft of the present article, all references in 
the text to “Seligman’s model” refer to his pre-1977 
position. It should also be noted that the basic 
statements of Seligman’s revised model are remark- 
ably similar to those of the model presented in the 
second half of the present article. The present 
authors and Seligman (Note 2) regard this theo- 
retical overlap to be the result of parallel, inde- 
pendent contributions; as such, these represent a 
situation that is unusual in psychology. The present 
article makes no attempt to review, critique, or 
integrate Seligman’s revised theory with the model 
proposed here. 


ation, Inc. 0033-2909/79/8601-0093$00.75 
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possible implications for human behavior, it 
seems appropriate at this time to review the 
learned helplessness research and theory 
with an explicit focus on studies using human 
subjects. The purpose of this article, then, is 
threefold: (a) to review the research litera- 
ture concerning the phenomenon of learned 
helplessness in humans, (b) to assess the 
validity of Seligman's theory in view of re- 
cent research, and (c) to propose a new model 
of learned helplessness in humans. 

In order to facilitate comparison of theo- 
retical predictions and research findings, this 
article focuses on five salient issues concern- 
ing learned helplessness. (a) What is the 
nature of learned helplessness? (b) What 
are the necessary and sufficient conditions for 
development of learned helplessness? (c) 
Is learned helplessness a generalized human 
phenomenon, or is it confined to specific 
laboratory conditions? (d) What are the 
individual differences factors that affect 
learned helplessness? (e) What are the nec- 
essary and sufficient conditions for alleviation 
of learned helplessness? This article reviews 
the relevant research and theory for each of 
the above issues. 

Before moving to an evaluation of the 
research findings, however, it is necessary to 
discuss the basic experimental paradigm used 
in learned helplessness research. In the typi- 
cal study, subjects receive a training phase 
followed by a test phase. In the training 
phase, subjects are exposed to a training task, 
in which they receive (a) contingent (re- 
sponse-dependent) reinforcement, (b) non- 
contingent (response-independent) reinforce- 
ment, or (c) no treatment (control). After 
this training phase, the performance of the 
three groups is compared on a test task, in
which reinforcement is contingent for all 
groups. Learned helplessness occurs when the 
subjects receiving noncontingent reinforce- 
ment in the training phase show deficits in 
the test phase relative to the contingent and 
control groups. Thus, learned helplessness refers
to behavioral deficits produced by exposure
to noncontingent outcomes.
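
The three-group paradigm described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the equal-sized random assignment, and the convention that a higher test score means better performance are hypothetical and are not taken from any of the studies reviewed here.

```python
import random
from statistics import mean

CONDITIONS = ["contingent", "noncontingent", "control"]


def assign_groups(subject_ids, seed=0):
    """Randomly assign subjects to the three training conditions.

    Assumption for illustration: groups are filled in rotation after
    shuffling, so they are equal in size when the number of subjects is
    a multiple of three.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    return {sid: CONDITIONS[i % 3] for i, sid in enumerate(ids)}


def helplessness_deficit(test_scores, groups):
    """Compare test-phase performance across the three groups.

    test_scores maps subject id -> test-task score (higher = better,
    a hypothetical convention). Returns how far the noncontingent
    group's mean falls below each comparison group's mean.
    """
    by_group = {name: [] for name in CONDITIONS}
    for sid, score in test_scores.items():
        by_group[groups[sid]].append(score)
    means = {name: mean(scores) for name, scores in by_group.items()}
    return {
        "vs_contingent": means["contingent"] - means["noncontingent"],
        "vs_control": means["control"] - means["noncontingent"],
    }
```

In this sketch, learned helplessness would correspond to both returned differences being positive, that is, the noncontingent group performing below both the contingent and the control groups on the test task.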


The Nature of Learned Helplessness 


Seligman (1975) suggested that learned 
helplessness consists of three interrelated 


areas of disturbance: (a) motivational, (b) 
cognitive, and (c) emotional. More specifi- 
cally, Seligman hypothesized that learned 
helplessness "(1) reduces the motivation to
control the outcome; (2) interferes with 
learning that responding controls the out- 
come; (3) produces fear for as long as the 
subject is uncertain of the uncontrollability 
of the outcome, and then produces depres- 
sion" (Seligman, 1975, p. 56). In an attempt 
to explore the nature of learned helplessness 
and the validity of Seligman's hypotheses, 
this section of the article examines the changes 
in test-task performance reported in the 
learned helplessness research with human sub- 
jects. Thus, this section is concerned with 
the nature of learned helplessness as opera- 
tionalized by performance in the test phase, 
and not with the training-phase conditions 
necessary to produce learned helplessness. 


Cognitive and Motivational Deficits 


We are aware of 23 studies that have at- 
tempted to demonstrate the occurrence of 
changes in performance due to a training 
phase in which the subjects' responses were
independent of environmental outcomes. Two 
generalizations concerning the measurement 
of the changes in performance are immedi- 
ately evident from a review of these studies. 
First, in an attempt to operationalize Selig- 
man's proposed cognitive and motivational 
deficits, most studies have used a test task 
requiring development and use of cognitive 
problem-solving strategies. A smaller number 
of studies have investigated the emotional 
aspects of learned helplessness. Second, although
a majority of these studies have reported
performance deficits in the test phase,
several studies (Roth & Bootzin, 1974; Roth 
& Kubal, 1975; Thornton & Jacobs, 1972) 
have reported increases in performance fol- 
lowing a learned helplessness training phase. 
Since the relevant issues concerning this 
facilitation effect seem to lie in the training 
phase of learned helplessness, these studies 
are covered in the next section during the 
discussion of the necessary and sufficient con- 
ditions for the development of learned help- 
lessness. The present section examines those 
studies that have reported significant
performance deficits.


HELPLESSNESS IN HUMANS 


Five studies have attempted to replicate 
the type of task used in the animal research 
by utilizing an escape-avoidance task in the 
test phase and have found deficits with hu- 
man subjects (Hiroto, 1974; Hiroto & Selig- 
man, 1975; Klein & Seligman, 1976; Krantz, 
Glass, & Snyder, 1974; Thornton & Jacobs, 
1971). Seven other studies have used an ana- 
gram-solution task (Benson & Kennelly, 
1976; Gatchel, Paulus, & Maples, 1975; 
Gatchel & Proctor, 1976; Hiroto & Seligman, 
1975; Klein, Fencil-Morse, & Seligman, 
1976; Miller & Seligman, 1975; Miller & 
Gold, Note 3). In this task, subjects are 
given a series of anagrams with the same 
solution order. In both the escape-avoidance 
and anagram tasks, three dependent measures 
have generally been employed: (a) number 
of trials to escape (anagram solution) cri- 
terion, (b) number of failures to escape (solve 
anagram), and (c) mean escape (anagram 
solution) latency. The measure of number of 
trials to criterion was hypothesized to opera- 
tionalize the cognitive deficit, and the latter 
two measures were hypothesized to operationalize
the motivational deficit. However,
as was noted by Miller and Seligman (1975), 
because solution criteria were defined in 
terms of response speed, separation of moti- 
vational and cognitive components was not 
possible. 
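
A minimal sketch of how the three dependent measures might be computed from one subject's trial latencies is given below. The specific definitions are assumptions made for illustration (a failure is a trial that reaches the maximum allowed latency; criterion is a run of consecutive trials faster than a cutoff), and the parameter names are hypothetical. The sketch makes the confound noted by Miller and Seligman (1975) concrete: because the criterion is itself defined by response speed, the "cognitive" and "motivational" measures are derived from the same latencies.

```python
from statistics import mean


def dependent_measures(latencies, max_latency, criterion_latency, criterion_run=3):
    """Compute three test-task measures from per-trial latencies (seconds).

    Assumed definitions (illustrative only):
      - failure: a trial whose latency reaches max_latency (no escape/solution)
      - trials to criterion: number of the trial on which the subject
        completes criterion_run consecutive trials faster than
        criterion_latency; None if criterion is never reached
      - mean latency: average latency over all trials
    """
    failures = sum(1 for t in latencies if t >= max_latency)

    trials_to_criterion = None
    run = 0
    for trial_number, t in enumerate(latencies, start=1):
        # A fast trial extends the current run; a slow trial resets it.
        run = run + 1 if t < criterion_latency else 0
        if run == criterion_run:
            trials_to_criterion = trial_number
            break

    return {
        "trials_to_criterion": trials_to_criterion,
        "failures": failures,
        "mean_latency": mean(latencies),
    }
```

Note that a uniformly slow but accurate subject would score poorly on all three measures at once, which is why these measures cannot separate slowed responding from impaired learning.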

It is also important to note that the absolute
magnitude of the deficits obtained in
learned helplessness studies using
escape-avoidance or anagram test tasks was relatively
small. In the Hiroto and Seligman (1975)
escape-avoidance task, for example, the mean
response latency for the noncontingent group
was about 8 sec, whereas the mean latency
for the control group was about 7.3 sec. Although
statistically significant, the small absolute
difference raises questions about the
importance of these results.
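
The size of the latency difference cited above can be worked out directly. The short sketch below uses only the two approximate means given in the text; the function name is hypothetical, and no standardized effect size is computed because the text reports no variability estimates.

```python
def latency_gap(noncontingent_mean, control_mean):
    """Return the absolute (sec) and relative difference between two mean latencies."""
    absolute = noncontingent_mean - control_mean
    relative = absolute / control_mean
    return absolute, relative


# Approximate means reported for the Hiroto and Seligman (1975) escape-avoidance task.
absolute, relative = latency_gap(8.0, 7.3)
print(f"{absolute:.1f} sec slower, or {relative:.1%} of the control mean")
```

On these figures the noncontingent group was slower by roughly 0.7 sec, a little under one tenth of the control group's mean latency.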

Two studies (Klein & Seligman, 1976;
Miller & Seligman, 1975) designed to isolate
Seligman's hypothesized cognitive deficit
have employed Rotter, Liverant, and Crowne's
(1961) measure of expectancy change following
success or failure. In this task, subjects
are given a "chance" or "skill" task, in
which reinforcement is covertly manipulated
by the experimenter. The dependent measure
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is the amount of expectancy change after 
trials of success or failure. Several studies
(Phares, 1957; Rotter et al., 1961) have re- 
ported that reinforcement on previous trials 
had a greater effect on expectancies for fu- 
ture success when the subjects perceived 
reinforcement as response contingent. Un- 
fortunately, the usefulness of this measure 
for learned helplessness depends on the pre- 
vious research suggesting that expectancy 
change in this task is a function of the ex- 
pectancy of response-outcome contingency. 
Other research investigating expectancy 
changes in this task (McMahan, 1973; 
Weiner, Nierenberg, & Goldstein, 1976) has 
suggested that expectancy changes following 
success and failure are not due to perceived 
response-outcome contingency but are due to 
the perceived stability of the causal attribu- 
tions of performance. Hence, the use of the 
expectancy-change measure devised by Rotter 
et al. to operationalize an expectancy of re- 
sponse-outcome contingency remains highly 
questionable. This issue, as well as the con- 
struct of attributions, is examined in detail 
in a later section of this article. 

Although other studies have used a variety 
of cognitive problem-solving tasks, including 
intelligence tests (Thornton & Jacobs, 1972), 
block design (Dweck & Reppucci, 1973), 
digit-letter substitution (Dweck & Bush, 
1976), discrimination learning (Eisenberger, 
Park, & Frank, 1976), and specially designed 
concept-formation problems (Roth & Boot- 
zin, 1974; Roth & Kubal, 1975), no study 
has adequately separated motivational and 
cognitive components of learned helplessness. 

In summary, the literature pertaining to 
the performance changes produced by expo- 
sure to learned helplessness training condi- 
tions supports the conceptualization of 
learned helplessness as a performance deficit 
in cognitive problem-solving tasks. This defi- 
cit, though statistically significant, is rela- 
tively small in an absolute sense. The avail- 
able evidence does not allow a distinction 
between cognitive and motivational explana- 
tions for this deficit. Thus, the performance 
deficits that are defined as learned helpless- 
ness may have a cognitive or motivational 
basis, or they may result from the impair- 


ment of both processes. 
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Emotional Deficits 


Six studies have investigated the emotional 
aspects of learned helplessness (Gatchel et al., 
1975; Gatchel & Proctor, 1976; Griffith, 
1977; Krantz et al., 1974; Miller & Seligman,
1975; Roth & Kubal, 1975). These
studies generally support Seligman's hypothe- 
sis that learned helplessness involves feelings 
of anxiety and depression. Miller and Selig- 
man (1975) and Gatchel et al. (1975) admin- 
istered the Multiple Affect Adjective Check 
List (Zuckerman, Lubin, & Robins, 1965) 
before and after exposure to contingent and 
noncontingent reinforcement. Both studies
reported significant increases in feelings of 
depression, anxiety, and hostility following 
noncontingent reinforcement. Griffith (1977) 
administered the Paired Anxiety and De- 
pression Scale (Mould, 1975) before and 
after exposure to noncontingent conditions 
and reported significant increases in depres- 
sion following noncontingent failure and sig- 
nificant increases in anxiety following non- 
contingent success. Similarly, Roth and Ku- 
bal (1975) and Krantz et al. (1974) admin- 
istered questionnaires following the training 
phase and reported increases in feelings of 
helplessness, incompetence, stress, frustra- 
tion, hostility, depression, anger, and fatigue. 
Two studies have investigated the emotional 
aspects of learned helplessness through physi- 
ological measures (Gatchel & Proctor, 1976; 
Krantz et al., 1974). Both studies reported 
that subjects exposed to learned helplessness 
training conditions showed lower levels of
electrodermal activity, which is thought to be
evidence of a lowered motivational state
(Malmo, 1965) and has also been suggested
to be associated with clinical depression
(McCarron, 1973).

Thus, although the self-report and physio- 
logical data support Seligman’s predictions of 
increased depression and anxiety following 
exposure to a learned helplessness induction, 
the self-report studies also suggest that ex- 
posure to these conditions results in increased 
hostility, a phenomenon neither predicted 
nor explained by Seligman’s theory. 

In summary, the available research gen- 
erally supports Seligman’s hypotheses con- 
cerning the nature of learned helplessness. 


Performance deficits appear attributable to 
cognitive and motivational deficits, although 
assessment of the relative contributions of 
these two processes has not been demon- 
strated. Similarly, Seligman’s hypothesis con- 
cerning the emotional aspects of learned help- 
lessness appears to be supported, although the 
observed increases in hostility were not pre- 
dicted. 

Two basic issues remain concerning the 
nature of learned helplessness. The first con- 
cerns the limited number of types of tasks 
that have been used in learned helplessness 
test phases. With a few exceptions, these 
tasks have been cognitive problem-solving 
tasks. Few other types of tasks have been 
used to investigate the effects of learned 
helplessness in other types of situations. The 
second issue concerns the degree of impair- 
ment found in human subjects. All of the 
helplessness effects reported with humans have 
been relatively small, with no reports of any 
behavior as disabling as was found in the 
earlier research with infrahumans. There are
certainly ethical and legal reasons for not
attempting to instill these deficits in humans,
but the consistent finding of small differences
between groups raises questions concerning
the relative significance of learned
helplessness in humans. Animal research may 
provide a reasonable facsimile, or it may not. 
It is certainly possible that the superior cog- 
nitive strategies available to humans prevent 
the extreme effects of learned helplessness. 
This question of the relevance of animal re- 
search for a theory of learned helplessness is 
discussed in a later section. 


Etiology 


Seligman (1973, 1974, 1975) has postu- 
lated that the major causal factor in the 
development of learned helplessness is the 
organism’s belief or expectancy that its re- 
sponses will not influence the future proba- 
bility of environmental outcomes (expec- 
tancy of response-outcome independence). 
He has further proposed that information 
about this contingency is a property of the 
environment, not of the organism, but that 
this information is transformed into an ex- 
pectancy of the response-outcome contingency,
and it is this internal expectancy that
directs behavior. In the learned helplessness
paradigm, it is only when the organism forms
the expectancy that its response will not be
effective that learned helplessness is pre- 
dicted to occur. Seligman has also briefly 
mentioned three variables that may limit 
the acquisition of the expectancy of response 
independence and of learned helplessness: 
(a) previous response-outcome expectancies, 
(b) discrimination between situations, and 
(c) the relative importance of the situation.
He has not discussed these variables in any
detail, however, and has not integrated them
into his general theoretical framework.

Before moving on to an examination of the 
research bearing on Seligman's theory, the 
purpose of this section should be clarified. In 
terms of the general experimental paradigm 
discussed in the introduction, this section is 
concerned with the training phase, that is, 
those experimental manipulations and condi- 
tions that have been used to operationalize 
Seligman's construct of response-outcome 
independence, and with the success of these 
manipulations in producing deficits in test- 
task performance. 

Turning now to an examination of the re- 
search, all of the animal studies and over 
half of the studies using human subjects have
used an uncontrollable aversive stimulus,
either noise or shock, in the training phase of
the learned helplessness experiment (Fosco &
Geer, 1971; Gatchel et al., 1975; Gatchel &
Proctor, 1976; Geer, Davison, & Gatchel,
1970; Glass & Singer, 1972; Hiroto, 1974;
Hiroto & Seligman, 1975; Klein & Seligman,
1976; Krantz et al., 1974; Miller & Seligman,
1975, 1976; Sherrod & Downs, 1974; Thornton
& Jacobs, 1971, 1972). In these studies,
subjects were exposed to an instrumental 
escape-avoidance task. The contingent group 
could actually avoid or escape the aversive 
stimulus, but the noncontingent group's re- 
sponses did not influence the presentation of 
the aversive stimulus. Although several of 
these studies have not used appropriate con- 
trol groups (Wortman & Brehm, 1975), all 
studies using an uncontrollable aversive 
stimulus, except that by Thornton and Ja- 
cobs (1972), have reported that the noncon- 
tingent group showed significant performance 
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deficits in the test phase. These studies offer 
support for Seligman’s hypothesis by dem- 
onstrating that close replication of the ex- 
perimental procedures used to induce learned 
helplessness in infrahumans will also lead to 
learned helplessness in humans. 

However, these studies have provided only 
a partial test of the ramifications of Selig- 
man’s theory. Seligman (1975) suggested 
that learned helplessness will result not only 
from noncontingent aversive stimulation but 
from any noncontingent environmental out- 
comes, including positive reinforcement. One 
group of studies (Benson & Kennelly, 1976; 
Hiroto & Seligman, 1975; Roth & Bootzin, 
1974; Roth & Kubal, 1975; Miller & Gold, 
Note 3) has attempted to test this more gen- 
eral hypothesis by inducing learned helpless- 
ness by exposure to noncontingent positive 
reinforcement. All of these studies have used 
the procedure first reported by Roth and 
Bootzin (1974), who exposed subjects to con- 
tingent or random (noncontingent) rein- 
forcement concerning their performance on a 
series of concept-formation discrimination 
tasks, as were described by Levine (1971). 
Roth and Bootzin argued that the condition 
of random reinforcement corresponded to 
Seligman’s theoretical condition of indepen- 
dence of responses and outcomes but was not 
confounded by the effects of exposure to 
aversive stimulation. The results of studies
using this type of training condition have
been mixed. Hiroto and Seligman (1975) and
Benson and Kennelly (1976) reported that 
random-reinforcement groups showed deficits 
on an anagram test task, relative to con- 
tingent-reinforcement and control groups. 
Conversely, Roth and Bootzin (1974) re- 
ported a facilitation of performance following 
exposure to this type of training task. Roth
and Kubal (1975) varied the number of 
tasks in which the subjects received random 
reinforcement. They reported that subjects 
exposed to only one learned helplessness 
training task showed facilitation effects, 
whereas subjects exposed to random rein- 
forcement in two different training tasks 
showed deficits on the same test task. These
results, however, have not been replicated 
(Kilpatrick-Tabak & Roth, Note 4). It 
should be noted that the test tasks used by 
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Roth and Bootzin (1974) and Roth and 
Kubal (1975) differed from those used by 
Hiroto and Seligman (1975) and Benson and 
Kennelly (1976) on several dimensions, not- 
ably on the similarity of the training and test 
phases. (These differences are discussed more 
thoroughly in a later section.) 

In general, it is apparent that the results 
from studies using a random-reinforcement
procedure have been far from definitive. 
Furthermore, the use of random reinforce- 
ment as noncontingent positive reinforcement 
is questionable. Benson and Kennelly (1976) 
cogently argued that since nonreward in the 
context of reward produces frustration, an 
aversive event (Amsel, 1972), a random 
schedule of rewards means that reward and 
frustration may also occur in a random and 
uncontrollable manner. A random-reinforce- 
ment procedure can thus be seen as con- 
founding noncontingent positive reinforce- 
ment with noncontingent aversive stimula- 
tion. 

Not only does a random-reinforcement pro- 
cedure confound the type of noncontingent 
outcome, it also confounds the absolute 
amounts of positive and negative outcomes 
that occur. Studies using a random-reinforce- 
ment group have typically not yoked the non- 
contingent (random) group's reinforcement 
to the contingent group's level, as was done 
in the early animal studies, but instead have 
fixed the random group's reinforcement rate 
at 50%. Thus, the obtained differences be- 
tween groups may be due to differences in 
the amount and pattern of reinforcement re- 
ceived. 
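
The difference between a yoked schedule and a fixed 50% random schedule can be illustrated with a short sketch. The function names and the trial-by-trial representation are hypothetical; the point is only that a yoked subject inherits the partner's exact amount and pattern of reinforcement, whereas a random-schedule subject receives a fixed rate that need not match what the contingent group earned.

```python
import random


def contingent_outcomes(responses, correct_answers):
    """Reinforcement depends on the subject's own responses."""
    return [r == c for r, c in zip(responses, correct_answers)]


def yoked_outcomes(partner_outcomes):
    """Yoked noncontingent schedule: the subject receives the partner's
    outcome sequence, whatever the subject's own responses are."""
    return list(partner_outcomes)


def random_outcomes(n_trials, rate=0.5, seed=0):
    """Random schedule fixed at a given reinforcement rate (50% by default),
    independent of both the subject's and any partner's behavior."""
    rng = random.Random(seed)
    n_correct = round(n_trials * rate)
    outcomes = [True] * n_correct + [False] * (n_trials - n_correct)
    rng.shuffle(outcomes)
    return outcomes
```

With a contingent partner who earns reinforcement on 8 of 10 trials, the yoked subject also receives 8 reinforcements in the same order, while the random subject receives 5. Any later difference between the contingent and random groups could therefore reflect the amount of reinforcement rather than its contingency.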

Two studies (Benson & Kennelly, 1976; 
Miller & Gold, Note 3) have controlled for 
the amount and type of reinforcement. Miller 
and Gold used three groups in the standard 
Levine (1971) discrimination-problem training
task: (a) contingent (80% correct), (b)
yoked noncontingent (80% correct), and (c)
random (50% correct). They reported that 
the yoked and random groups showed sig- 
nificant deficits on a later anagram task, rela- 
tive to the contingent group, but the yoked 
group performed significantly better than the 
random group. In addition to contingent,
random, and control groups, Benson and
Kennelly (1976) used a noncontingent 100%
correct group and found no significant differences
between the contingent group and
the noncontingent/correct group on a later
anagram test task, even though the noncontingent/correct
group reported having no control
over outcomes. Thus, noncontingent positive
reinforcement does not appear to produce
learned helplessness.

Taken together, these two studies provide
evidence that the type and amount of reinforcement
directly influence the development of
learned helplessness. Similarly, although studies
combining noncontingency with a clearly
aversive outcome have consistently produced
learned helplessness, other studies using a
random-reinforcement procedure, which can
be seen as alternating noncontingent positive
and noncontingent negative reinforcement,
have reported mixed results. The overall pattern
of results, however, suggests that noncontingent
reinforcement is not a necessary
and sufficient condition for the development
of learned helplessness but that both the contingency
and the nature of the obtained outcome
are critical to learned helplessness.

In addition to the basic issue of what type
of conditions are necessary to produce learned
helplessness, there are several other variables
that have been shown to exert a significant
influence on the development of learned
helplessness, including (a) instructional set,
(b) task importance, and (c) attributions of
performance.


Instructional Set 


Instructions given to subjects in the experimental
situation regarding response-outcome
contingencies have been shown to affect
the development of learned helplessness
(Geer et al., 1970; Glass & Singer, 1972;
Hiroto, 1974). For example, Glass and Singer
told subjects they could push a button that
would turn off an aversive tone but that
the experimenter would “prefer you didn't
use it." This procedure significantly increased
problem-solving performance during exposure
to the tone, relative to those subjects who
were not told they had control of the noise.
Similarly, Hiroto (1974) reported that sub-
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jects who were given instructions that onset 
of an aversive stimulus was contingent on 
their responses were significantly less helpless 
than those subjects who were told the experimenter
was controlling the aversive stimulus.
Although not discussed specifically by 
Seligman, the above results are congruent with 
the focus of his theory. Since Seligman (1975) 
posited that behavior is motivated by an in- 
ternal, cognitive expectancy, these experi- 
mental instructions can be viewed as one way 
of manipulating expectations. For example, 
Hiroto's (1974) "chance" instructions represent
another procedure that induces an expectancy
similar to that induced by an experience
with an inescapable aversive stimulus
(i.e., an expectancy that responses and outcomes
are noncontingent), which also results
in similar decrements in performance. Similarly,
as in the Glass and Singer (1972)
study, instructions specifying that responses
do control outcomes produce a positive expectancy
and no deficits in performance.
Thus, instructional set regarding the reinforcement
contingencies appears to be a crucial
variable for inducing learned helplessness.




Task Importance


Instructions regarding the relative significance
of the experimental task have also been
shown to influence the development of learned
helplessness. As is mentioned above, Seligman
(1975) briefly mentioned the effects of
task importance but did not elaborate on this
prediction. Two studies (Roth & Kubal,
1975; Miller & Gold, Note 3) have directly
investigated the effect of instructions regarding
task importance on the development of
learned helplessness. In both studies, importance
was manipulated by instructing college
student subjects that the training and
test tasks were measures of scholastic aptitude
and intelligence. Subjects exposed to
noncontingent reinforcement showed significantly
greater helplessness in the important
condition than in the unimportant condition.
Using contingent, yoked-noncontingent, and 
random-reinforcement groups, Miller and Gold 
found no differences between unimportant 
groups but significant differences between all 
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important groups. Similar results were re- 
ported by Roth and Kubal (1975). These 
studies strongly support the hypothesis that 
the perceived importance of the experimental 
task is a potent factor in the development of 
learned helplessness and can be manipulated 
by experimental instructions. 


Attributions of Performance 


Five studies (Dweck, 1975; Dweck & Reppucci,
1973; Klein et al., 1976; Tennen &
Eller, 1977; Wortman, Panciera, Shusterman, 
& Hibscher, 1976) have investigated the role 
of the subject’s attributions of task perform- 
ance in the development of learned helpless- 
ness. Attribution theory (Weiner, Frieze, 
Kukla, Reed, Rest, & Rosenbaum, 1971)
postulates that the individual’s causal attribu- 
tions influence his or her expectations for 
probable outcomes of future performance. 
In learned helplessness, attribution theory 
would suggest that the subject’s attribution 
concerning the noncontingency of reinforce- 
ment would influence both one’s expecta- 
tions and one’s performance in future tasks. 
The research investigating this hypothesis 
has generally been supportive. Dweck and 
Reppucci (1973) reported that following ex- 
perience with noncontingent failure (unsolv- 
able block designs), those children who 
showed the most performance deficits tended 
to attribute success or failure to ability; con- 
versely, those children who showed fewest 
deficits tended to attribute their performance 
to effort. Dweck (1975) continued this line 
of research by developing a treatment pro- 
gram for “naturally occurring” helpless chil- 
dren, that is, those who were more adversely 
affected by failure. In her treatment program, 
Dweck taught these “helpless” children to 
attribute failure to lack of effort. Following 
this “reattribution training," these children 
showed significant improvements in task per- 
sistence and less helplessness than did a 
group treated with “success-only” experiences, 
who showed no differences from baseline. 

In another study concerning attributions 
and learned helplessness, Klein et al. (1976) 
directly manipulated attributions by inform- 
ing subjects about other subjects’ task per- 
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formance. In the internal-attribution condi- 
tion subjects were told that 55% of previous
subjects succeeded in all problems, and in 
the external-attribution condition they were 
told that 90% had failed on all problems. 
Following these instructions, depressed and 
nondepressed subjects were exposed to ran- 
dom reinforcement in a discrimination task 
followed by an anagram test task. Klein et al. 
reported that the attribution instructions did 
not significantly affect the nondepressed sub- 
jects, but for the depressed subjects the ex- 
ternal instructions alleviated learned helpless- 
ness on the anagrams. Klein et al. suggested 
that helplessness and depression are due to 
both failure and attribution of that failure to 
personal incompetence.

The results obtained by Tennen and Eller 
(1977) also support this hypothesis. They 
exposed subjects to a double helplessness con- 
dition, in which subjects were told that each 
succeeding task was either easier or more 
difficult. The results indicated that the easier 
group, who presumably made attributions to 
ability, showed learned helplessness but that 
the more difficult group, who presumably
made attributions to task difficulty, did not.
Thus, three studies (Dweck & Reppucci,
1973; Klein et al., 1976; Tennen & Eller,
1977) have suggested that attribution of noncontingent
failure to ability or personal competence
leads to increased learned helplessness,
whereas attribution of these outcomes
to situational factors or task difficulty does 
not produce learned helplessness. 

One recent study (Wortman et al., 1976)
reported findings that appear contradictory
to this hypothesis. However, a closer examination
of the methodology of this study reveals
significant differences between this study
and the studies previously described. Wort- 
man et al. told their subjects that their study 
would deal with the effects of noise on per- 
formance and that the amount of noise would 
be contingent on their performance on sev- 
eral problems. Subjects were then exposed to 
unavoidable noise and were told they had 
failed to solve any of the problems. Three in- 
formation conditions were used: (a) no in- 
formation, (b) information that another sub- 
ject could solve the problems (incompetence 
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condition), and (c) information that another 
subject could not solve the problems (task- 
difficulty condition). The results indicated 
that the incompetence group felt more help- 
less and stressed but performed better than 
the task-difficulty group did when the same 
problems were presented later without the 
noise. Although these results appear contra- 
dictory to previous studies, a closer analysis 
of the methodology clarifies the discrepancy. 
As was pointed out by Tennen and Eller 
(1977), the incompetence condition in the
Wortman et al. study exposed subjects to a 
situation in which (a) they were told that the 
study was about the effects of noise on per- 
formance, and (b) another person’s perform- 
ance did not seem to be affected by noise, 
whereas their own performance under noise 
conditions was poor. It seems that the most 
likely attribution of this situation would be 
that the subject had difficulty solving problems
with accompanying noise. Thus, when
problems were presented later without noise,
the subjects in the incompetence group would
expect their performance to improve and
would be motivated to attempt to do so.
Alternatively, the task-difficulty group, who
thought that their failure was due to the dif- 
ficulty of the task, could not expect any 
change in their outcomes and thus would 
have been less motivated to attempt solutions 
on later problems. Viewed from this perspec- 
tive, the incompetence group did not produce 
attributions to the general quality of com- 
petence, but produced attributions to the 
relatively specific cause of ability to solve 
problems with noise present, and as such, the 
incompetence condition was not equivalent to 
the ability or competence conditions of previ- 
ous studies. 

It is evident from these studies that at- 
tributions of noncontingent failure experi- 
ences are a potent factor in the development 
of learned helplessness, with attributions to 
competence resulting in increased deficits. 
The implications of these results concerning 
the effects of attributions are very suggestive 
and will form the basis of the revised model 
of learned helplessness proposed in the last 
section of this article.

More generally, the research reviewed suggests
that several variables (task instructions,
task importance, attributions) appear to 
exert significant influences on the develop- 
ment of learned helplessness. Also, these vari- 
ables appear to be mutually interactive. Since 
studies have neither varied nor controlled for 
all of these variables, the relative contribution 
and specific interactions are difficult to spec- 
ify at this time. 


Facilitation 


Although exposure to noncontingent rein- 
forcement has generally been found to result 
in performance deficits, several studies (Roth 
& Bootzin, 1974; Roth & Kubal, 1975; 
Shaban & Welling, cited in Glass & Singer, 
1972; Tennen & Eller, 1977; Thornton & 
Jacobs, 1972; Wortman et al., 1976) have
reported improved performance (facilitation) 
on a later dissimilar task following exposure 
to a learned helplessness training phase. 
Since these facilitation effects are at odds with 
Seligman's basic hypothesis, they deserve 
closer examination. 

Roth and her co-workers (Roth & Bootzin, 
1974; Roth & Kubal, 1975) have suggested 
that these results point to a curvilinear rela- 
tionship between the amount of exposure to 
noncontingent reinforcement and learned help- 
lessness. They have proposed that a moderate 
degree of exposure will result in a greater 
degree of responding or facilitation, whereas 
more exposure will result in deficits due to 
learned helplessness. Seligman (1975) briefly 
mentioned a similar hypothesis, but only in 
his argument concerning emotional responses 
to trauma. He suggested that the organism’s 
initial reaction to uncontrollability is fear, ac- 
tivity, and overresponding, and that con- 
tinued uncontrollable trauma leads to learned 
helplessness and depression. Similar predictions
were made by Wortman and Brehm
(1975) in their integration of reactance the- 
ory and learned helplessness. 

Since experimental documentation of this 
hypothesized curvilinear relationship between 
the amount of exposure and learned helplessness
requires at least two levels of exposure
to noncontingent reinforcement, only two of
the studies reporting facilitation have offered
direct evidence concerning this hypothesis. In 
the first study, Roth and Kubal (1975) 
varied the amount of exposure to noncon- 
tingent outcomes. They reported that sub- 
jects exposed to the single helplessness con- 
dition showed facilitation effects and that the 
subjects in the double helplessness condition 
showed deficits. Analyses of their data also 
showed significant quadratic trends. These
results appear to support Roth and Kubal's 
hypothesis, but Tennen and Eller (1977) 
argued that since the instructions of Roth 
and Kubal’s double helplessness condition 
told the subjects that each succeeding task 
was “a little bit easier,” the Roth and Kubal 
study confounded the amount of exposure 
with attribution instructions. Following at- 
tribution theory (Weiner et al., 1971), Ten- 
nen and Eller hypothesized that attributions 
to ability (an internal, stable cause) would 
lead to helplessness, whereas attributions to 
task difficulty (an external, variable cause) 
would lead to facilitation. Tennen and Eller 
tested this hypothesis by replicating Roth and 
Kubal's study with the addition of a double 
helplessness condition in which subjects were 
told that the tasks were becoming more dif- 
ficult. On a later test task, the double-help- 
lessness “easier” group showed helplessness 
deficits, and the double-helplessness “more 
difficult” group showed facilitation effects. 
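
To illustrate what a quadratic trend across exposure levels means, the following minimal sketch computes linear and quadratic orthogonal polynomial contrasts over three equally spaced conditions (control, single helplessness, double helplessness). The group means are hypothetical values chosen only to show an inverted-U pattern; they are not data from Roth and Kubal (1975) or Tennen and Eller (1977), and the function and variable names are assumptions of this sketch.

```python
# Orthogonal polynomial contrast weights for three equally spaced levels.
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)


def contrast(means, weights):
    """Return the weighted sum of group means for one contrast."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    return sum(m * w for m, w in zip(means, weights))


# Hypothetical test-task means (higher = better performance), ordered as
# control, single helplessness, double helplessness. Illustrative only.
hypothetical_means = (10.0, 14.0, 7.0)

linear_value = contrast(hypothetical_means, LINEAR)
quadratic_value = contrast(hypothetical_means, QUADRATIC)

print("linear contrast:", linear_value)
print("quadratic contrast:", quadratic_value)
```

A quadratic contrast far from zero indicates that the middle group departs from the straight line joining the two end groups, which is the pattern expected if moderate exposure facilitates performance and greater exposure impairs it. Whether such a contrast is statistically significant depends on within-group variability, which this sketch does not model.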

Although the available evidence concern- 
ing facilitation effects suggests that a mini- 
mum level of exposure to noncontingent out- 
comes may result in facilitation whereas in- 
creased amounts of exposure to noncontingent 
outcomes may result in learned helplessness, 
there is evidence that these effects are medi- 
ated by the subject’s attributions of per- 
formance. 

In summary, the studies reviewed concern- 
ing the etiology of learned helplessness par- 
tially support Seligman's hypothesis regard- 
ing the necessary and sufficient conditions 
for the development of learned helplessness. 
The results of this review indicate that de- 
velopment of learned helplessness requires ex- 
posure to environmental conditions in which 
outcomes are independent of responses and 
are nondesired or aversive to the individual.
The research has also identified four variables
(amount of exposure, instructions concern- 
ing contingency, task importance, and attribu- 
tions) that affect the development of learned 
helplessness and appear to be mutually in- 
teractive. Although Seligman's theory briefly 
discusses the possible effects of instructions 
and situational importance, it disregards at- 
tributions entirely. Furthermore, the focus of 
the effects of instructions, task importance, 
and attributions appears to lie in the cogni- 
tive processes of the individual, which Selig- 
man does not address beyond the use of a 
simple expectancy construct. This lack of 
specification of cognitive processes influencing 
the development of learned helplessness ap- 
pears to be a critical deficiency. Finally, a 
number of studies have reported improved 
performance following exposure to noncon- 
tingent failure outcomes, a phenomenon 
neither predicted nor explained by Seligman's 
theory. 


Individual Differences 


Seligman has not explicitly included any in- 
dividual differences variables in his formula- 
tions of learned helplessness. However, several 
studies have reported individual differences 
following exposure to a learned helplessness 
training phase. Hiroto (1974) reported that 
subjects scoring in the external direction on 
Rotter's (1966) Internal-External Locus of 
Control Scale showed greater performance 
deficits after exposure to noncontingent rein- 
forcement than did internals. Miller and Selig- 
man (1975), however, did not report differ- 
ences on this variable. Krantz et al. (1974) 
investigated the performance of Type A and 
Type B coronary-prone individuals after ex- 
posure to noncontingent reinforcement. They 
reported significant differences in the re- 
sponses of Type A and Type B subjects to 
noncontingent reinforcement. Furthermore, 
these differences appeared to be a function 
of the interaction of noncontingency and the 
intensity of the aversive stimulus, and are 
consistent with descriptions of these two per- 
sonality types. 

Another individual differences variable that 
has received some attention in the learned
helplessness literature is gender. Although the
great majority of learned helplessness studies 
have not analyzed sex differences, two studies 
(Dweck & Bush, 1976; Dweck & Reppucci, 
1973) have pointed out the possible effects of 
sex differences in response to noncontingent 
reinforcement. Dweck and Reppucci (1973) 
reported that male children attributed out- 
comes to the amount of effort more often than 
did female children, and attributions to effort 
resulted in significantly smaller performance 
deficits. In a more intensive study, Dweck and 
Bush exposed male and female children to 
noncontingent failure from both an adult and 
a peer evaluator. Failure feedback from 
adults resulted in impaired performance for 
girls but in improved performance for boys. 
Similarly, when a peer evaluator was used, 
boys showed no performance increases, 
whereas girls’ performance improved signifi- 
cantly. Also, children's attributions of failure 
varied with the type of evaluator, with boys 
attributing failure to the amount of effort 
with the adult evaluator but to their own 
abilities in the peer situation. Again, girls 
showed opposite effects, attributing perform- 
ance to ability with the adult evaluator and 
to effort with a peer. Although these two 
studies were done with children and were 
preliminary, they do offer evidence that there 
may be sex differences in response to learned 
helplessness and, furthermore, that attribu- 
tions of performance may mediate this effect. 

Seligman (1974, 1975) has also suggested 
that reactive depression and learned helpless- 
ness are similar disturbances, with common 
etiology, symptoms, course, and cure. It is 
not the purpose of this article to discuss the 
validity of learned helplessness as a model of 
depression (see Blaney, 1977; Eastman, 
1976; Seligman, 1974, 1975), but a brief dis- 
cussion of the similarities in performance of 
depressed subjects and subjects exposed to a 
learned helplessness training phase seems ap- 
propriate. Six studies have investigated these 
similarities and have reported that nonde- 
pressed subjects exposed to a learned helpless- 
ness training phase perform similarly to de- 
pressed subjects on a variety of test tasks, 
including expectancy changes (Klein & Seligman,
1976; Miller, Seligman, & Kurlander,
1975; Miller & Seligman, 1973, 1976), escape-avoidance
learning (Klein & Seligman,
1976), and anagram solution (Klein et al., 
1976; Miller & Seligman, 1975). Conversely, 
exposure to a learned helplessness training 
phase has not increased performance deficits 
observed among depressed subjects (Klein 
et al., 1976; Miller & Seligman, 1975). Thus, 
Seligman’s hypothesis regarding the similar- 
ities of depression and learned helplessness 
has been supported by the available research. 
It must be noted, however, that all of these 
studies used nonclinically depressed college 
students, and the generalizability and rele- 
vance of these results for clinical depression 
remains to be demonstrated. 

Although the few studies investigating the 
effects of individual differences variables and 
learned helplessness have produced signifi- 
cant results, individual differences, except for 
depression, have received little attention 
from learned helplessness research or theory. 
Since there are both theoretical arguments 
(Cronbach, 1975; Kiesler, 1971; Underwood, 
1975) and empirical evidence (Bowers, 1973; 
Moos, 1969) suggesting that Person × Situation
interactions will provide a more accurate
conceptualization of behavior than will either 
dimension separately, future research and the- 
ory in the learned helplessness area should in- 
clude discussion of these interactions and 
their effects. 


Generalization 


Generalization of performance deficits be- 
yond the specific experimental task and situa- 
tion is a major unresolved issue of the learned 
helplessness literature. The basic finding in 
the paradigmatic learned helplessness experiments
has been that subjects tend to overgeneralize
experiences in the training phase
to the later test phase. Subjects exposed to 
aversive situations in which responses do not 
influence outcomes overgeneralize this noncontingency
to later situations in which responses
do influence outcomes. The relative
significance of the learned helplessness paradigm
is largely tied to the degree of generalization
that occurs. Roth and Bootzin (1974)
stated the issue succinctly: “The major question
is whether an induced external expectancy
generalizes to a new situation, not
whether it controls behavior in the situation 
in which it is induced” (p. 255). Clearly, a 
reduction of responding in a situation in which 
responses do not influence outcomes is an 
adaptive behavior. It becomes maladaptive 
only when it is transferred or generalized to 
new situations in which outcomes are con- 
tingent on responses. 

Seligman (1975) argued that “men and
animals are born generalizers. . . . The learn- 
ing of helplessness is no exception” (p. 35). 
He proposed that learned helplessness is a 
general phenomenon that influences many dif- 
ferent aspects of an individual’s life. In short, 
he suggested that learned helplessness can be 
viewed as a generalized personality trait, 
which influences behavior in a wide range of 
situations. 

Unfortunately, the evidence concerning the 
generalizability of deficits caused by learned 
helplessness inductions is not definitive. Most 
learned helplessness studies have used test 
phases in which the type of task and the sit- 
uational conditions were similar to those of 
the training phase (Benson & Kennelly, 1976; 
Dweck & Bush, 1976; Eisenberger et al., 
1976; Fosco & Geer, 1971; Hiroto, 1974; 
Hiroto & Seligman, 1975; Klein & Seligman, 
1976; Krantz et al., 1974; Miller & Gold, 
Note 3). Other studies have demonstrated 
generalization across types of tasks, when the 
training and test situations have been simi- 
lar (Gatchel et al., 1975; Gatchel & Proctor, 
1976; Hiroto & Seligman, 1975; Miller & 
Seligman, 1975; Thornton & Jacobs, 1972). 
That is, generalization has been reported 
when subjects perceived that the training 
and test phases were both part of the same 
experiment. These studies have not investi- 
gated the degree of situational generality of 
the learned helplessness. Hiroto and Selig- 
man (1975) recognized this shortcoming of 
their study: 


One limitation on the generality of these effects 
should be mentioned. The subjects clearly perceived 
both tasks, as different as they are, as part of the 
same experiment. We do not know whether any 
learned helplessness was carried out of the laboratory. (p. 326)

The critical nature of the situational similarity
between learned helplessness training
and test phases is shown in the Dweck and 
Reppucci (1973) study, in which children 
were given a set of unsolvable problems by 
one teacher and a set of solvable ones by 
another. Later, these children were adminis- 
tered a similar set of solvable problems by 
both teachers. The children showed signifi- 
cantly poorer performance when the prob- 
lems were administered by the teacher who 
had previously given unsolvable problems. In 
this study, even though the training and test 
tasks were identical, the stimulus, namely, 
the teacher, influenced the children's per- 
formance. The children did generalize the ex- 
periences of the training phase maladap- 
tively, but the generalization was tied to the 
situational characteristics in which the learned 
helplessness was produced. 

Other studies have investigated the gen- 
eralization issue by using a test phase that 
was situationally dissimilar from the train- 
ing phase (Roth & Bootzin, 1974; Roth & 
Kubal, 1975; Sherrod & Downs, 1974; Ten- 
nen & Eller, 1977; Wortman et al., 1976; 
Kilpatrick-Tabak & Roth, Note 4). Of these 
six studies, three (Roth & Kubal, 1975; 
Sherrod & Downs, 1974; Tennen & Eller, 
1977) reported deficits in their test phase, 
and three other studies (Roth & Bootzin, 
1974; Wortman et al., 1976; Kilpatrick-
Tabak & Roth, Note 4) reported no deficits. 
The studies that did obtain cross-situational 
generalization had two common factors: (a) 
prolonged exposure of subjects to noncon- 
tingent failure outcomes, and (b) instructions 
designed to induce an attribution of task 
performance to ability. Studies that have not 
exposed subjects to these two conditions 
have not found cross-situational generaliza- 
tion. Conversely, one study (Kilpatrick- 
Tabak & Roth, Note 4) exposed subjects to 
these conditions and did not obtain gen- 
eralization. Thus, cross-situational generaliza- 
tion of learned helplessness has not been con- 
clusively demonstrated at this time. 

In summary, there is evidence that learned 
helplessness generalizes from one type of 
task to another, but there is no conclusive 
evidence regarding the degree of generalization
across situations, and no study has
varied both situational and task dimensions.
The failure to demonstrate cross-situational 
generalization is a major flaw in the learned 
helplessness literature. Without conclusive 
evidence for generalization, the significance 
of the learned helplessness phenomenon be- 
comes questionable. Unfortunately, Seligman's 
theory does not address the issue of general- 
ization in any detail and does not adequately 
specify the determinants of generalization of 
learned helplessness. If learned helplessness is 
to represent an adequate analogue of depres- 
sion, then a detailed statement of how and 
when generalization will occur and relevant 
research are essential. 


Alleviation 


Drawing on the animal research, Seligman 
(1975) suggested that learned helplessness 
could be “cured” by the establishment of an 
expectancy that outcomes are dependent on 
responses. In the case of dogs in the shuttle 
box, this "therapy" consisted of repeatedly 
pulling dogs to the safe side of the box until 
they learned that their movement affected the 
offset of the shock. Four studies have at- 
tempted to alleviate deficits caused by learned 
helplessness inductions in humans. Of these, 
three (Dweck, 1975; Klein & Seligman, 
1976; Kilpatrick-Tabak & Roth, Note 4) 
employed procedures in which subjects were 
given response-dependent feedback concern- 
ing their performance on a task interposed 
between training and test tasks. Klein and 
Seligman (1976) exposed nondepressed sub- 
jects to inescapable noise and then exposed 
half of these “helpless” subjects and depressed 
subjects without pretraining to a treatment 
of either 4 or 12 discrimination problems 
with response-dependent feedback. They re- 
ported that on a subsequent escape-avoidance 
task, both treatment groups showed significantly
better performance and a greater expectancy
of response-outcome dependence
than did subjects in the no-treatment groups. 
This study closely paralleled the earlier research
with dogs and supported Seligman's
hypothesis that exposure to response-dependent
outcomes reduces learned helplessness.

A second study that attempted to alleviate
learned helplessness was done by Kilpatrick- 
Tabak and Roth (Note 4). In this study, sub- 
jects selected without respect to depression 
were exposed to a training phase used by 
Roth and Kubal (1975). These subjects and 
depressed subjects without pretraining were 
then exposed to one of four treatments: (a) 
reading Velten's (1968) list of positive self- 
statements, a procedure that may be broadly 
conceptualized as a cognitive, possibly reat- 
tributional task; (b) solution of a set of sim- 
ple anagrams; (c) waiting alone for 15 min- 
utes; and (d) waiting with another person 
for the same time. All subjects, including a 
nondepressed control group without training 
or treatment exposure, were given the test 
task from Roth and Kubal's (1975) study. 
The results indicated that although both 
Treatments (a) and (b) were successful for the
nondepressed, helpless group, they did not al- 
leviate the deficits shown by the depressed 
subjects. In fact, exposure to simple ana- 
grams increased later deficits in depressed 
subjects, a result quite contrary to Seligman's 
hypothesis. Although this study was com- 
promised by the fact that the waiting-period 
groups did not differ from the control group 
on the test task, the failure of the treatment 
procedure (i.e., solving anagrams) to alleviate 
deficits in the depressed subjects conflicts 
with Seligman's hypothesis and the results of 
Klein and Seligman (1976). Thus, the evi- 
dence regarding Seligman's hypothesis is in- 
conclusive at this time. 

Two other studies (Dweck, 1975; Klein 
et al., 1976) have taken a different approach
to alleviation of learned helplessness and 
have focused on the subjects’ attributions of 
learned helplessness performance. Dweck 
(1975) selected children who were identified 
as “helpless” by several school personnel. She 
then exposed these children to one of two 
treatment conditions. The first treatment, “re- 
attribution training,” taught the children to 
attribute failure to a lack of effort. The sec- 
ond treatment, “success only,” provided chil- 
dren with a variety of success experiences 
with no attribution training. The results in- 
dicated that the success-only group did not 
change from their baseline level of performance,
whereas the reattribution group showed
significant improvements from their baseline 
of performance. Unfortunately, the reattribu- 
tion-training group included contingent prac- 
tice, so conclusions of differential effectiveness 
cannot be attributed solely to the cognitive 
intervention. The most that can be concluded 
is that a package of response-dependent feedback
and altered cognitions is superior to
response-dependent feedback alone.²

Klein et al. (1976) reported that the per- 
formance of depressed subjects improved 
when they were told that most people failed 
the training task, an instructional set pre- 
sumed to induce an attribution of task diffi- 
culty. However, the treatment used by Klein 
et al. did not improve the performance of 
nondepressed subjects exposed to response- 
independent failure outcomes. 

In summary, the literature pertaining to 
the alleviation of learned helplessness pro- 
vides few clear conclusions. Naturally oc- 
curring performance deficits found in de- 
pressed college students and children have 
been improved by reattribution therapy 
(Dweck, 1975; Klein et al., 1976) and, in 
one study, by exposure to response-dependent 
success conditions (Klein & Seligman, 1976). 
Another study, however, found that exposing 
depressed subjects to response-dependent suc- 
cess outcomes produced no improvement (Kil- 
patrick-Tabak & Roth, Note 4). On the other 
hand, deficits engendered in nondepressed 
college students have been alleviated success- 
fully in two studies (Klein & Seligman, 1976; 
Kilpatrick-Tabak & Roth, Note 4) by ex- 
posure to response-dependent feedback, but 
Klein et al. (1976) reported that treating 
nondepressed college students with a reattribu- 
tion treatment was not successful in alleviat- 
ing the produced deficits. The two investiga- 
tions that have compared response-dependent 
success and a cognitive manipulation either 
did not afford unconfounded interpretation 
(Dweck, 1975) or did not include a treatment 
focused on reattributing failure experiences 
(Kilpatrick-Tabak & Roth, Note 4). Thus,
present research does not allow conclusions
concerning the relative effectiveness of these
treatments.

² The authors are indebted to Larry Young for
this observation.


Summary 


As the preceding review demonstrates, 
Seligman's theory of learned helplessness no 
longer offers a full and viable explanation for 
the results of the current research. There are 
several areas in which Seligman's model 
seems inadequate to account for present data. 
The first area concerns the necessary and 
sufficient conditions for the development of 
learned helplessness. Although Seligman hypothesized
that an expectancy of response-
outcome independence is necessary and suf- 
ficient for learned helplessness to occur, cur- 
rent research suggests that an expectancy of 
response-outcome independence and a non- 
desired outcome are necessary for the devel- 
opment of learned helplessness. The second 
area in which Seligman's theory seems deficient
is in regard to the other factors that
affect learned helplessness. More specification
is needed to explain how, why, and
when variables such as instructions, task importance,
and subjects' attributions affect
learned helplessness. The third area concerns
generalization. Seligman's theory does not
delineate the processes and circumstances 
under which learned helplessness will gen- 
eralize. Since the relevance of the learned 
helplessness paradigm seems to hinge on gen- 
eralization and present research has not doc- 
umented generalization of learned helpless- 
ness, this is a most critical deficiency. The 
lack of integration of individual differences 
factors into learned helplessness theory is a 
fourth area of weakness. The final area con- 
cerns the alleviation of learned helplessness. 
Support for Seligman’s hypothesis concerning 
the conditions necessary for alleviation of 
learned helplessness is equivocal, with attri- 
butions appearing to be a major factor. 

More generally, research and theory in the 
learned helplessness area appear to be 
plagued by a narrowness of approach. First, 
the basic stance of Seligman’s theory has 
largely been neglected by researchers. Seligman
hypothesized that the cognitive expectancy
of response-outcome independence, not
the actual experience of exposure to those 
conditions, produces learned helplessness. 
Unfortunately, the learned helplessness re- 
search literature has focused primarily on 
the environmental conditions that seem to 
produce learned helplessness and has neglected
the cognitive schema that Seligman
postulated as crucial. Seligman’s theory can 
be seen as contributing to this neglect be- 
cause it does not elaborate the cognitive 
processes and variables relevant to learned 
helplessness. One reason for this deficiency
may lie in the animal research origins of the 
learned helplessness paradigms. Dogs and 
rats simply do not have the cognitive com- 
plexity or construction abilities of humans. 
As was pointed out by Levis (1976), Selig- 
man’s theory equates the cognitive processes 
of humans with those of a cockroach. This 
article takes the position that the explanation 
of human behavior requires more complex and
detailed hypotheses regarding the function of 
cognitive processes. The learned helplessness 
research appears to support this contention. 
Recent research has pointed to the importance 
of cognitive processes in mediating the de- 
velopment of learned helplessness, and further 
research and theory must focus on these cog- 
nitive aspects of learned helplessness. 

It is apparent that learned helplessness has 
been conceptualized primarily as a situational 
paradigm; that is, researchers have focused 
on delineating the situational conditions that 
produce learned helplessness in a majority of 
subjects. In doing so, however, the literature 
has largely neglected the processes by which 
these situational experiences are translated 
into cognitive schemata, the characteristics of 
these schemata, and the processes by which 
cognitive schemata influence future behavior. 
It is our contention that the situational view 
of human behavior does not adequately rep- 
resent or explain learned helplessness. 

In view of the deficiencies in the present 
theory of learned helplessness noted above, 
the possible significance of the learned help- 
lessness phenomenon, and the continued in- 
terest and research in this area, it seems 
appropriate at this time to propose a revised 
model of human learned helplessness. The 
model presented below attempts to outline a
modified version of Seligman’s theory of
learned helplessness that better fits the available
data and that gives definitive directions
for new research in the area.

Figure 1. A comparison of theories of learned helplessness.


Attribution-Theory Model 


Figure 1 presents the basic framework of 
both Seligman’s model and the present re- 
vised model of learned helplessness. The 
present model differs from Seligman’s model 
in several general areas. The first difference 
concerns the complexity of the antecedents 
of learned helplessness. Seligman (1975) 
postulated a single term of "information 
about the contingency" (p. 49), which refers 
to the environmental cues concerning re- 
sponse-outcome dependency, whereas the 
present model subdivides environmental 
events into two categories: outcome cues and 
situational cues. Outcome cues refer to the 
characteristics of the feedback concerning an 
individual's performance in a given situation, 
including Seligman's concern with response- 
outcome dependency. Situational cues refer 
to the stimuli present in the situation itself 
that influence the individual’s perception and 
interpretation of outcomes. Additionally, the 
revised model includes an individual differ- 
ences term and is explicitly interactional. En- 
vironmental events of outcomes and situa- 
tional cues are hypothesized to interact with 
each other and with individual differences 
variables to influence the individual’s cogni- 
tive processes. The final difference concerns 
the cognitive processes involved in learned
helplessness. In addition to Seligman’s ex-
pectancy term, the present model includes an
attribution term. 

In keeping with the previous focus on the 
five salient issues concerning learned help- 
lessness, these variables and the present 
model are discussed below in more detail 
within the context of these issues. 


Nature 


In the present model, the nature of learned 
helplessness is presumed to be similar to that 
originally proposed by Seligman (1975). It 
differs from the earlier model by separating 
deficits resulting from learned helplessness 
into affective and performance components 
rather than into affective, motivational, and 
cognitive aspects. The binary distinction con- 
forms more closely with current research, 
which has not successfully isolated cognitive 
and motivational deficits. The present model 
also differs from Seligman’s position by sug- 
gesting that the affective and performance 
components of learned helplessness can occur 
independently and will vary with the con- 
tent of the individual’s attributions. 


Etiology 


The revised model makes two major addi- 
tions to Seligman’s theory concerning the 
etiology of learned helplessness. The first 
change concerns the content of expectancy 
that is hypothesized to produce learned help- 
lessness. Although Seligman (1975) suggested 
that an expectancy of response-outcome in-
dependence is a necessary and sufficient con-
dition for development of learned helpless- 
ness, recent research, reviewed above, has 
suggested that an expectancy of response- 
outcome independence and an expectancy of 
failure to obtain desired outcomes are neces-
sary for development of learned helplessness.³

From an operational viewpoint, it is im- 
portant to note that this hypothesis implies 
that the outcome cues in the experimental 
task situation must include both response- 
outcome independence and failure. As is 
discussed above, most learned helplessness 
studies have used procedures that expose 
subjects to failure as well as noncontingent 
outcomes. In the remainder of this article, 
outcomes that are response independent and 
undesired are referred to as learned helpless-
ness outcomes.

The second addition concerning etiology in 
the revised model is the inclusion of the 
attribution term, which is derived primarily 
from the theory and research of attribution 
theory (Weiner, 1974; Weiner et al., 1971). 
In contrast with the single expectancy term 
of Seligman’s model, attribution theory sug- 
gests that analysis of the individual’s ascrip- 
tions of causality of environmental events 
will lead to more accurate representations of 
cognitive processing and to better prediction 
of future behavior. Seligman suggested that
exposure to environmental events leads to the 
development of an expectancy concerning 
control and that this expectancy in turn in- 
fluences future behavior, whereas an attribu- 
tion model suggests that the interaction of 
outcome and situational cues with individual 
differences variables results in an attribution 
to explain learned helplessness outcomes, and 
the characteristics of this cause then deter-
mine the expectancy that influences future
behavior. 

According to Weiner (1974), attributions 
can be characterized by two basic dimensions: 
locus of control (internal vs. external) and 
stability (stable vs. variable). The present 
model adds two additional dimensions that 
seem particularly relevant to learned help- 
lessness: specificity (specific vs. general) and 
importance (important vs. unimportant). It 
is predicted that any attribution can be char-
acterized by these four dimensions. Further-
more, it is hypothesized that each of these
dimensions has a specific effect on the future 
development and parameters of learned help- 
lessness. Thus, knowledge of the particular 
characteristics of a given attribution devel- 
oped in the training phase will allow specific 
predictions concerning the resulting learned 
helplessness. Two further points concerning 
the various dimensions of attributions should 
be noted. First, although often referred to as 
dichotomous, each attributional dimension is 
conceptualized as a continuum; for example, 
although attributions to effort and to luck 
may be conceptualized as variable, luck may 
be more variable than effort. Second, each 
dimension of attribution is a subjective di- 
mension; that is, one individual may perceive 
luck as more stable (“I’m a lucky person") 
than another (“That was just luck"). Thus, 
although general statements can be made 
concerning the dimensions of particular at- 
tributions, for best prediction the individual's 
perception of each dimension is necessary. 


Locus of Control 


The locus of control dimension represents 
a concern similar to the response-outcome 
contingency that Seligman postulated as 
basic to learned helplessness. However, the 
locus of control term used in this model does 
not represent the contingency of environ-
mental events or an expectancy of response-
outcome contingency but instead represents
the assignment of causality for the perceived 
contingencies to internal or external sources. 
For example, in a typical learned helplessness 
study, subjects may perceive that their re- 
sponses do not influence outcomes but may 
assign causality for this noncontingency to
an internal source (“I am stupid”) or an ex-
ternal source (“The experimenter is control-
ling the task”). Research in attribution the-
ory (Eswara, 1972; Rest, Nierenberg, Wei-
ner, & Heckhausen, 1973; Weiner & Kukla,
1970) has suggested that the attributional
dimension of locus of control directly influ-
ences the subjects’ affective reactions to task
performance. These studies support the hy-
pothesis that if one attributes failure to an
internal cause, self-depreciation and negative
affect result, whereas attribution of failure to
an external cause minimizes this affect. Simi-
larly, positive affects following success are
maximized by attributions to internal causes
and are minimized by attributions to external
causes. These findings suggest that following
exposure to learned helplessness outcomes,
an attribution to an internal cause will pro-
duce negative affect, and an attribution to
an external cause will reduce this negative
affect.

³ It should be noted that the present model
treats task outcome and response-outcome con-
tingency as orthogonal, independent dimensions.
Although there is some evidence that these dimen-
sions may not be independent (Jenkins & Ward,
1965), the relationship between outcome and con-
tingency is complex and poorly understood at
present (see Blaney, 1977). It seems prudent to
treat these dimensions as independent and to use
separate measures and predictions for each variable,
until future research clarifies the relationship.

Evidence for this hypothesis comes from 
the Roth and Kubal (1975) study, in which 
half the subjects were exposed to a manipu- 
lation designed to produce a belief that the 
learned helplessness training task was a mea- 
sure of intelligence. This important group 
reported significantly more depression and 
frustration following training than did an 
unimportant group. If one assumes that task 
performance in the important condition was 
attributed to intellectual ability, an internal 
cause, then this study can be seen as sup- 
porting this hypothesis. Unfortunately, this 
study and others in the learned helplessness 
area have not focused on attributions and 
have not differentiated among the dimensions 
proposed in the present model. That is, al- 
though the Roth and Kubal manipulation 
can be hypothesized to produce an attribu- 
tion to intellectual ability, no direct measures 
of attributions were taken, and furthermore, 
an attribution of intellectual ability can be 
characterized by other dimensions than locus 
of control, which may have influenced emo- 
tional responses. Thus, the Roth and Kubal 
study offers suggestive but nonconclusive 
support for the present model. Unfortu-
nately, the presence of suggestive but non-
conclusive evidence will be a common finding
for most of the hypotheses of the model. The
research focusing on the relationship of
these dimensions of attributions to learned 
helplessness simply has not been done. 

This hypothesis also offers an explanation 
for those studies that have reported an in- 
crease in self-reported anger and hostility 
following exposure to learned helplessness 
training conditions. As is discussed above, 
these results are not explained by Seligman's 
model. However, the attribution theory model 
suggests that an attribution of learned help- 
lessness outcomes to an external source, such 
as experimenter control, would be predicted
to reduce depression but also could be ex- 
pected to increase the subject's anger at the 
external agent. Anecdotal evidence for this 
hypothesis comes from Miller and Seligman 
(1975, 1976), who stated that two subjects 
in each study reported high levels of anger 
after the training phase and told the experi- 
menter that “they had decided early in the 
experiment that it had been rigged so that 
they could not escape the noise and that this 
had made them angry" (Miller & Seligman, 
1975, p. 235). Moreover, these subjects re-
ported large decreases in depression. This an-
ecdotal evidence provides further support
for the hypothesis that attributions mediate 
affective reactions to learned helplessness out-
comes.


Stability 


Stability refers to the relative permanence 
associated with an attribution. Environmental 
events can be attributed to causes that are 
stable or variable. For example, intelligence 
is a relatively stable attribution, whereas 
effort or luck are relatively variable. Attribu-
tion theory hypothesizes that stability of
attribution determines the degree of influ- 
ence that past outcomes exert on expectan- 
cies for performance in future situations. 
Several studies (Fontaine, 1974; McMahan, 
1973; Weiner et al., 1976) have supported
this hypothesis. Thus, if one attributes past 
outcomes to luck (a variable cause) then 
these outcomes will not influence one's ex-
pectancies in future situations, but if one at-
tributes past outcomes to ability (a stable
cause) then one's expectancies for per-
formance in future situations will shift in the
direction of the outcome. These results sug-
gest that stability of attribution mediates
the degree of influence that past outcomes 
exert on expectancies for performance in fu- 
ture situations, or the degree of cross-situa- 
tional generalization. 

In the learned helplessness situation, in 
which the outcome cues to be explained are 
those of response-outcome independence and 
failure, this hypothesis suggests that an 
attribution to a stable cause will result in a 
shift of the subjects’ expectancies for future 
performance toward noncontingency and 
failure and, therefore, generalization to dis- 
similar situations. This hypothesis, then, 
goes right to the heart of the learned help- 
lessness paradigm, suggesting that the stabil- 
ity of attribution predicts the situational 
generalization of learned helplessness. Thus, 
the next postulate of the model suggests that 
an attribution of learned helplessness out- 
comes to a stable cause will tend to increase 
the situational generalization of these out-
comes.

This hypothesis concerning cross-situa- 
tional generalization is supported by the 
Dweck and Reppucci (1973) study in which 
the greatest cross-situational generalization 
of learned helplessness deficits occurred in 
those children who attributed performance to 
ability, a stable cause, and children who 
showed the fewest deficits attributed their
performance to effort, a variable cause. The 
results of Roth and Kubal (1975) and Ten- 
nen and Eller (1977) also lend support to 
this hypothesis. In these studies cross-situa- 
tional generalization was found, but only in 
the experimental groups who had been ex- 
posed to an experimental manipulation de- 
signed to induce attributions to intellectual 
ability, a stable cause. 


Specificity 


Although attribution theory does not ad- 
dress itself to this dimension, attributions 
can also be characterized by their specificity 
or generalizability. In the learned helpless- 
ness paradigm, the specificity of attribution 
is postulated to predict the number of future 
tasks that will be affected by the expectancies 
developed in the training phase. For example, 
if a subject attributed learned helplessness 
outcomes to the ability to solve discrimina-
tion problems, a specific cause, then expec-
tancies for and performance on future math 
problems should not be affected. But if the 
attribution were to a general cause, ability to 
solve problems, then expectancies for and 
performance on future math problems should 
be affected. Thus, the present model hypothe- 
sizes that attribution of learned helplessness 
outcomes to a general cause will tend to in- 
crease the influence of the training-phase 
outcomes on expectancies and performance in 
different types of tasks. Unfortunately, no
study has directly investigated the effects of 
this dimension on learned helplessness. 


Importance 


The last dimension of attribution to be 
discussed is subjective importance, that is, the 
relative value a person assigns to an event. 
Subjective importance or reinforcement value 
has been discussed as a major component of 
the social-learning theory of human behavior 
(Rotter, 1954; Rotter, Chance, & Phares, 
1972) but has not been discussed in detail by 
attribution theorists. In social-learning the- 
ory, the construct of subjective importance is 
seen as a major dimension of the task. The
present model, however, in keeping with an 
emphasis on attributions, predicts that at- 
tributions, as well as tasks, differ in the de- 
gree of importance to a given individual and 
that the subjective importance of an attribu- 
tion can also influence present and future 
behavior. This model further suggests that 
the influence of the dimension of subjective 
importance will be manifested primarily in 
terms of the magnitude of the affective and 
performance deficits predicted by the other 
dimensions of attributions.⁴ Thus, it is hy-
pothesized that the magnitude of learned
helplessness deficits will covary with the sub-
jective importance of the attribution of
learned helplessness outcomes.

⁴ The subjective importance of the attribution
is not the only factor that contributes to the mag-
nitude of deficits. As social-learning theory and
Abramson, Seligman, and Teasdale (1978) have
mentioned, the subjective importance of the out-
come itself can also influence magnitude. Similarly,
the other dimensions of attributions will affect mag-
nitude. Attributions to causes that are relatively
stable or general will, due to their relative general-
izability, increase the magnitude of learned helpless-
ness deficits.

Two studies designed to vary subjective 
importance (Roth & Kubal, 1975; Miller & 
Gold, Note 3) have manipulated experimental 
task cues to increase the subjects’ belief that 
task performance reflected intellectual abil- 
ity. Although there are no direct data, this 
type of manipulation appears to increase the 
probability of an attribution to intellectual 
ability, presumably an important attribution 
for most college students. Roth and Kubal 
reported that subjects exposed to the noncon- 
tingent important manipulation reported in- 
creased depression and anxiety and increased 
performance deficits. Miller and Gold also 
reported significantly greater performance 
deficits in their noncontingent important con- 
dition. Of course, an attribution of intellec- 
tual ability can also be characterized as in- 
ternal, stable, and general, so it is not clear 
that the importance of the attribution is the 
causal factor. 

The basic etiological hypotheses of the 
revised model of learned helplessness have 
now been presented. Exposure to outcome
cues of response-outcome independence and 
failure results in an attribution to explain 
these outcomes. The characteristics of the
attributions constructed are predicted to de- 
termine the development, type, and generali- 
zation of learned helplessness deficits. The 
dimension of locus of control is predicted to 
determine the resulting affective components 
of learned helplessness, with an attribution 
of learned helplessness outcomes to an in-
ternal cause hypothesized to result in de-
pression and anxiety. Stability of attribution
is predicted to determine cross-situational 
generalization, with attributions to a stable 
cause resulting in cross-situational generaliza- 
tion of learned helplessness deficits. Speci- 
ficity of attribution is hypothesized to con- 
trol the cross-task generalization, with at- 
tributions to general causes resulting in 
cross-task generalization. The importance 
dimension is hypothesized to affect the in- 
tensity of the deficits produced in learned 
helplessness, with attributions to important 
causes resulting in maximum disability. Com-
bining these hypotheses, it should be noted
that attributions to causes that are internal, 
important, stable, and general are predicted 
to maximize the severity and generalization 
of learned helplessness, whereas attributions 
to causes that are external, unimportant, 
variable, and specific will minimize deficits. 
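
The combined hypotheses can be restated as a small lookup, offered here only as an illustrative sketch and not as part of the original model. The class name `Attribution`, the function `predict`, and the dictionary keys are hypothetical labels. Coding each dimension as a boolean is a simplifying assumption, since the model treats every dimension as a subjective continuum, and the importance coding given to experimenter control is assumed for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """One causal ascription coded on the model's four dimensions.

    Booleans are a simplifying assumption for this sketch; the model
    itself treats each dimension as a continuum.
    """
    internal: bool   # locus of control: internal (True) vs. external
    stable: bool     # stability: stable (True) vs. variable
    general: bool    # specificity: general (True) vs. specific
    important: bool  # importance: important (True) vs. unimportant


def predict(attribution: Attribution) -> dict:
    """Return the qualitative predictions stated in the summary above."""
    return {
        # Internal causes are predicted to produce depression and anxiety.
        "negative_affect": attribution.internal,
        # Stable causes are predicted to carry deficits into new situations.
        "cross_situational_generalization": attribution.stable,
        # General causes are predicted to carry deficits into new task types.
        "cross_task_generalization": attribution.general,
        # Important causes are predicted to intensify the deficits.
        "high_intensity": attribution.important,
    }


# Hypothetical codings of two attributions discussed in the text.
ability = Attribution(internal=True, stable=True, general=True, important=True)
experimenter_control = Attribution(
    internal=False, stable=False, general=False, important=False
)
```

Under these assumed codings, `predict(ability)` flags every deficit and `predict(experimenter_control)` flags none, which mirrors the maximizing and minimizing combinations described above.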

As with other cognitive processes, the con- 
struction of attributions can be influenced by 
a variety of other factors. Thus, the next 
task of the model is to specify those situa- 
tional cues that may interact with outcome 
cues and individual differences variables to 
affect the attributional process. 


Situational Cues 


As is mentioned above, situational cues 
refer to the stimuli present in the situation 
that exert influence on the subjects’ attribu- 
tions of task outcome. Several situational 
task cues have been identified in the learned 
helplessness literature, including (a) instruc- 
tions regarding the contingency of responses 
and outcomes (Geer et al., 1970; Glass & 
Singer, 1972; Hiroto, 1974), (b) instruc- 
tions regarding task difficulty (Klein et al., 
1976), (c) instructions and other task cues 
specifying the nature of the task (Roth & 
Kubal, 1975; Miller & Gold, Note 3), and 
(d) the amount of exposure to learned help- 
lessness conditions (Roth & Kubal, 1975). 
Other situational cues that are hypothesized 
by attribution theory to influence the devel- 
opment of attributions include social norms, 
observation of others’ performance, and type 
of task (Weiner, 1974). 

Space limitations prevent a discussion of 
the specific effects of all of these situational 
cues (see Weiner, 1974), but an example 
may prove helpful in understanding the sig- 
nificance of this class of variables. Klein et 
al. (1976) reported that instructions that 
the task was difficult tended to increase at- 
tributions of learned helplessness outcomes 
to task difficulty, an external, variable, and 
specific cause, whereas instructions that the 
task was easy tended to increase the proba- 
bility of attributions to ability, an internal, 
stable, and general cause. These instructions 
also produced corresponding differences in 
task performance. Thus, it can be seen that 
situational cues interact with outcome cues
to produce an attribution that then influences 
expectancies for future performance and later 
task behavior. 


Individual Differences Variables 


Individual Differences Variables is perhaps 
a misnomer for this section, since a majority 
of the variables discussed are interactions 
between situational and individual differences 
variables. However, the purpose of this section 
is to specify those individual differences 
variables that may interact with situational 
variables to produce differential influences on 
learned helplessness performance. Since a 
number of individual differences variables 
have been found to influence the attributional 
process (see Weiner, 1974), this article pre- 
sents only those variables specifically relevant 
to learned helplessness.

One individual differences variable that has
been shown to influence attribution is gender.
As is discussed above, Dweck and Reppucci
(1973) reported different attributions from 
male and female children, with males tending 
to attribute failure outcomes to effort and 
females tending to attribute them to ability. 
These results suggest that females would tend 
to be more susceptible to learned helplessness 
manipulations. However, as another study 
(Dweck & Bush, 1976) reported, these re- 
sults appear to be further mediated by the 
situational cue of the role of the evaluator 
(peer vs. adult). Clearly, then, sex is a po- 
tent variable that can influence attributions 
and development of learned helplessness. It 
is also clear that the interaction among sex, 
learned helplessness, and attributions is a 
complex one, with other variables playing a 
major role. These results also indicate that 
sex is one variable that should be controlled 
in future learned helplessness research.

Another individual differences variable that
has been shown to influence the development
of attribution is prior expectancies of future
outcomes. Several studies have reported data
that suggest that the congruency between
prior expectancies and present outcomes can 
influence the attributional process. Feather 
(1969; Feather & Simon, 1971a, 1971b) as- 
sessed subjects’ prior expectancies by sub-
jects’ reports (Feather, 1969) or previous
experience (Feather & Simon, 1971a, 1971b).
Subjects were then given anagram problems 
and were asked to rate the causes of per- 
formance outcomes. These results suggested 
that an outcome that is discrepant from pre- 
viously held expectancies tends to be attrib- 
uted to external causes, and outcomes con- 
gruent with expectancies tend to be attrib- 
uted to internal causes. These results were 
replicated by Gilmor and Minton (1974). In 
a study investigating another dimension of 
attribution, McMahan (1973) also used 
the anagram problems and found that ex- 
pectancy disconfirmation produced attribu- 
tions to variable causes and that expectancy 
confirmation led to attributions to stable 
causes.

These results suggest that individuals tend 
to attribute discrepant outcomes to causes 
that lead to the smallest change in the indi- 
vidual's previously held expectancies. If this 
is true, then one would also predict that dis- 
crepant outcomes would be attributed to 
specific causes rather than to general causes. 
Unfortunately, there has been no reported 
research bearing directly on this question. 
However, the available research does suggest 
that when learned helplessness outcomes are 
highly discrepant from previously held ex- 
pectancies, attributions tend to be made to 
external, variable, and specific causes. Con- 
versely, if learned helplessness outcomes are 
congruent with previously held expectancies, 
attributions tend toward internal, stable, and 
general causes. Thus, learned helplessness is 
predicted to be more likely to occur when 
training-task outcomes are congruent with 
prior expectancies than when outcomes are 
incongruent. This prediction is supported by 
Hiroto's (1974) study, in which subjects 
with prior expectancies of an external locus 
of control (noncontingent) showed greater 
deficits following exposure to a noncontingent 
failure experience than did subjects with a 
belief in an internal locus of control. 
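
The congruence argument of this paragraph can be written out as a short rule, given here as a hedged sketch and not as the authors' procedure. The function names `predicted_attribution` and `helplessness_likely` are hypothetical, expectancies and outcomes are reduced to a failure/success boolean for simplicity, and the specificity entry reflects the paragraph's own untested extension of the locus and stability findings.

```python
def predicted_attribution(expected_failure: bool, observed_failure: bool) -> dict:
    """Attribution the model predicts for an outcome, given a prior expectancy."""
    congruent = expected_failure == observed_failure
    if congruent:
        # Confirmed expectancies: internal and stable causes are reported in
        # the studies cited above; "general" is the untested extension.
        return {"locus": "internal", "stability": "stable", "specificity": "general"}
    # Disconfirmed expectancies: causes that least disturb the prior expectancy.
    return {"locus": "external", "stability": "variable", "specificity": "specific"}


def helplessness_likely(expected_failure: bool) -> bool:
    """Learned helplessness outcomes are failures; generalized deficits are
    predicted only when the resulting attribution is stable."""
    attribution = predicted_attribution(expected_failure, observed_failure=True)
    return attribution["stability"] == "stable"
```

In this sketch a subject who already expects failure is predicted to develop generalized deficits after a failure outcome, whereas a subject who expects success is not.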

A final important individual differences 
variable for the learned helplessness para- 
digm is depression. Since learned helplessness 
has been advanced as a theory of depression, 
the interaction between depressed mood and 
exposure to learned helplessness outcomes 
should be explored in some detail. Depressed
subjects have been shown to be generally 
more pessimistic about future outcomes than 
are nondepressed subjects (Beck, 1974; Loeb, 
Beck, & Diggory, 1971)⁵ and to expect the
independence of responses and outcomes (Cal- 
houn, Cheney, & Dawes, 1974). Since learned 
helplessness outcomes of noncontingency and 
failure are congruent with these expectations 
of depressed subjects, the present model sug- 
gests that depressed subjects would tend to 
attribute learned helplessness outcomes to 
causes that are internal, stable, and general. 
Current research has generally supported 
this prediction. Klein et al. (1976) reported
that depressed subjects tended to attribute 
performance to ability, an internal, stable, 
and general cause, after failure but not after 
success. Similarly, Miller and Seligman 
(1973) reported that depressed subjects 
tended to show smaller expectancy changes 
than did nondepressed subjects on a skill task 
after failure, but there was no difference after 
success. Since a small expectancy change has 
been shown to be associated with a stable 
attribution (Weiner et al., 1976), Miller and 
Seligman’s results suggest that depressed 
subjects tend to attribute noncontingent 
failure on skill tasks to stable causes. 

If depressed subjects tend to attribute 
learned helplessness outcomes to internal, 
stable, and general causes, then the present 
model predicts that depressed subjects would 
exhibit greater performance deficits and 
greater affective reactions than would non- 
depressed subjects, following exposure to 
learned helplessness outcomes. Two studies 
(Hammen & Krantz, 1976; Wener & Rehm, 
1975) offer unambiguous support for this 
hypothesis. In both studies, depressed and 
nondepressed subjects were exposed to non- 
contingent success and failure outcomes. Fol- 
lowing failure, depressed subjects’ responses 
showed more depression, less self-confidence, 
and decreased expectancies for future suc- 
cess. No difference between groups was re- 
ported with success outcomes. Although these 
results support the hypothesis of this model, 
they appear inconsistent with the learned 
helplessness studies of Miller and Seligman 
(1975, 1976), which reported that depressed 
subjects exposed to learned helplessness 
training do not differ from depressed sub-
jects without learned helplessness exposure.
However, as is discussed above, Miller and 
Seligman (1975, 1976) reported that in both 
studies, two of eight depressed subjects ex- 
hibited performance similar to that of non- 
depressed/no learned helplessness subjects on 
the test task. The present view is that these 
four subjects may have attributed learned 
helplessness outcomes to experimenter con- 
trol, an external, variable, and specific cause. 
Thus, for these depressed subjects, although 
learned helplessness outcomes may have been 
congruent with their prior expectancies, other 
situational cues may have led to attributions 
of experimenter control, which in turn may 
have led to nondepressed performance and 
therefore to a reduction in the difference be- 
tween the depressed/learned helplessness 
group and the depressed/no-treatment group. 
Miller and Seligman (1976) reported that 
the remaining depressed/learned helplessness 
subjects did show greater performance defi- 
cits than the depressed/no-treatment group, 
but these differences were not statistically 
significant. They further suggested the possi-
ble operation of a floor effect not found in the
Hammen and Krantz (1976) or Wener and 
Rehm (1975) studies. These data, though 
somewhat ambiguous, tend to support the 
present hypothesis, that depressed subjects 
tend to attribute outcomes of noncontingency 
and failure to internal, stable, and general 
causes and therefore tend to exhibit greater 
depression and performance deficits. 

This hypothesis, in combination with the
basic etiological hypotheses of the model,
suggests the following chronology of reactive
depression: Due to some combination of situ-
ational cues and repeated exposure to non-
contingent and nondesired outcomes, the in-
dividual’s attributions of these outcomes
change from external, variable, and specific
causes to internal, stable, and general causes.
This change results in a change in future ex-
pectancies, performance, and mood. Thus, in
new situations, the individual expects non-
contingency and failure, and when these con-
gruent outcomes occur, they are attributed to
internal, stable, and general causes, whereas
discrepant outcomes of success and contin-
gency are attributed to external, variable,
and specific causes and do not influence fu-
ture expectancies, performance, or mood. The
individual is then depressed and tends to
disregard outcomes of success and contingency
while overgeneralizing failure and noncon-
tingent outcomes. Thus, response initiation
declines, a greater number of failure and
noncontingent outcomes do occur, and the
vicious circle of depression has begun.

⁵ Miller and Seligman (1973) and Hammen and
Krantz (1976) have reported that depressed subjects
do not differ from nondepressed ones in initial task
expectancies. These studies, however, only used non-
clinically depressed subjects and assessed only spe-
cific task expectancy. The lowered expectancy dis-
cussed above reflects a more generalized expectancy,
which is probably more pronounced in clinically
depressed populations.


Generalization 


As is reported above, the lack of cross- 
situational generalization is a major obstacle 
to the significance of the learned helplessness 
paradigm. Seligman’s theory offers no con-
vincing explanation for the lack of results.
As is suggested above, in the Etiology section, 
the present revised model hypothesizes that 
cross-situational generalization of learned
helplessness is a function of the stability di-
mension of the attributions constructed. Ac-
cording to this hypothesis, cross-situational
generalization should result if and when at- 
tributions of learned helplessness outcomes 
are made to relatively stable causes and 
should not result when attributions are made 
to relatively variable causes. 

However, performance deficits in a situa- 
tionally similar task may be due to attribu- 
tions that are relatively variable but that are 
stable for the duration of the experiment, 
such as experimenter control or task diffi- 
culty, or such deficits may be due to rela- 
tively stable causes such as ability or person- 
ality. The present model suggests that per- 
formance deficits that occur as a result of 
attributions to relatively variable causes are 
really “pseudohelplessness” because they do 
not represent a change in the individual’s 
basic expectancies or mode of adaptation. 
Deficits that occur as a result of attributions 
to stable causes do result in more lasting,
generalized learned helplessness. 

Awareness of the role of stability of attri- 
butions in the generalization of learned help- 
lessness leads to several other hypotheses and 
explanations concerning generalization of 
learned helplessness. First, since the typical
nondepressed college student tends to expect 
response-outcome dependence and success 
(Parducci, 1963, 1965, 1968), the outcomes 
of the learned helplessness training phase 
will be highly discrepant and thus are pre- 
dicted to be attributed to causes that are 
relatively variable. Since attributions to 
variable causes are not predicted to result in 
cross-situational generalization of deficits, the 
present model predicts that studies using 
nondepressed college students would produce 
temporary, specific pseudohelplessness deficits 
during the experimental task situation, but 
not the relatively permanent, generalizable 
learned helplessness described by Seligman, 
unless very strong situational cues designed to 
produce more stable attributions were uti- 
lized. One of the few studies to report cross- 
situational generalization used such situa- 
tional cues. Roth and Kubal (1975) used 
three situational manipulations, each of 
which would tend to increase the probability 
of an attribution to a stable cause and thus 
to cross-situational generalization of learned 
helplessness.

This interpretation of the learned helplessness
research is crucial to future research in
the area because (a) it explains the current 
lack of documented cross-situational general- 
ization, and (b) it suggests that to produce 
generalized learned helplessness, the experi- 
mental situational cues will have to be manip- 
ulated to induce attributions to relatively 
stable causes. 


Alleviation 


The final statement of the present model 
concerns alleviation or treatment of learned 
helplessness. Following the basic statement
of the model, treatment of learned helplessness
is focused on changing the subjects’ attributions.
The first step in treating learned
helplessness is assessing the type of learned
helplessness involved. Are the obtained deficits
due to the pseudohelplessness produced
by an attribution to external, variable, and 
specific causes, or are deficits due to learned 
helplessness produced by internal, stable, and 
general attributions? If the deficits are due 
to pseudohelplessness, then a change in out- 
comes or situations would be sufficient to 
change future expectancies and performance. 
That is, if one attributed learned helplessness 
outcomes of response-outcome independence 
and failure to a variable or specific cause, 
then a change in these outcomes will be suf- 
ficient to change expectancies for future out- 
comes. This reasoning suggests that for 
pseudohelplessness, exposure to experiences 
of response-outcome dependence and suc- 
cess will alleviate deficits. The success of this 
treatment with nondepressed subjects has 
been demonstrated by Klein and Seligman 
(1976) and Kilpatrick-Tabak and Roth 
(Note 4). 

On the other hand, if deficits are due to 
learned helplessness or depression caused by 
an attribution of learned helplessness out- 
comes to internal, stable, and general causes, 
then exposure to response-dependent success 
will be attributed to variable and specific 
causes and will not influence future
expectancies of performance. The failure of
exposure to response-dependent success to
alleviate deficits in depressed subjects was demonstrated
by Kilpatrick-Tabak and Roth (Note 4). 
According to the present model, treatment 
of learned helplessness or depression would 
require a direct focus on changing the attribu- 
tions themselves. Examples of these changes 
and the effectiveness of this type of treat- 
ment can be seen in the studies of Klein et 
al. (1976) and Dweck (1975). 

The revised formulation of learned help- 
lessness has now been presented. The interac- 
tion of situational cues, outcome cues, and 
individual differences was hypothesized to 
result in an attribution to explain learned 
helplessness outcomes. The characteristics of 
this attribution were postulated to mediate 
the influence of learned helplessness
outcomes on expectancies and behavior in future
situations. Four dimensions of attributions
were discussed: (a) locus of control, (b) sta- 
bility, (c) specificity, and (d) importance. 
Attributions characterized as internal, stable, 
general, and important were predicted to
maximize learned helplessness.
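
The predictions summarized above can be read as a small decision rule. The
following Python sketch is only an illustration of that reading, not part of
the model as published: the class name, the function names, and the use of
yes/no values for each dimension are assumptions made here, and classifying a
deficit by the stability dimension alone is a simplification of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Hypothetical encoding of the four attribution dimensions."""
    internal: bool   # locus of control: internal (True) or external (False)
    stable: bool     # stability: stable (True) or variable (False)
    general: bool    # specificity: general (True) or specific (False)
    important: bool  # importance of the outcome to the individual


def deficit_type(attr: Attribution) -> str:
    # Assumption: stability alone decides the label, because the model ties
    # cross-situational generalization to the stability dimension.
    return "learned helplessness" if attr.stable else "pseudohelplessness"


def suggested_treatment(attr: Attribution) -> str:
    # Pseudohelplessness: a change in outcomes is predicted to be sufficient.
    if deficit_type(attr) == "pseudohelplessness":
        return "exposure to response-dependent success"
    # Learned helplessness: the attributions themselves must be changed.
    return "direct change of the attributions"


def maximizing_dimensions(attr: Attribution) -> int:
    # Illustrative count only; the model assigns no numeric weights.
    return sum([attr.internal, attr.stable, attr.general, attr.important])


if __name__ == "__main__":
    task_difficulty = Attribution(False, False, False, False)
    low_ability = Attribution(True, True, True, True)
    for label, attr in [("task difficulty", task_difficulty),
                        ("low ability", low_ability)]:
        print(label, "->", deficit_type(attr), "/", suggested_treatment(attr))
```

Under these assumptions, an attribution to a variable cause such as task
difficulty is labeled pseudohelplessness, whereas an attribution to a stable
cause such as ability is labeled learned helplessness.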

This model appears to offer a more accurate 
and predictive theory of learned helplessness. 
We hope that the model will open new
avenues of research within the learned
helplessness paradigm. Beyond testing the specific
hypotheses of the model itself, the focus on 
cognitive processes and attributions may lead 
to an increased awareness of the complexity 
of human cognition and the inclusion of 
measures of cognition in experimental designs.
Also, the present model can serve as a spring- 
board for investigation of interactions of out- 
come and situational cues with individual 
differences variables and of the relations of 
these interactions with cognitive processes 
and future behavior. 


In an attempt to clarify and operationalize the concepts of alienation and
involvement, this article critically examines several current sociological and
psychological approaches to the concepts. Several common sources of confusion
surrounding the treatment of the concepts are identified. A motivational formu- 
lation of the concepts is suggested to achieve greater parsimony and integration 
of the diverse sociological and psychological thinking on the issue. Finally, some 
major implications of this motivational formulation for future empirical studies 
on alienation and involvement at work and in other situations are indicated. 


In the social science literature of the past 
two decades, one encounters very often the 
usage of the concept alienation and its ob- 
verse, involvement (Johnson, 1973). These 
concepts have been used by sociologists, psy- 
chologists, political scientists, theologians, 
philosophers, and historians to describe and 
explain various contemporary social phe- 
nomena. In fact, the terms alienation and in- 
volvement have been used so often and in so 
many contexts that they have acquired an 
aura of equivocality. As Seeman (1971) 
pointed out, the concept of alienation has 
been “popularly adopted as the signature of
the present epoch. It has become routine to 
define our troubles in the language of aliena- 
tion and to seek solutions in those terms. But 
signatures are sometimes hard to read, some- 
times spurious, and sometimes too casually 
and promiscuously used. They ought to be 
examined with care" (p. 135). Similar con- 
cern was expressed by Johnson (1973), who 
characterized the concept of alienation as 
being capable of carrying a great deal of feel- 
ing “in an inexplicit, perplexing and deeply 
annoying way” (p. 28). Although in recent
years many psychologists and sociologists have 
attempted to demystify and to operationalize 
the concept (Lawler & Hall, 1970; Lodahl & 
Kejner, 1965; Saleh & Hosek, 1976; Seeman, 
1971; Vroom, 1962), none of them seem to 
offer a scientifically organized and meaningful 
view of the concept that could have broad 
generality across cultures. Although most of 
the researchers have tried to explain the phe- 
nomena of alienation and involvement in so- 
cial psychological terms (Clark, 1959; Lawler 
& Hall, 1970; Seeman, 1959), the language 
they have used seems to have created more 
confusion than clarity. Sociological and psy- 
chological explanations of the phenomena seem 
to run parallel courses of their own without 
any serious attempt at integration. In fact, if 
one puts together the various explanations of 
the phenomena advanced by these writers, one 
ends up with greater conceptual fuzziness 
rather than clarity or understanding. If we 
take seriously Seeman’s (1971) call for a 
careful examination of the concept for better 
clarity and rigor, we ought to seek a reformu- 
lation of the issue. The present article is an 
effort in this direction. 

First, the article traces the manifold nature 
of the concept of alienation as it has been 
viewed by several researchers in this area. 
Second, the article identifies the major sources 
of confusion surrounding this concept. And 
finally, the article presents a motivational 
formulation of alienation and a
comprehensive conceptual framework that could be used
in future research. Such a framework is aimed 
at integrating the sociological and the psycho- 
logical thinking on the issue and at providing 
a more complete understanding of the com- 
plex phenomena of alienation and involve- 
ment in a parsimonious way. It must be 
pointed out that the terms alienation and in- 
volvement are used here to indicate bipolar 
states of the same phenomena. Furthermore, 
since the phenomena have been studied em- 
pirically mostly in work situations, I have 
chosen to discuss specifically the nature of 
alienation and involvement at work in some 
detail. This choice, however, does not imply
that one should limit the use of the concepts 
of alienation or involvement to work situa- 
tions alone. The framework suggested in this 
article is in fact intended to be used by re- 
searchers not only to study work alienation 
phenomena but also to study alienation in 
family life, in religious contexts, and for that 
matter in any other specific aspects of one's 
environment. 


Alienation and Involvement: 
Some Earlier Formulations 


Although contemporary social scientists 
often consider the state of alienation in indi- 
viduals as distinctly a postindustrial phenome- 
non, theologians and philosophers claim other- 
wise. As Johnson (1973) pointed out, social 
alienation as an observed phenomenon is quite 
ancient and the term alienation is an antique 
one. Theologians take the credit for using 
the term as an explanatory concept. Aliena- 
tion, according to them, refers to states of 
separation of human beings from God, from 
their own bodies, from their fellow human 
beings, and from their institutions. This in- 
terpretation is clearly reflected in the use of 
dualism of body and soul in theological writ- 
ings. Essentially, the theologians observed and 
explained the meaninglessness of human ex- 
istence in terms of spiritual alienation or 
separation from God and moral principles. 
Although alienation as a psychological state 
of the individual (or as a collective social 
phenomenon) has been recognized for cen- 
turies, the scientific treatment of the concept 
with regard to its nature and its effect was
attempted first by empirically oriented so- 
ciologists and then, more recently, by social 
psychologists. Thus the concept has lived 
through two distinct traditions, the rational 
and the empirical. The rational tradition of
the concept comes largely from the writings 
of theologians (Macquarrie, 1973) and philos- 
ophers (Denise, 1973). The empirical tradi- 
tion results from the recent works of sociolo- 
gists and psychologists. The main focus of 
the present article is a critical review of the 
empirical tradition. 

In dealing with states of alienation in the 
spiritual life of individuals, theological ap- 
proaches emphasized the idea that there can 
be alienation of different sorts depending on 
what elements of one's environment one is 
separated from (such as God, one's own body, 
other people, etc.). Following a similar line 
of reasoning, the social scientists of today talk 
of different kinds of alienation such as job 
alienation, organizational alienation, urban 
alienation, family alienation, and so on. From 
the point of view of conducting empirical re- 
search, social scientists consider the study of 
alienation in relation to single, well-defined 
environmental elements (such as job, family, 
etc.) to be more fruitful than the study of 
alienation in a global sense (Clark, 1959; 
Seeman, 1971). The theological and philo- 
sophical approaches to the concept influenced 
the thinking of modern social scientists in yet 
another way. The more recent writers have 
identified the core meaning of the concept of 
alienation as a dissociative state of the indi- 
vidual (a cognitive sense of separation) in 
relation to some other element in his or her 
environment (Schacht, 1970). In the follow- 
ing section, an overview of the sociological 
and the psychological approaches to the con- 
cept is presented. 


The Sociological Approach 


The contributions of sociologists in explain- 
ing the nature of alienation have been the 
most extensive. In the classics of sociology, 
such as in the writings of Marx, Weber, 
and Durkheim, the concept of alienation
received very comprehensive treatment.
Although Rousseau was the first to provide
sociological treatment of the concept, it was
Marx and later Weber who put the concept 
on firmer analytic ground. The evidence of 
their powerful influence still persists in the 
works of contemporary sociologists (Seeman, 
1959, 1971). 

If theologians identified the basic psycho- 
logical state of alienation in men in their 
spiritual lives, Marx identified it in their ma- 
terial, working lives. According to Marx 
(1844/1932), labor or working on a job is 
the “existential activity of man, his free con- 
scious activity—not a means of maintaining 
his life but for developing his universal na- 
ture” (pp. 87-88). Thus, ideally a state of 
work involvement can result when the work 
situation elicits job behavior that is perceived 
to be (a) voluntary, (b) not instrumental in 
satisfying basic physical needs, (c) instru- 
mental in satisfying Maslow-type (1954) 
higher order needs such as the need for self- 
realization or self-actualization, and (d) con- 
ducive to developing individuals’ abilities to 
their fullest potential. In the absence of such 
perceptions, the individual worker is bound to 
experience a state of alienation from work. 
Most work setups according to Marx provide 
conditions that alienate workers rather than 
involve them. Marx identified two major job 
conditions that are responsible for alienation 
among workers. They are (a) separation of 
workers from the products of their labor and 
(b) separation of workers from the means of 
production. The first job condition implies 
that the product is perceived as not belong- 
ing to the worker. The worker also perceives 
that he or she cannot influence the disposition 
or quality of the product. Thus he or she 
lacks a sense of ownership and of control over 
the product and its quality. The second job 
condition implies that the worker perceives a 
lack of control over the function of the ma- 
chines and other means of production. Find- 
ing that he or she has no control over working 
life, the worker is bound to be estranged or to 
separate working life from the rest of his or 
her existence over which Marx assumes the 
worker has complete control. 

From the preceding discussion, it becomes 
obvious that it is the lack of autonomy and
control of one’s behavior and its effects that
defines the Marxian concept of alienation. If 
one translates the alienation state of the 
worker into motivational terms, it becomes 
quite clear that Marx intended to measure 
alienation in terms of the satisfaction of a 
single set of needs, the ego needs for inde- 
pendence, achievement, and power. In the 
Marxian formulation, the role of other human 
needs, such as the physical and the social ones, 
has been completely disregarded, as if the 
needs do not constitute a part of one’s self 
or perhaps constitute a very insignificant part 
exerting almost no influence in causing states 
of alienation. This interpretation of the
Marxian formulation may appear oversimplified,
but it is clearly reflected in the following 
quotation from Marx (1844/1932):


What constitutes alienation of labor? First, that 
work is external to the worker, that it is not part of 
his nature; and that, consequently, he does not fulfill 
himself in his work but denies himself, has a feeling 
of misery rather than well-being, does not develop 
freely his mental and physical energies but is physi- 
cally exhausted and mentally debased. The worker 
therefore feels himself at home only during his leisure 
time, whereas at work he feels homeless. His work is 
not voluntary but imposed, forced labor. It is not the 
satisfaction of a need, but only a means for satis- 
fying other needs. (pp. 85-86) 


One may notice the assumptions Marx makes 
while defining the state of alienation. Clearly 
he emphasizes the worker’s experience of 
frustration of his or her autonomy and con- 
trol needs at work, and whenever these needs 
are frustrated, Marx considers work to be ex- 
ternal to the worker’s self. 

Another assumption made by Marx in the 
quotation relates to the instrumental and con- 
summatory properties of job behavior. Accord- 
ing to Marx, job behavior can be either in- 
strumental activity that satisfies basic physi- 
cal human needs or it can be consummatory 
activity. In the former sense job behavior is 
viewed as a means to an end (satisfaction of
extrinsic needs), and in the latter sense it is 
viewed as an end in itself. Theories of human 
motivation suggest that human behavior is 
purposive; it has directionality; it is initiated 
by need states; and it is always instrumental 
in satisfying these need states. An individual’s 
job behavior is also purposive; it is aimed at 
satisfying both the extrinsic and intrinsic
need states (Lawler, 1973) of the individual. 
However, when Marx wrote of job behavior 
as an end in itself (reflecting a state of in- 
volvement), he did not recognize that such 
behavior is also instrumental in satisfying a 
set of intrinsic human needs. 

Weber’s treatment of the concept of aliena- 
tion is very similar to that of Marx. As Gerth 
and Mills (1946) put it, “Marx's emphasis
upon the wage worker as being ‘separated’ 
from the means of production becomes, in 
Weber's perspective, merely one special case 
of a universal trend. The modern soldier is 
equally ‘separated’ from the means of vio- 
lence, the scientist from the means of en- 
quiry, and the civil servant from the means of 
administration” (p. 50). Weber’s exposure to 
the American way of life (political democracy 
and economic capitalism) and his study of 
the Protestant religion convinced him that the 
spirit of the Protestant work ethic is the key 
to the realization of man’s potentialities to the 
fullest extent. As Gerth and Mills wrote, 
Weber was impressed by the “grandiose ef- 
ficiency of a type of man, bred by free as- 
sociations in which the individual had to prove 
himself before his equals, where no authorita- 
tive commands, but autonomous decisions, 
good sense, and responsible conduct train for 
citizenship” (p. 18). Such is the image Weber 
had of an involved worker. Like Marx, Weber 
also placed emphasis on the freedom to make 
one’s own decisions, on assuming personal re- 
sponsibility, and on proving one’s worth 
through achievement at work. Translated into 
motivational terms, this implies that if the 
work setup cannot provide an environment 
that satisfies the needs for individual auton- 
omy, responsibility, and achievement, it will 
create a state of alienation in the worker. 

Unlike Marx and Weber, who viewed aliena- 
tion as resulting primarily from perceived lack 
of freedom and control at work, Emile Durk- 
heim, the French sociologist, saw it as a con- 
sequence of a condition of anomie, or the 
perceived lack of socially approved means 
and norms to guide one’s behavior for the 
purpose of achieving culturally prescribed 
goals (Blauner, 1964; Durkheim, 1893;
Shepard, 1971). The condition of anomie is often
considered a postindustrial phenomenon. As
Blauner (1964) observed, industrialization 
and urbanization of modern society have “de- 
stroyed the normative structure of a more tra- 
ditional society and up-rooted people from the 
local groups and institutions which had pro- 
vided stability and security" (p. 24). No 
longer being able to feel a sense of security 
and belonging, modern men and women find 
themselves isolated from others. This form of 
social alienation often results in normlessness 
and in its collective form manifests itself in 
various forms of urban unrest. In social psy- 
chological terms, this variant of alienation 
seems to stem from the frustration of social 
and security needs, the need to belong to 
groups for social approval and social compari- 
son (Festinger, 1954; Maslow, 1954). The so- 
cial psychological processes that explain how 
this form of alienation comes about are dis- 
cussed later in the article. 

The strong impact of Marx, Weber, and 
Durkheim is quite evident in contemporary 
sociological writings on the subject of aliena- 
tion and involvement. For instance, Dubin 
(1956) defined involvement as central life
interest. According to him, a job-involved
person is one who considers work to be the most
important part of his or her life and engages 
in it as an end in itself. A job-alienated
person, on the other hand, engages in work in a
purely instrumental fashion and perceives 
work as providing financial resources for more 
important off-the-job activities. Faunce (Note 
1) also considered job involvement as a com- 
mitment to a job in which successful perform- 
ance is regarded as an end in itself rather than 
as a means to some other end. For both Dubin 
and Faunce, the concepts of involvement and 
alienation are intimately related to the Prot- 
estant work ethic, the moral value of work, 
and personal responsibility as conceived by 
Weber. 

In an attempt to clarify the concept of 
alienation, Seeman (1959, 1971) has proposed 
five different variants of the concept:
powerlessness, meaninglessness, normlessness,
isolation, and self-estrangement. According to
Seeman, each variant refers to a different
subjectively felt psychological state of the
individual, caused by different environmental
conditions. Several other researchers, particularly
Blauner (1964) and Shepard (1971), have
used Seeman's classification and have tried to
provide operational measures of the different 
categories of alienation at work. They also 
have suggested the antecedent physical and 
social conditions that produce each state of 
alienation. 

Alienation in the form of powerlessness in 
the most general sense refers to a perceived 
lack of control over important events that
affect one's life. Seeman (1959) used this
variant of alienation to explain and describe men
and women's alienation from the larger social 
order. An individual's inability to control and 
influence political systems, industrial econ- 
omies, or international affairs may create a 
sense of powerlessness in him or her. Aliena- 
tion in the sense of powerlessness also has 
been observed in job situations. For instance, 
Shepard (1971) described powerlessness at 
work as "the perceived lack of freedom and 
control on the job" (pp. 13-14). Blauner
(1964) expressed similar views when he 
stated that “the non-alienated pole of the 
powerlessness dimension is freedom and con- 
trol" (p. 16). According to Blauner, the 
powerlessness variant of alienation at work 
results from the mechanization process that 
controls the pace of work and thus limits 
workers’ free movements. If one analyzes the 
sociological concept of powerlessness in mo- 
tivational terms, it becomes obvious that if a 
situation constantly frustrates an individual’s 
needs for autonomy and control, it will create 
in him or her a state of alienation of this type. 

The second type of alienation is identified 
as a cognitive state of meaninglessness in the 
individual. In such a state, the individual is 
unable to predict social situations and the out- 
comes of his or her own and others’ behavior. 
In the work setup such a state results from 
increasing specialization and division of labor. 
When the work process is broken down into 
simple minuscule tasks, and when such simple 
tasks involve no real responsibility and deci- 
sion making, the work situation robs the 
worker of any sense of purpose. The job be- 
comes meaningless for the worker. Meaning- 
lessness of work may also result when the 
worker is not able to see the relation of his 
or her work to the total system of goals of the
organization (Blauner, 1964; Shepard, 1971). 
Translated into motivational terms, this im- 
plies that continued frustration of an indi- 
vidual’s needs for assuming personal respon- 
sibility and for gaining greater competence on 
the job (by being more knowledgeable about 
the environment for the sake of influencing 
it) causes this type of alienation. It may be 
noted that both the powerlessness and the 
meaninglessness interpretations of work alien- 
ation bear the mark of the Marxian belief 
that lack of control and freedom over the 
work process is the main cause of alienation. 

The two other forms of alienation sug- 
gested by Seeman (1959) have their roots in 
Durkheim’s (1893) description of anomie. 
Anomie refers to the perceived conditions of 
one’s social environment, such as the percep- 
tion of a breakdown of social norms regulat- 
ing individual conduct in modern societies. 
The two forms of alienation that result from 
such perceived conditions of one’s social en- 
vironment are normlessness and isolation. An 
individual may develop a sense of normless- 
ness when he or she finds that previously 
approved social norms are no longer effective 
in guiding behavior for the attainment of per- 
sonal goals. In other words, the individual 
finds that to achieve given goals it is neces- 
sary to use socially unapproved behavior. 
Finding that he or she can no longer share 
the normative system because of its ineffec- 
tiveness, the person may develop norms of his 
or her own to guide behavior. Because his or 
her norms are different from those of others, 
the individual may eventually perceive him- 
self or herself as being separate from society 
and its normative system. The dissociation 
of oneself from others results in the
perception of social isolation. The dissociation of
oneself from social norms results in norm- 
lessness or cultural estrangement. Alienation 
in the sense of social isolation and cultural 
estrangement refers to the perceived states 
of loneliness and rootlessness respectively 
(Seeman, 1971). It may be noticed that these 
two variants of alienation are related because 
they stem from the same basic condition of 
anomie. 

States of loneliness and rootlessness have 
also been identified in work environments. 
Blauner (1964), for instance, suggested that
these forms of social alienation may be 
manifested on the job due to the lack of so- 
cial integration of the worker. When an or- 
ganization does not provide the worker any 
opportunity for developing a sense of mem- 
bership or belonging in the social system, the 
worker is bound to show a sense of isolation 
from the system and its goals. From a moti- 
vational point of view, the two variants of 
social alienation, isolation and normlessness, 
seem to be based on two different social needs 
of the individual. Continuous frustration of 
the membership or belonging need of the in- 
dividual may be the crucial determinant of 
the isolation form of alienation. The norm- 
lessness form of alienation, however, is deter- 
mined by continuous frustration of another 
social need, the need to evaluate oneself 
through social comparison (Festinger, 1954). 
In the context of social influence theories, 
social psychologists (Jones & Gerard, 1967) 
have postulated two major kinds of influences 
that groups exert on the individual. They are 
referred to as the normative and the informa- 
tional social influences. By being a member 
of the group and by adhering to the group 
norms, the individual fulfills his or her need 
to belong, to love, and to be loved by others. 
When, however, the group norms are perceived 
to be too restrictive and in conflict with the 
individual’s personal goals, they cease to 
influence the individual. The group loses its 
normative influence on the individual. The 
person becomes an isolate in relation to the 
group. He or she perceives himself or herself 
as one who no longer belongs to the group 
and no longer is loved by others in the group. 
Such a psychological state can be identified 
as the isolation form of alienation. The indi- 
vidual also depends on the group norms for 
self-evaluation and for evaluating his or her 
abilities and opinions (Festinger, 1954). 
Group norms generally provide the person 
with information on how to behave, on what 
is right and what is wrong. When the indi- 
vidual finds that group norms do not provide 
useful information for self-evaluation, he or 
she may separate himself or herself from 
these norms and experience a state of norm- 
lessness. Thus, in terms of social influence 
theory, the two variants of social alienation 
result from the failure of the groups to
exercise the two forms of social influence,
normative and informational.

The final variant of alienation proposed by 
sociologists is self-estrangement. In many 
ways the characterization of this category of 
alienation has posed problems for sociological 
thinkers. Seeman (1971) admits that it is an 
“elusive idea” (p. 136), but then goes on to 
operationalize it. According to Seeman, a 
person is self-estranged when he or she is 
engaged in activity that is not rewarding in 
itself, but is instrumental (a means to an 
end) in satisfying extrinsic needs such as the 
needs for money, security, and so on. Fol- 
lowing Seeman (1959), Shepard (1971) con- 
siders instrumental work orientation, or the 
degree to which one works for extrinsic-need 
satisfaction, to be an index of the self-es- 
trangement kind of alienation in the work 
setup. Blauner (1964) suggests that a job
encourages self-estrangement if it does not 
provide opportunity for expressing “unique 
abilities, potentialities, or personality of the 
worker” (p. 26). In motivational terms, 
Blauner’s observation means that whenever 
the individual finds his or her environment 
(job or otherwise) lacking in opportunities 
for the satisfaction of self-actualization needs 
(Maslow, 1954) through expression of his or 
her potentialities, he or she experiences a
state of self-estrangement. Following Marx, 
many contemporary sociologists believe that 
self-estrangement is the heart of the
alienation concept, as if all other forms of
alienation eventually result in self-estrangement.
Blauner (1964) attests to this belief in the
following remark: “When work activity does
not permit control [powerlessness], evoke a
sense of purpose [meaninglessness], or
encourage larger identification [isolation],
employment becomes simply a means to the
end of making a living” (p. 3).


Characteristics of the Sociological Approach 


At this point it may be helpful to identify 
some dominant considerations that have 
guided most sociological treatments of the 
concept of alienation. 


First, one notices a stronger emphasis in
sociological writings on the analysis and
measurement of the state of alienation than
on the analysis and measurement of the state
of involvement. In a sense, sociologists have 
focused their attention on the negative side 
of the issue with a clinical perspective on 
social systems. Thus, they have been more 
concerned with the diagnosis of states of ali- 
enation in social systems and consequent so- 
cial maladies than with the identification of 
conditions for social involvement and growth. 
Like Freudian psychologists who attempt to 
explain human nature through an analysis of 
pathological psychological states, sociologists, 
taking the lead from Marx, have emphasized 
the analysis of alienation and resulting 
pathological states to explain the nature of 
social systems. In the same way as the 
Freudian influence in psychology delayed the 
formulation of growth theories of personality 
and motivation (Maslow, 1954; Allport, 
1961), the Marxian influence in sociology 
may have retarded the progress of sociologi- 
cal theories in understanding better the na- 
ture of healthy and growing social systems. As 
is discussed later, unlike the sociological ap- 
proaches outlined above, the current psycho- 
logical approaches to the issue are trying to 
attack the problem from the positive side 
through the study of the concept of involve- 
ment. 

The second consideration that has domi- 
nated various sociological treatments of ali- 
enation is their emphasis on studying aliena- 
tion in groups and social systems. The level 
of analysis of the concept in most sociological 
approaches has been at the social system 
level rather than at the individual level. This 
has created measurement problems. Although 
sociologists often talk of the frequency of 
volatile activism, of suicide rates, of crime 
rates, and so on as indices of alienation in 
social systems, they find it hard to establish 
and theoretically justify the validity and the 
reliability of these measures. The records on 
such social maladies are notoriously unreli- 
able. Very often incidents of activism, crime, 
and suicide go unreported. Even if the inci- 
dents are recorded accurately, it is often 
difficult to infer from these data states of 
alienation in individual persons. For instance, 
an activist in his or her desire to bring about
changes in the social system may be showing
signs of greater involvement in the social system
than would an apathetic conformist.

Third, sociological approaches generally 
describe the state of alienation not in specific 
behavioral terms, but in terms of epiphenom- 
enal categories. As Johnson (1973) pointed 
out, alienation is seen as “an epiphenomenal 
abstraction, collectively summarizing a series 
of specific behaviors and categorizing them 
as ‘loneliness,’ ‘normlessness,’ ‘isolation,’ etc.” 
(p. 40). Such epiphenomenal descriptions of 
the concept may have the flavor of intellec- 
tual romanticism, but have very little scien- 
tific value because they pose problems of 
empirical verification. The concept of aliena- 
tion as an epiphenomenal abstraction tends to 
carry excess meaning and therefore eludes 
precise measurement. Besides, such an ab- 
straction merely describes alienation; it does 
not explain it. 

Finally, most sociological approaches con- 
sider the presence of individual autonomy, 
control, and power over the environment as 
basic preconditions for removing the state of 
alienation. 


The Psychological Approach 


The psychological approach to the concept 
of alienation has been somewhat sketchy 
compared to the sociological approach de- 
scribed earlier. In psychological literature, 
the treatment of the concept does not have as 
long and as rich a tradition as in sociology. 
The interest in the concept is very recent 
among psychologists, and they have essen- 
tially taken an empirical (and exploratory) 
approach to the study of the problem. Devel- 
opment of psychological theories to explain 
the phenomenon of alienation is simply ab- 
sent from the literature. Furthermore, in con- 
trast to the sociological approach, psycholo- 
gists have attempted to analyze the nature 
of alienation only in the limited context of 
job situations. Unlike sociologists, psycholo- 
gists have studied the problem of alienation 
from the point of view of job involvement 
and have attempted to define and measure 
involvement at work rather than alienation 
at work. 

In trying to explain the nature of job 
involvement, psychologists have concentrated
on the analysis of specific motivational states
of the individual in work situations. Psycho- 
logical explanations are based on motivation 
theories and therefore tend to emphasize the 
need-satisfying qualities of the job as basic 
determinants of job involvement. For in- 
stance, Vroom (1962) proposed that a per- 
son's attempts to satisfy his or her needs for 
self-esteem through work on the job lead to 
job involvement. In his study, “The degree of 
job involvement for a particular person was 
measured by his choice of ‘ego’ rather than 
extrinsic factors in describing the sources of 
satisfaction and dissatisfaction on the job" 
(p. 161). Vroom seems to emphasize intrin- 
sic-need satisfaction as the essential condi- 
tion for higher job involvement. In his view, 
higher autonomy extended to the individual 
results in higher ego involvement, which in 
turn leads to a higher level of job performance.

Lodahl and Kejner (1965) proposed two 
definitions of job involvement. One of their 
definitions states that “job involvement is 
the degree to which a person is identified 
psychologically with his work, or the im- 
portance of work in his total self-image" (p. 
24). Such a psychological state of identifica- 
tion with work may result partly from early 
socialization training during which the individual
may internalize the value of the goodness
of work. Lodahl and Kejner (1965)
recognized this possibility. They stated that 
the concept of job involvement "operation- 
alizes the ‘protestant ethic’ and because it is 
a result of the introjection of certain values 
about work into the self, it is probably re- 
sistant to changes in the person due to the 
nature of a particular job" (p. 25). Lodahl 
and Kejner also provided another definition 
of job involvement; this definition states that 
job involvement is “the degree to which a per- 
son's work performance affects his self-es- 
teem” (p. 25). These two definitions are 
quite distinct, and Lodahl and Kejner made 
no attempt in their study to show how the 
two are related. In fact, the questionnaire 
measure of job involvement they developed 
includes items reflecting both definitions. Use 
of their questionnaire measure in job involve- 
ment research therefore provides data that 
are hard to interpret.

Recently Rabinowitz and Hall (1977)
critically reviewed the work of several researchers
who have made use of the definitions
of job involvement mentioned above.
Their review clearly suggests that there is a 
great deal of confusion and ambiguity in 
theories about job involvement. Furthermore, 
as these authors pointed out, “The confusion 
does not stop at the theoretical level, but 
rather continues in the empirical studies of 
involvement" (p. 267). 

Weissenberg and Gruenfeld (1968) investi- 
gated the relationship between satisfaction 
with various job factors and job involvement.
They concluded that increased job involve- 
ment is positively related to satisfaction with 
motivators or job-content factors (Herzberg, 
1966) such as achievement, responsibility, 
independence, and so forth. These motivators 
tend to satisfy the intrinsic needs of the indi- 
vidual. The extrinsic needs, however, are 
satisfied through job-context factors such as 
company policies, nature of supervision, sal- 
ary, benefits, and working conditions. Ac- 
cording to these researchers, satisfaction with 
the job-context factors is unrelated to job 
involvement, but the latter can be predicted
from the satisfaction with the motivators in
the job.

Lawler and Hall (1970) for the first time
distinguished the psychological state of job
involvement from two other psychological
states of the worker. According to Lawler and
Hall, job involvement is different from both 
intrinsic motivation on the job and job satisfaction.
Intrinsic motivation refers to a state
of the individual in which satisfaction of the
intrinsic needs is contingent upon appropriate
job behavior, whereas job satisfaction
results from satisfaction of the needs of the
individual through the attainment of job outcomes
without any regard to the contingencies
of the outcomes. Lawler and Hall argue
in favor of the definition of job involvement 
suggested by Lodahl and Kejner (1965): 
Job involvement is seen in terms of psycho- 
logical identification with work or the im- 
portance of work to one's total self-image. In 
general, Lawler and Hall suggest that job involvement
refers to the “degree to which a
person’s total work situation is an important
part of his life. The job-involved person is
one who is affected very much personally by
his whole job situation, presumably because
he perceives his job as an important part of 
his self-concept and perhaps as a place to 
satisfy his important needs (e.g., his need 
for self-esteem)" (pp. 310-311). It appears 
that in defining the concept of involvement, 
Lawler and Hall assumed that intrinsic or 
growth needs (Alderfer, 1972) are central to 
the self-concept of the individual. To empha- 
size the centrality of intrinsic needs, they 
pointed out that “the more the job is seen to
allow the holder to influence what goes on, 
to be creative, and to use his skills and abili- 
ties, the more involved he will be in the job" 
(p. 310). In the same article, Lawler and 
Hall (1970) reiterated their position with 
the following remark: “Other things being 
equal, more people will become involved in a 
job that allows them control and a chance 
to use their abilities than will become in- 
volved in jobs that are lacking these char- 
acteristics" (p. 311). 

Patchen (1970) identified three general 
conditions for job involvement. According to 
him, “Where people are highly motivated, 
where they feel a sense of solidarity with the 
enterprise, and where they get a sense of pride 
for their work, we may speak of them as 
highly ‘involved’ in their job" (p. 7). When 
Patchen talks of workers being highly moti- 
vated, he refers to their high levels of achieve- 
ment need or to their wish to accomplish 
worthwhile things on the job. When he talks 
of workers’ solidarity with the enterprise, he 
refers to their need for belonging to the 
organization. Finally, when he talks of workers’
sense of pride, he refers to workers’
feeling of high self-esteem. Thus, in Patchen's 
view, when a job provides opportunities for 
the satisfaction of one's achievement needs, 
belonging needs, and self-esteem needs, one 
experiences a greater degree of job involvement.

In a recent review of the psychological 
literature on job involvement, Rabinowitz 
and Hall (1977, p. 284) stressed that among 
other things, a job-involved person believes 
strongly in the Protestant ethic, has strong 
growth needs, and has a stimulating job that 
gives him or her a high degree of autonomy
and an opportunity for participation.

In another review of the psychological
literature on job involvement, Saleh and 
Hosek (1976) identified four different inter- 
pretations of the concept of involvement: 
“A person is involved (1) when work to him 
is a central life interest; (2) when he actively 
participates in his job; (3) when he perceives 
performance as central to his self-esteem; (4) 
when he perceives performance as consistent 
with his self-concept” (p. 215). The first in- 
terpretation of the concept of involvement in 
terms of central life interest (Dubin, 1956) 
is very similar to the interpretation offered 
by Lawler and Hall (1970). The main idea 
underlying this interpretation is that the 
psychological state of involvement with re- 
spect to an environmental entity (such as 
job, family, etc.) is a cognitive or perceived 
state of identification with that entity. The 
second interpretation of involvement in terms 
of participation suggests that the psycho- 
logical state of involvement be viewed as 
behavioral acts of the individual directed 
toward the satisfaction of his or her needs 
for autonomy and control. Bass (1965), for 
instance, considered participative job behav- 
iors such as making important job decisions, 
setting one’s own work pace, and so on to be 
important indices of greater work involvement.
The remaining two interpretations of
involvement, namely, providing a sense of 
personal worth (Siegel, 1969) and reinforc- 
ing one’s self-concept (Vroom, 1964), sug- 
gest that involvement may be viewed as the 
experience of satisfaction resulting from the 
fulfillment of the individual’s self-esteem and 
self-actualization needs. From the results of 
their own factor analytic work, Saleh and 
Hosek (1976) concluded that job involve- 
ment is “the degree to which the person iden- 
tifies with the job, actively participates in it, 
and considers his performance important to 
his self-worth. It is, therefore, a complex 
concept based on cognition, action, and feel- 
ing” (p. 223). It is interesting to note that 
to achieve conceptual clarity Lawler and Hall 
tried to differentiate the state of involvement 
from intrinsic motivation and job satisfaction, 
whereas Saleh and Hosek brought them all to- 
gether again. 

At this point it must be noted that there is 
one common thread that runs through all the
psychological formulations outlined above.
All of them seem to emphasize that situations 
lacking in opportunity for the satisfaction of 
intrinsic needs of the individual such as self-esteem,
achievement, autonomy, control, self-expression,
and self-actualization will decrease
the individual's involvement in them.
Even the recent studies on central life in- 
terest in work settings, on organizational 
identification, and on organizational commit- 
ment (Dubin, Champoux, & Porter, 1975; 
Hall & Schneider, 1972; Hall, Schneider, & 
Nygren, 1970) reflect this bias. It seems as if 
the lack of intrinsic-need satisfaction is the 
basic condition for increasing work aliena- 
tion. In this regard, psychologists seem to 
have followed the sociological tradition of 
considering the lack of individual freedom, 
power, and control as necessary preconditions 
of the psychological state of alienation. 


Sources of Confusion Surrounding 
the Concepts 


Five major sources of confusion can be 
identified in the literature on alienation and 
involvement. Not only in the theoretical 
treatment of the concepts but also in the op- 
erationalization of the concepts in empirical 
studies, one notices the presence of these 
sources of confusion. They have contributed 
to the exasperating conceptual ambiguity pre- 
vailing in the area, and it goes without say- 
ing that any meaningful scientific treatment 
of the concepts should guard against them. 

The most common source of confusion is 
the application of the concepts sometimes to 
specific individuals and sometimes to groups 
of individuals. Particularly in sociological 
writings one finds the use of the concept of 
alienation sometimes to describe the psycho- 
logical state of the individual and at other 
times to describe pathological states of large 
collectivities such as groups, organizations, 
and other socio-political systems. As Johnson 
(1973) correctly pointed out, “There is a 
difference in meaning between these two ap- 
plications that is not merely the difference 
between singular and plural categories. The 
phenomenology and the meaning connected 
with individual states of alienation are different
both in quality and significance from those
connected with the social, interactional, and
collective applications of the term” (p. 35).
For example, to say that a worker is
alienated can mean two things. It can sug- 
gest an instance of collective experience of 
worker alienation as reflected in absenteeism, 
tardiness, goldbricking, sabotage, and so on that
results from the prevailing social and physi- 
cal conditions (mechanization, impersonal 
control through rules and regulations, etc.) 
within the organization, or it can suggest an 
individual worker's personal view of his or her 
work that does not meet his or her salient 
needs (unique to the individual) regardless 
of how other workers view the situation.
From a methodological standpoint, it is ad- 
visable to approach the study of alienation at 
the personal rather than at the collective 
level of experience. Measurement and inter- 
pretation of the collective experience of 
alienation are often difficult and confusing. 

A second source of confusion stems from 
the fact that the concept of alienation has 
been described and measured in two different
ways. Sometimes the term alienation
is used to imply objective social conditions
directly observed by others and later attributed
to individuals and groups. Blauner
(1964), for instance, considered mechaniza- 
tion and division of labor to be the alienating 
conditions, and people working under these 
conditions were assumed to be experiencing 
alienation. At other times, alienation has
been interpreted as a subjective psychological 
state of the individual not detectable to 
outsiders but felt by the individual. Such a
difference in the usage of the term has ob- 
vious implications for the operationalization 
of the concept. States of alienation measured 
through identification of objective conditions 
may not parallel the subjective measures of 
the concept. Mechanization and division of 
labor in an organization may be viewed by 
external observers as necessarily contributing 
to a state of alienation of the worker (pow- 
erlessness), but the worker may not perceive 
the situation in the same way. In fact, it is 
quite conceivable that for some workers 
(mentally and physically handicapped, un- 
skilled, uneducated, and many belonging to 
developing countries) mechanization and division
of labor may increase job involvement.

A third source of confusion (related to the
preceding one) results from a failure to main- 
tain the conceptual distinction between the 
antecedent conditions of alienation and the 
consequent states of alienation. Here the confusion
results from mistaking the cause for the
effect. As Josephson and Josephson (1973)
remarked, “Durkheim's notion of anomie or
normlessness can be regarded as an important 
cause of alienation but should not be con- 
fused with alienation as a state of mind... . 
By the same token, alienation should not be 
confused with ‘social disorganization,’ since 
estrangement may be found in highly orga- 
nized bureaucracies" (p. 166). In spite of such 
warnings, both the sociological and the psy- 
chological formulations neglect to maintain 
the distinction between alienating conditions 
and alienating states. In fact, most empirical
researchers have attempted to measure
the state of alienation through indices of alien- 
ating conditions instead of directly measur- 
ing it (as if the two were equivalent). For 
instance, Seeman (1959) considered norm- 
lessness to be the perception of a social situa- 
tion in which rules and norms regulating be- 
havior have broken down. Such perceptions 
may be the antecedent conditions of the 
alienated state, but they cannot be identified 
with the alienated state itself. Likewise, iso- 
lation, meaninglessness, and powerlessness 
may describe different conditions or causes of 
alienation, but should not be equated with 
it. Even when self-estrangement was mea- 
sured by Blauner (1964), he used several 
indices of alienating conditions on the job, 
such as whether the job met the worker's 
achievement needs. Shepard (1971) also measured
the different forms of alienation suggested
by Seeman (1959) by measuring
various job conditions such as whether the
job provided opportunity for participation 
and control (powerlessness), how the job fit 
into the total operation of the organization 
(meaninglessness), and the like. Clearly these 
kinds of questions probe into the assumed 
conditions or causes of alienation rather than 
into the state of alienation itself. In the 
psychological literature similar confusion may 
be noted. For instance, Saleh and Hosek 
(1976) have proposed a measure of job involvement
that contains three distinct categories
of items. The first category measures
directly the state of alienation (e.g., with the 
item “The most important things I do are 
involved with my job"). The second category 
seems to index the antecedent conditions or 
presumed causes of alienation (e.g., with the 
item “How much chance do you get to do 
things your own way?"). Finally, a third 
category measures workers’ behaviors and
experiences that often (but not necessarily) 
result from the alienated state (e.g., the item 
“I avoid taking on extra duties and responsi- 
bilities in my work"). Thus Saleh and Hosek 
combine indices of causal conditions and 
effects of alienating states into one single in- 
strument. Such an instrument cannot provide 
meaningful data on the state of alienation of 
the worker. Needless to say, for both con- 
ceptual clarity and effective methodology in 
empirical studies, the state of alienation 
needs to be identified and measured sepa- 
rately from its causes as well as its effects. 
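
The separation argued for above can be illustrated with a minimal scoring sketch. It assumes, hypothetically, that each questionnaire item has been tagged in advance as indexing the state itself, an antecedent condition, or a consequence. The item wordings are the examples quoted above; the tags, the 1-to-5 response scale, and the names `ITEMS` and `score_by_category` are illustrative assumptions and do not come from any published instrument.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical item bank: item text -> category tag (assumed, for illustration).
ITEMS = {
    "The most important things I do are involved with my job": "state",
    "How much chance do you get to do things your own way?": "antecedent",
    "I avoid taking on extra duties and responsibilities in my work": "consequence",
}


def score_by_category(responses):
    """Return the mean rating per category instead of one pooled total."""
    grouped = defaultdict(list)
    for item, rating in responses.items():
        grouped[ITEMS[item]].append(rating)
    return {category: mean(ratings) for category, ratings in grouped.items()}


# One respondent's ratings on an assumed 1-to-5 scale.
example = {
    "The most important things I do are involved with my job": 4,
    "How much chance do you get to do things your own way?": 2,
    "I avoid taking on extra duties and responsibilities in my work": 1,
}
print(score_by_category(example))
```

Reporting three separate scores keeps the state measure from being contaminated by its presumed causes and effects, which a single summed score would not allow.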

A fourth source of confusion results from 
the description of the state of alienation as 
being both a cognitive as well as an affective 
state of the individual. Most researchers have 
found it difficult to strip the concept of alienation
of its negative affect. Traditionally,
alienation has been associated with negative 
emotional states such as anger, dissatisfac- 
tion, and unpleasantness, and involvement 
has been associated with positive emotional 
states such as satisfaction and pleasantness. 
Many measures of alienation or involvement 
therefore contain items reflecting levels of 
satisfaction or dissatisfaction (e.g., the item 
“The major satisfaction in my life comes 
from my job” in Lodahl & Kejner, 1965). 
Recent empirical studies (Lawler & Hall, 
1970; Seeman, 1971) clearly suggest that 
work involvement and job satisfaction are 
not the same thing, although they may be 
related to one another. It may be more use- 
ful to conceptualize the states of involve- 
ment or alienation as cognitive or belief 
states of identity or dissociation (separate- 
ness) than as psychological states necessarily 
associated with feelings of satisfaction or dis- 
satisfaction. A cognitive state of dissociation 
may or may not accompany positive or nega- 
tive affect under certain conditions. A highly 
involved worker under some conditions may
feel a high level of satisfaction with his or
her work and under other conditions may 
experience deep dissatisfaction. In the future, 
empirical work needs to be done to identify 
conditions under which involvement and alien- 
ation are related to positive, negative, and 
neutral affective states. 

Finally, some ambiguity regarding the concept
of alienation has resulted from the confusion
of two kinds of causation, contemporaneous
and historical. Sometimes the state
of alienation in an individual has been viewed 
as the result of the past history of the indi- 
vidual. For instance, Lodahl and Kejner 
(1965) suggested that work involvement of 
an individual is determined by the early 
socialization process during which the indi- 
vidual internalizes the values of the goodness 
of work or the Protestant ethic. In this 
sense, alienation from or involvement with 
work becomes a more stable characteristic of 
the individual, which he or she carries with 
him or her from one situation to another. 
Sociologists have viewed the historical causa- 
tion of alienation in a slightly different way. 
Following Marx, many sociologists have con- 
sidered job experience to be central to an 
individual's life. According to them, the long- 
standing social arrangements of technology, 
division of labor, and capitalist property in- 
stitutions have created the state of alienation 
from work (Blauner, 1964). Since work is 
central to one's life, alienation from work 
necessarily leads to alienation from all other 
aspects of life. As Seeman (1971) put it,
“Perhaps the most important thesis concerns 
the centrality of work experience, the impu- 
tation being that alienation from work ‘is the 
core of all alienation’ and that the conse- 
quences of alienated labor color the life 
space of the individual in a profound and 
disturbing way” (p. 135). The state of alien- 
ation has also been conceived as being 
caused by contemporaneous events. For instance,
Lawler and Hall (1970) consider the
job-involved person to be one who is “affected
very much by his whole job situation, pre- 
sumably because he perceives his job as an 
important part of his self-concept and perhaps 
as a place to satisfy his important needs 
(e.g., his need for self-esteem)” (pp. 310- 
311). These authors therefore consider the
worker’s present perceptions of the need-satisfying
potentialities of the job to be a
major determinant of the state of involvement.
From the above discussion, it is apparent
that a state of alienation or involvement
with regard to the specific aspects of
one's environment (such as work, family, re- 
ligion, etc.) may be jointly caused by two 
sets of events—one historical and the other 
contemporaneous. Through the socialization 
process (cumulative learning and experience 
of the past) the individual may develop a 
set of relatively stable beliefs and values 
regarding work, family, and so on, and the 
present experiences with them may either 
reinforce the beliefs and values or modify 
them. 


A Motivational Framework for the Study of 
Alienation and Involvement 


The following discussion is a description of 
a conceptual approach that can be used to 
study the phenomena of alienation and in- 
volvement in any specific aspect of one's life. 
To provide an example, however, I have 
chosen to discuss work involvement or the 
work setup as the specific environmental entity
toward which a state of alienation or
involvement may develop in an individual. 
The approach is characterized as a motiva- 
tional one. It uses the existing motivational 
language to explain work alienation and involvement
for two basic reasons. First, theories
of human motivation at work (Maslow,
1954; Lawler, 1973) are generally advanced
to explain all work behavior, and
alienation and involvement at work should 
not be considered exceptions. Second, the 
fact that the existing motivational constructs 
can adequately and parsimoniously explain 
work alienation phenomena lies hidden in 
many of the sociological and psychological 
formulations discussed earlier. Thus, a
clearer motivational formulation of the phe- 
nomena is needed to bring this fact to the 
surface. In addition to the use of motiva- 
tional language, the present approach is 
characterized by an emphasis on the following 
considerations. 

1. In defining the state of work alienation, 
the approach limits itself to the analysis of
the behavioral phenomenon at the individual
level. It identifies the state of alienation with 
a cognitive belief state of the individual. As 
a cognitive state, work alienation becomes 
conceptually distinct from many associated 
covert feelings (affective states of the indi- 
vidual expressed in terms of satisfaction or 
dissatisfaction experienced on the job) and 
overt behavior (job participation, assuming 
responsibility, etc.). 

2. The approach emphasizes that the state
of work alienation must be clearly distin- 
guished from its causes (antecedent condi- 
tions) and its effects (consequent conditions). 
It considers the phenomenon to be caused by 
two sets of events, historical and contempo- 
raneous. The approach also stresses that the 
cognitive state of work alienation has signifi- 
cant effects on subsequent job behavior and 
job attitudes. 

3. The present approach can integrate and 
adequately explain the different types of alien- 
ation at work suggested by sociologists 
within its own framework. 

It may be argued that the concepts of alienation
and involvement should not be reduced
to a single dimension, since they represent
two distinct types of behavioral phenomena.
The phenomenon of alienation has
been described by sociologists at the collec- 
tive level (alienation of labor, alienated so- 
ciety, etc.), whereas the phenomenon of in- 
volvement has been identified by psycholo- 
gists at the individual level (involved 
worker). A closer scrutiny of the issue, how- 
ever, reveals that even when sociologists 
describe the concept of alienation at the col- 
lective level, they try to explain the phe- 
nomenon in terms of psychological states of 
the individual. A number of empirical socio- 
logical studies on alienation (Blauner, 1964; 
Seeman, 1971; Shepard, 1971) attest to this 
fact. If one considers the phenomena of both 
alienation and involvement to be states of
the individual, it would be more parsimo- 
nious to consider both concepts as represent- 
ing a single dimension than to consider them 
as independent dimensions. 

In the following description of the present 
approach, the above-mentioned character- 
istics are highlighted. 


Definitions of the Concepts 


In the present approach, work involve- 
ment is viewed as a generalized cognitive (or 
belief) state of psychological identification 
with work insofar as work is perceived to 
have the potentiality to satisfy one's salient 
needs and expectations. Likewise, work alien- 
ation can be viewed as a generalized cog- 
nitive (or belief) state of psychological sepa- 
ration from work insofar as work is per- 
ceived to lack the potentiality for satisfying 
one's salient needs and expectations. Thus, 
the degree of involvement at work should be 
directly measured in terms of the individual's
cognition about his or her identification with 
work. The individual's identification with this 
work, however, depends on two things: the 
saliency of his or her needs (both extrinsic 
and intrinsic) and the perceptions he or she 
has about the need-satisfying potentialities 
of work. 

Defining the concepts in this way has some 
implications for their measurement. If job 
involvement and alienation are viewed as cog- 
nitive states of an individual, they cannot be 
measured with the existing instruments 
(Blauner, 1964; Lodahl & Kejner, 1965; 
Saleh & Hosek, 1976; Shepard, 1971). Most 
of these instruments combine some measures 
of the cognitive state of alienation with some 
measures of its presumed causes and effects. 
For example, the most widely used instru- 
ment, developed by Lodahl and Kejner 
(1965), contains not only items that reflect 
the cognitive state of involvement (“I live, 
eat, and breathe my job”) but also items 
that reflect both antecedent and consequent 
feeling states and behavioral tendencies (“I 
feel depressed when I fail at something con- 
nected with my job” or “I will stay over- 
time to finish a job, even if I am not paid for 
it”). Because of such built-in ambiguities in 
the existing instruments, the data these in- 
struments yield are often hard to interpret. 
Future research efforts should attempt to 
develop more unambiguous measures of job 
involvement that reflect only the nature of 
the cognitive state of psychological identifica- 
tion with work. For instance, items such as 
“I live, eat, and breathe my job,” “I am very 
much involved in my job,” “The most im- 
portant thing that happened to me involved 
my work," and so on tend to reflect the indi- 
vidual’s awareness of work identification 
without measuring his or her need states 
(antecedent conditions) or overt behavioral 
tendencies (consequent conditions). These 
kinds of items have construct validity and 
therefore are more desirable measures of the 
cognitive state of job involvement. One can 
also use graphic techniques or the semantic 
differential format (Osgood, Suci, & Tannen- 
baum, 1957) to measure job involvement on 
dimensions such as involved-noninvolved, 
important-unimportant, identified-separated, 
central-peripheral, and so forth. Besides be- 
ing less confusing with regard to assessing 
the cognitive states of involvement and alien- 
ation, such measures of job involvement 
that have construct validity seem to be bet- 
ter suited for cross-cultural and comparative 
research than are the existing measures, be- 
cause the latter tend to mainly include and 
heavily emphasize items on intrinsic-need 
satisfaction. For groups of people who do not 
consider intrinsic needs (autonomy, control, 
etc.) to be the guiding forces in their lives, 
the existing measures with an emphasis on 
intrinsic needs cannot truly reflect their job 
involvement. 


Conditions of Job Involvement 


A schematic representation of the present 
motivational approach to job involvement, its 
causes, and its effects is presented in Figure 1. 
As can be seen in Figure 1, an individual’s 
behavior and attitudes exhibited both on and 
off the job are a function of the saliency of 
need states within him or her. At any given 
moment, the need saliency within the indi- 
vidual depends on the prior socialization pro- 
cess (historical causation) and on the per- 
ceived potential of the environment (job, 
family, etc.) to satisfy the needs (contempo- 
rary causation). The cognitive state of in- 
volvement as a by-product of need saliency 
also depends on the nature of need saliency as 
historically determined through the socializa- 
tion process and on the perceived potential of 
the environment to satisfy the needs. In the 
context of job involvement, an individual’s 
belief that he or she is work involved or job 
alienated depends on whether the work is 
perceived to have the potential for satisfying 
his or her salient needs. The saliency or the 
importance of different needs for the indi- 
vidual is determined by the individual’s past 
experiences with groups of which he or she 
was a member (socialization process) and 
with jobs that he or she has held. Different 
groups of people are influenced by different 
cultural, group, and organizational norms, 
and thus they tend to develop different need 
structures or to set different goals and ob- 
jectives for their lives. For example, the work- 
motivation literature suggests that the sources 
of work involvement for managers within any 
organization may be very different from 
those for the unskilled laborers because of 
differences in the need saliencies of the two 
groups. Managers may value more autonomy 
and control in their jobs, whereas the un- 
skilled laborers may attach greater impor- 
tance to security and to a sense of belonging in 
their jobs. Such value differences stem essen- 
tially from past socialization processes and 
from the influence of the norms of the groups 
to which the workers belong. 

In a recent study, Kanungo, Gorn, and 
Dauderis (1976) demonstrated that because 
of differences in the socialization process, 
francophone and anglophone managers exhibit 
different patterns of need saliency at work. 
For instance, security and affiliation needs 
seem to have greater saliency for franco- 
phone as compared to anglophone managers, 
whereas autonomy and achievement needs 
tend to have greater saliency for anglophone 
as compared to francophone managers. The 
salient needs tend to determine the central 
life interests of the individuals. On the job, 
the saliency of a need in an individual may 
be reinforced when the person finds that 
through job behavior he or she is capable of 
meeting the needs. His or her perception that 
the job is capable of satisfying his or her im- 
portant needs will make the individual devote 
most of his or her available energy to the job. 
The worker will immerse himself or herself in 
the job, and the feedback from his or her job 
behavior will lead the worker to believe that 
the job is an essential part of himself or her- 
self or that he or she is job involved. If, how- 
ever, the job is perceived by the individual 
as lacking in opportunities for the satisfaction 
of his or her salient needs, he or she will de- 
velop a tendency to withdraw effort from the 
job and thus become alienated from it. For 
the satisfaction of his or her salient needs, the 
person will redirect his or her energy else- 
where by engaging in various off-the-job ac- 
tivities or by engaging in various undesirable 
on-the-job activities. 


Figure 1. Schematic representation of the motivational approach to involvement and alienation. 
[Diagram elements: Socialization Process (Cultural, Organizational, and Group Norms); Need Saliency; 
Instrumental Behavior and Attitude; Off the Job Activities (Family activities, Community activities, etc.); 
On the Job Activities; Satisfaction of Needs; Perceived potential or lack of potential of Job, Family, 
or Community to satisfy needs; Job, Family, or Community involvement or alienation; Job related, 
Family related, or Community related Behavior and Attitude.] 

A recent comparative study (Basu, 1976) 
of job involvement among francophone and 
anglophone workers has provided some indi- 
rect evidence in support of this motivational 
approach to work involvement. On the basis 
that anglophone workers are a product of the 
Protestant ethic socialization process and 
that they value job autonomy and achieve- 
ment to a greater extent than francophone 
workers, they are expected to show greater 
psychological identification with their jobs 
than are francophone workers. Such a predic- 
tion is based on the previous approaches to 
alienation that emphasize the importance of 
autonomy and control in the worker's self- 
concept. This prediction, however, was not 
confirmed by Basu. If anything, the results of 
his study revealed stronger psychological job 
identification among francophone workers 
than among anglophone workers. The reason 
for greater work involvement among the 
francophone workers may lie in the fact that 
they perceive their salient needs, such as se- 
curity and affiliation tendencies, to be met to 
a greater extent on the job than do the anglo- 
phone workers. Further empirical research is 
necessary, however, to directly test the im- 
plication of the present formulation in work 
situations. 

The notion that job involvement has its 
roots both in the past socialization process and 
in the need-satisfying potential of the job en- 
vironment seems to be supported by the work 
of several researchers (Rabinowitz & Hall, 
1977). For instance, those researchers (Blood 
& Hulin, 1967; Hulin & Blood, 1968; Lo- 
dahl, 1964; Siegel, 1969) who have studied 
job involvement as an individual-difference 
variable have proposed that job involvement 
has its roots in past socialization. On the 
other hand, those researchers (Argyris, 1964; 
Bass, 1965; McGregor, 1960) who have 
studied involvement as a function of the job 
situation have proposed that the root of in- 
volvement lies in the need-satisfying poten- 
tial of the job environment. It is important, 
however, to keep in mind that the cognitive 
state of involvement is caused by both the 
socialization process and the job environment. 
Future studies should be directed toward as- 
sessing the relative contributions of each of 
these causes of job involvement. 

Figure 1 also suggests that a cognitive state 
of job involvement will have significant in- 
fluence on job behavior and job attitudes. 
Several interesting possibilities present them- 
selves in this regard. It would be worthwhile 
to investigate the influence of the state of in- 
volvement on the quality and intensity of 
job attitudes. Job involvement does not neces- 
sarily cause positive job attitudes, but per- 
haps does affect the intensity of job atti- 
tudes. The effects of the state of involvement 
on the quality and quantity of job produc- 
tivity and on membership behavior (turn- 
over, absenteeism, tardiness, etc.) also need 
to be investigated in future research. 


Integration of the Sociological Approach 


The present approach can also be used to 
interpret the different types of work aliena- 
tion suggested by sociologists (Blauner, 1964; 
Seeman, 1959). In terms of the present for- 
mulation, the isolation variant of alienation 
will be experienced by those individuals whose 
social and belonging needs are the most 
salient and who find that their jobs do not 
have the potential to satisfy their social 
needs. Blauner seems to concur with this posi- 
tion when he states that the state of isolation 
“implies the absence of a sense of membership 
in an industrial community” (p. 24). In 
Canada, the isolation type of alienation has 
been reported more often among French Ca- 
nadian workers than among English Canadian 
workers, perhaps because in the case of the 
former group, the necessary conditions for a 
state of isolation are present to a greater ex- 
tent (salient affiliation needs of the French 
Canadian workers and their perception of 
the anglophone ownership of industry). For 
very similar reasons, female workers often 
may experience a greater degree of isolation 
at work than male workers. The normlessness 
and the meaninglessness variants of work 
alienation can be observed in persons who 
have a salient need for information with which 
to predict their physical and social work en- 
vironments. Finding that their jobs do not 
provide the necessary information, they may 
develop beliefs about the meaninglessness of 
their jobs. Educated and skilled workers may 
have a stronger need for information than 
less educated and unskilled workers. Hence 
the former group of workers may have a 
stronger tendency to develop beliefs about 
the meaninglessness of work than the latter 
group. Perhaps for similar reasons, the aliena- 
tion of intellectuals tends to be of the mean- 
inglessness variety (Seeman, 1959; Mills, 
1951). The powerlessness type of alienated 
state may be experienced by individuals who 
have salient ego needs such as the need for 
autonomy, the need for control, and the need 
for self-esteem but who find the job environ- 
ment incapable of satisfying them. Finally, 
the self-estrangement variety of alienation 
may be experienced by people who have 
highly salient self-actualization needs, such 
as the need for achievement, and find their 
jobs limiting the realization of their potential- 
ities. Thus, from a motivational standpoint, 
the different types of alienation suggested by 
sociologists represent the same cognitive state 
of separation from an environmental entity 
and are different only in the sense that they 
are caused by the different saliency structure 
of needs in the individuals. 


Some Major Differences Between the Present 
and the Earlier Approaches 


At this point, it may be useful to compare 
and to highlight a few important differences 
between the present conceptualization and 
earlier ones. 

Although the definitions of involvement and 
alienation as cognitive states of identifica- 
tion with work resemble the way the con- 
cepts were defined by Lawler and Hall (1970), 
the former are different from the latter in 
one important respect. As discussed earlier, 
Lawler and Hall (1970) put exclusive em- 
phasis on the job opportunities that meet a 
worker's need for control and autonomy as 
necessary preconditions to the state of job 
involvement. In fact, all earlier formulations 
(both sociological and psychological) seem 
to have followed this line of thinking. The 
present approach, however, suggests that job 
involvement does not necessarily depend on 
job characteristics that allow for control- and 
autonomy-need satisfaction. It emphasizes 
that workers have a variety of needs, some 
more salient than others. The saliency of the 
needs in any given individual is determined 
by his or her past socialization in a given cul- 
ture (historical causes) and is constantly 
modified by present job conditions (contem- 
porary causes). Different groups of individ- 
uals, because of their different socialization 
training or different cultural background, 
may develop different need saliency patterns. 
They may value extrinsic and intrinsic job 
outcomes (Lawler, 1973) very differently. One 
set of needs (e.g., growth needs such as self- 
esteem and autonomy) may be salient in one 
group of workers, but the same needs may 
not be salient in another group. This may 
result in different self-images in the two 
groups and, consequently, in different job ex- 
pectations in the two groups. One group of 
workers that considers control and autonomy 
to be the core of their self-image may get in- 
volved in jobs that are perceived as offering 
opportunity for exercising control and au- 
tonomy, and they may become alienated from 
jobs that are perceived as providing little 
freedom and control. Such job characteristics, 
however, may not be the crucial considera- 
tions for another group (who may view se- 
curity and social needs to be the core of their 
self-image) in the determination of their job 
involvement or alienation. That people do 
differ with respect to what constitutes the 
core of their self-concepts should not be over- 
looked. The developed societies of the West 
may make their citizens believe that all that 
counts in one's life is to have individual liberty 
and freedom. Workers belonging to these so- 
cieties may feel therefore that working life is 
of little worth without freedom and control. 
In contrast, however, in the developing so- 
cieties of the East, economic and social se- 
curity often are considered more important 
to life than are freedom and control. Thus, 
workers in eastern societies may find work 
very involving if it guarantees such security, 
but may not care very much for freedom and 
control in their jobs. In these societies, people 
may value equality more than liberty as the 
guiding principle of working life. Rabinowitz 
and Hall (1977) alluded to this possibility, 
but found no available research that ex- 
amined "this lower-need-based form of job 
involvement” (p. 280). 

In their attempts to increase job in- 
volvement among workers, the sociological 
(Blauner, 1964) and the psychological (Law- 
ler & Hall, 1970) approaches have analyzed 
the work situation from the standpoint of job 
design, or the nature of the job. They have 
emphasized job characteristics such as the 
lack of variety in a job, mechanized and rou- 
tine operations, strict supervision, and so on 
and their effects on the involvement of work- 
ers without any attempt to understand the 
nature and the saliency of needs in the 
workers. In presenting such a position, these 
authors have argued in favor of a universal 
prescription for increasing job involvement 
by designing jobs to provide greater autonomy 
and control to the workers. The prescription 
is of course based on the assumption that the 
needs for control and autonomy are the most 
salient needs in workers. This position can 
be contrasted with the approach that Taylor 
(1911) advocated in his principles of scien- 
tific management. In his pig-iron-loading ex- 
periment, he selected as his subject a physi- 
cally strong individual who had a salient mone- 
tary need. In selecting the right man for the 
job, he looked into the past training and 
abilities, the need saliency, and the job per- 
ceptions of the worker. Obviously, Taylor 
must have thought that these characteristics 
have a significant influence on a worker’s job 
involvement. The approach advocated in this 
article does not make the assumption that the 
needs for control and autonomy are the most 
salient needs in all workers. Unlike previous 
approaches, the present approach suggests 
that job involvement can be best understood 
if we find out the nature of and the saliency 
of needs in workers as they are determined 
by prior socialization and present job condi- 
tions. The design of jobs and the determina- 
tion of their extrinsic and intrinsic outcomes 
for the sake of increasing job involvement 
should be based on an understanding of 
worker needs and perceptions. The findings 
of Lawler and Hackman (1971) seem to 
support this position. According to them, 
“There is no reason to expect job changes to 
affect the motivation and satisfaction of em- 
ployees who do not value the rewards that 
their jobs have to offer” (p. 52). 

Previous approaches emphasized the dis- 
tinction between work as instrumental and 
work as consummatory activity (the means 
to an end vs. the end in itself). The present 
approach considers work to be a set of job- 
related behaviors and attitudes and, like all 
behaviors and attitudes, work is considered 
to be instrumental in satisfying a variety of 
needs that a worker may have. All human 
behaviors stem from need states, and all hu- 
man behaviors tend to be purposive and in- 
strumental in obtaining goals or outcomes 
for the satisfaction of needs. Work behaviors 
and job attitudes should not be an exception 
to this rule. 

In summary, the motivational approach to 
the study of alienation and involvement ad- 
vocated in this article provides an integra- 
tive framework for future psychological and 
sociological research. Future research in the 
area should not only attempt to measure 
job alienation and job involvement as cogni- 
tive states but also attempt to relate such 
cognitive states to the antecedent conditions 
of need saliency in the individual and his or 
her job perceptions. Attempts should also 
be made to relate the cognitive states of 
alienation and involvement to the various af- 
fective states that accompany them and to 
their behavioral consequences. Using the mo- 
tivational approach, future studies should 
explore the phenomena of alienation and in- 
volvement in areas other than work, such as 
in the family, in the community, and in 
other forms of leisure-time pursuits (as sug- 
gested in Figure 1). It would be of consider- 
able interest to find out the reasons for 
alienation and involvement in these areas for 
different groups of people with different so- 
cialization training. It would also be of in- 
terest to see how involvement and alienation 
in one area influence the nature of such 
states in other areas. For instance, how does 
job involvement affect family involvement 
and vice versa? The widely accepted Marx- 
ian dictum that work alienation is the cause 
of all social maladies is something that clearly 
needs empirical verification. These are some 
of the general issues that need exploration 
in the future, and it is hoped that the frame- 
work proposed here will help in such ex- 
ploration. 


Reference Note 


1. Faunce, W. Occupational involvement and selec- 
tive testing of self-esteem. Paper presented at the 
meeting of the American Sociological Association, 
Chicago, 1959. 


References 


Alderfer, C. P. Existence, relatedness, growth: Hu- 
man needs in organizational settings. New York: 
Free Press, 1972. 

Allport, G. W. Pattern and growth in personality. 
New York: Holt, Rinehart & Winston, 1961. 

Argyris, C. Integrating the individual and the or- 
ganization. New York: Wiley, 1964. 

Bass, B. M. Organizational psychology. Boston: 
Allyn & Bacon, 1965. 

Basu, K. S. Job involvement: An analysis in a bi- 
cultural context. Unpublished master's thesis, Mc- 
Gill University, Montreal, Canada, 1976. 

Blauner, R. Alienation and freedom: The factory 
worker and his industry. Chicago: University of 
Chicago Press, 1964. 

Blood, M. R., & Hulin, C. L. Alienation, environ- 
mental characteristics, and worker responses. Jour- 
nal of Applied Psychology, 1967, 51, 284-290. 

Clark, J. P. Measuring alienation within a social 
system. American Sociological Review, 1959, 24, 
849-852. 

Denis, T. C. The concept of alienation: Some 
critical notices. In F. Johnson (Ed.), Alienation: 
Concept, term, and meanings. New York: Seminar 
Press, 1973. 

Dubin, R. Industrial workers’ worlds: A study of 
the central life interests of industrial workers. 
Social Problems, 1956, 3, 131-142. 

Dubin, R., Champoux, J. E., & Porter, L. W. Cen- 
tral life interests and organizational commitment 
of blue collar and clerical workers. Administra- 
tive Science Quarterly, 1975, 20, 411-421. 

Durkheim, E. De la division du travail social. Paris: 
F. Alcan, 1893. 

Festinger, L. A theory of social comparison pro- 
cesses. Human Relations, 1954, 7, 117-140. 

Gerth, H. H., & Mills, C. W. From Max Weber: 
Essays in sociology. New York: Oxford Univer- 
sity Press, 1946. 

Hall, D. T., & Schneider, B. Correlates of organiza- 
tional identification as a function of career pat- 
tern and organizational type. Administrative Sci- 
ence Quarterly, 1972, 17, 340-350. 




Hall, D. T., Schneider, B., & Nygren, H. T. Personal 
factors in organizational identification. Adminis- 
trative Science Quarterly, 1970, 15, 176-190. 

Herzberg, F. Work and the nature of man. Cleve- 
land, Ohio: World Publishing, 1966. 

Hulin, C. L., & Blood, M. R. Job enlargement, in- 
dividual differences, and worker responses. Psy- 
chological Bulletin, 1968, 69, 41-65. 

Johnson, F. (Ed.). Alienation: Concept, term, and 
meanings. New York: Seminar Press, 1973. 

Jones, E. E., & Gerard, H. B. Foundations of so- 
cial psychology. New York: Wiley, 1967. 

Josephson, E., & Josephson, M. R. Alienation: Con- 
temporary sociological approaches. In F. Johnson 
(Ed.), Alienation: Concept, term, and meanings. 
New York: Seminar Press, 1973. 

Kanungo, R. N., Gorn, G. J., & Dauderis, H. J. 
Motivational orientation of Canadian anglophone 
and francophone managers. Canadian Journal of 
Behavioral Science, 1976, 8, 107-121. 

Lawler, E. E. Motivation in work organizations. 
Belmont, Calif.: Wadsworth, 1973. 

Lawler, E. E., & Hackman, J. R. Corporate profits 
and employee satisfaction: Must they be in con- 
flict? California Management Review, 1971, 14, 
46-55. 

Lawler, E. E., & Hall, D. T. Relationship of job 
characteristics to job involvement, satisfaction, 
and intrinsic motivation. Journal of Applied Psy- 
chology, 1970, 54, 305-312. 

Lodahl, T. M. Patterns of job attitudes in two as- 
sembly technologies. Administrative Science Quar- 
terly, 1964, 8, 482-519. 

Lodahl, T. M., & Kejner, M. The definition and 
measurement of job involvement. Journal of Ap- 
plied Psychology, 1965, 49, 24-33. 

Macquarrie, J. A theology of alienation. In F. 
Johnson (Ed.), Alienation: Concept, term, and 
meanings. New York: Seminar Press, 1973. 

Marx, K. [Economic and philosophical manu- 
scripts.] In, Marx-Engels Gesamtausgabe (Vol. 
3). Berlin, Germany: Marx-Engels Institute, 
1932. (Originally published, 1844.) 

Maslow, A. H. Motivation and personality. New 
York: Harper, 1954. 

McGregor, D. The human side of enterprise. New 
York: McGraw-Hill, 1960. 

Mills, C. W. White collar. New York: Oxford Uni- 
versity Press, 1951. 

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. 
The measurement of meaning. Urbana: University 
of Illinois Press, 1957. 

Patchen, M. Participation, achievement, and in- 
volvement on the job. Englewood Cliffs, N.J.: 
Prentice-Hall, 1970. 

Rabinowitz, S., & Hall, D. T. Organizational re- 
search on job involvement. Psychological Bul- 
letin, 1977, 84, 265-288. 

Saleh, S. D., & Hosek, J. Job involvement: Con- 
cepts and measurements. Academy of Manage- 
ment Journal, 1976, 19, 213-224. 




Schacht, R. Alienation. Garden City, N.Y.: Double- 
day, 1970. 

Seeman, M. On the meaning of alienation. Ameri- 
can Sociological Review, 1959, 24, 783-791. 

Seeman, M. The urban alienations: Some dubious 
theses from Marx to Marcuse. Journal of Person- 
ality and Social Psychology, 1971, 19, 135-143. 

Shepard, J. M. Automation and alienation: A study 
of office and factory workers. Cambridge, Mass.: 
MIT Press, 1971. 

Siegel, L. Industrial Psychology. Homewood, Ill.: 
Irwin, 1969. 


RABINDRA N. KANUNGO 


Taylor, F. W. Principles of scientific management. 
New York: Harper, 1911. 

Vroom, V. Ego involvement, job satisfaction, and 
job performance. Personnel Psychology, 1962, 15, 
159-177. 

Vroom, V. Work and motivation. New York: Wiley, 
1964. 

Weissenberg, P., & Gruenfeld, L. W. Relationship 
between job satisfaction and job involvement. 
Journal of Applied Psychology, 1968, 52, 469-473. 


Received October 5, 1977 




Psychological Bulletin 
1979, Vol. 86, No. 1, 139-147 


Between-Subjects Expectancy Theory Research: 
A Statistical Review of Studies Predicting 
Effort and Performance 


Donald P. Schwab, Judy D. Olian-Gottlieb, and Herbert G. Heneman III 


A large number of between-subjects expectancy theory studies have correlated 
measures of employee motivation (force) to perform (consisting of perceptions 
regarding linkages among effort and performance, performance and outcomes, 
and the attractiveness of the outcomes) with measures of effort and perform- 
ance. The results of these studies (i.e., variance explained in effort and per- 
formance) vary considerably. A statistical review of these studies was conducted 
to determine the extent to which variance explained (the dependent variable) 
was a function of various characteristics of the effort and performance and of 
the force-to-perform measures (the independent variables). There were 160 ob- 
servations, derived from 32 studies. Using multiple regression, we found that 
variance explained in these studies was greater when (a) self-report or quantita- 
tive measures of effort and performance were used rather than evaluations of 
these variables by someone other than the subject; (b) 10-15 outcomes were 
included in the force measure rather than a greater or smaller number of out- 
comes; (c) outcome valence was numerically scaled with positive numbers only, 
and the scale values were described in terms of desirability rather than impor- 
tance; and (d) the force measure either contained no assessment of expectancy 
or an assessment that confounded expectancy and instrumentality. These var- 
iables accounted for 42% of the variance in the results obtained in the studies 
reviewed. Some theoretical and research implications of these findings are dis- 
cussed. 


Expectancy theory (Vroom, 1964) is a 
process theory of work motivation that has 
received considerable theoretical and empiri- 
cal attention. Although numerous variations 
of the theory have been proposed (Campbell 
& Pritchard, 1976), the core of all formula- 
tions is the proposition that motivation 
(force) to perform is a function of (a) the 
expectancy that changes in effort will re- 
sult in changes in performance, (b) the in- 
strumentality of performance changes for the 
attainment of outcomes (such as pay in- 
creases), and (c) the valences of the out- 
comes. Also, in most formulations, these 
variables are combined in a multiplicative 
fashion as follows: Force to perform = f[ex- 
pectancy × Σ (instrumentalities × valences)]. 
A substantial amount of research has been 
stimulated by expectancy theory. Almost all 
of it has used a between-subjects design in 
which a measure of force to perform is cor- 
related with a measure of effort or perform- 
ance for a sample of individuals. Some re- 
viewers have objected to this orientation and 
have suggested that a within-subject design 
would be more appropriate for testing the 
theory (e.g., Mitchell, 1974). To date, how- 
ever, there has been almost no such research 
on the prediction of effort or performance. 
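
As a rough illustration of the multiplicative formulation above, the following sketch computes a force score for a single hypothetical respondent. The function name, the rating scales, and all of the numbers are assumptions made for illustration; they are not taken from any of the studies reviewed, and the function f is simply treated as the identity.

```python
def force_to_perform(expectancy, instrumentalities, valences):
    """Force = expectancy x sum(instrumentality_i x valence_i).

    All three inputs are hypothetical questionnaire scores; the
    function f in the text is assumed here to be the identity.
    """
    if len(instrumentalities) != len(valences):
        raise ValueError("each outcome needs one instrumentality and one valence")
    # Sum of instrumentality-by-valence products across outcomes.
    weighted_outcomes = sum(i * v for i, v in zip(instrumentalities, valences))
    return expectancy * weighted_outcomes


# Hypothetical respondent rating three outcomes (e.g., pay increase,
# promotion, praise); the scales are illustrative assumptions.
expectancy = 0.5                      # perceived effort-performance link
instrumentalities = [1.0, 0.5, 0.0]   # perceived performance-outcome links
valences = [4, 2, 3]                  # attractiveness of each outcome

force = force_to_perform(expectancy, instrumentalities, valences)
print(force)
```

In a between-subjects design, one such score per respondent would then be correlated with a measure of effort or performance across the sample.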

As the body of empirical literature on 
this theory has proliferated, so also have 
reviews summarizing and interpreting that 
literature (e.g., Campbell & Pritchard, 1976; 
Connolly, 1976; Heneman & Schwab, 1972; 
House, Shapiro, & Wahba, 1974; Mitchell, 
1974; Mitchell & Biglan, 1971; Wahba & 
House, 1974). Although some of these re- 
views are essentially uncritical descriptions 
of the theory and research findings (e.g., 
House et al., 1974), all have made at least 
some suggestions aimed at improving the 
conduct of future research. 

The suggestions that have been made can 
be viewed from two related perspectives. 
One is primarily theoretical in orientation. 
From this perspective, reviewers can be 
viewed as encouraging their empirically ori- 
ented colleagues to test expectancy theory 
qua theory. Thus, as examples, Heneman 
and Schwab's (1972) admonition to include 
assessments of expectancy in force measures 
and Mitchell’s (1974) plea to perform 
within-subject studies can be viewed as at- 
tempts to make the measures and procedures 
of empirical investigations isomorphic with 
the theory. 

An alternative perspective is to view these 
suggestions as aimed at increasing the the- 
ory's ability to predict effort or performance. 
The focal point of such an orientation is in 
increasing variance explained rather than in 
the elegance of the theoretical formulation 
per se. The reviews mentioned here have gen- 
erally made suggestions regarding the mea- 
surement of the constructs specified in the 
theory. These recommendations, however, 
have in every case been based on casual as- 
sessments of previous research; that is, none 
of the reviews have empirically determined 
if variance explained differs as a function of 
the measurement procedures used. 

The present review involves a statistical 
analysis of the results obtained in previously 
published between-subjects studies of expec- 
tancy theory. Studies included in our sta- 
tistical analysis were those that examined the 
amount of variance explained in measures 
of effort or performance by measures of
force to perform. These variance-explained
values reported in the previous studies be-
came our dependent variable. Independent
variables chosen for the statistical analysis 
were various characteristics of (a) the mea- 
sures of performance or effort and (b) the 
force measures that were used in the previ- 
ously published research. Thus, the present 
study aims to account for variation in the 
strength of the relationship observed in other 
studies between force to perform and per- 
formance or effort as a function of charac- 
teristics of the effort, performance, and force- 
to-perform measures. 

Three issues were examined regarding
characteristics of the performance and effort 
measures. The first of these involved dis- 
tinguishing between use of effort and use of 
performance as criterion measures. The con- 
ception of force is clearly aimed at predicting 
the effort expended toward some behavior. 
The actual behavior (job performance in 
the case at hand) is seen as a multiplicative 
function of force to perform and ability to 
perform (Vroom, 1964). Thus, from a the- 
oretical perspective, one might argue, ceteris 
paribus, that more variance would be ex- 
plained in studies using effort as the de- 
pendent measure than in studies using some 
measure of performance. 

There is, however, reason to suspect that 
measures of effort might not be more pre- 
dictable than measures of performance. Spe- 
cifically, as Campbell and Pritchard (1976) 
have stated, “Organizational psychology is 
without any clear specification of the mean- 
ing of effort and consequently there is no 
operationalization of the variable that pos- 
sesses even a modicum of construct validity" 
(p. 92). Thus, one might anticipate difficulty 
in predicting effort from force on pure mea- 
surement grounds. 

A second dependent variable issue per-
tained to whether the measure was internal
(self-reports) or external (others' ratings and
rankings or productivity indexes) to the sub- 
ject. Mitchell (1974) has argued that internal 
measures are more appropriate because effort 
is difficult to observe and hence measure ex-
ternally. Moreover, it is generally agreed
that when self-ratings are used simultaneously
for both the dependent and the independent
variables, the resultant correlations are in-
flated. Thus, we hypothesized that more vari-
ance would be explained when self-ratings
were used to measure effort or performance 
than when alternative measures were em- 
ployed. 

A third issue regarding the dependent vari- 
able was concerned with performance mea- 
sures that were external to the subject. 
These measures have typically been ratings 
or rankings of performance provided most 
often by the subject's work supervisor. There 
have also been a number of instances in 
which investigators had access to quantitative 
productivity measures (units produced, sales 
volume, etc.). We hypothesized that more 
variance would be explained with the latter 
measures because of the reliability problems 
that frequently occur with performance rat- 
ings or rankings. 

It was also possible to examine a number 
of issues dealing with characteristics of the 
force measure used. One important issue had 
to do with the number of outcomes (i.e., 
consequences of performance) assessed. Both 
Heneman and Schwab (1972) and Mitchell 
(1974) have suggested that a theoretical ori- 
entation would require that a large number 
of such outcomes be included in any test of 
the theory. This would be necessary to ensure 
that all outcomes of potential relevance to 
subjects are included in the measure of force 
to perform. No harm would come from this 
strategy because outcomes of no importance 
(with zero valence) would fall out of the 
force equation. For example, an outcome 
such as improved recreational facilities 
might not be important to most subjects. 
They would be expected to assign such an 
outcome zero valence, and hence it would 
not enter their force-to-perform equations. 
Nevertheless, including such an outcome 
would presumably enhance the force-to-per- 
form estimates of those subjects for whom 
this outcome has some nonzero valence. 

Again, however, psychometric issues po- 
tentially conflict with theoretical precision. 
Outcomes of little importance to subjects 
may contribute to unreliability of measure-
ment and hence to reduced predictability. The
potential conflict between theoretical and
psychometric considerations suggests that a 
nonlinear relationship between number of 
outcomes and variance explained might be 
hypothesized. Up to a certain point increases 
in the number of second-level outcomes may 
increase variance explained because the mar- 
ginal increment in total valence is large rela- 
tive to the reliability decrement. We hy- 
pothesized, however, that beyond some point 
the marginal increment would be offset by 
the reliability decrement, resulting in a re- 
duction in the variance explained. 

The data permitted the examination of 
two issues involving the measurement of 
valence of second-level outcomes. One of 
these issues has to do with the verbal anchor- 
ing of the valence measures. Vroom (1964) 
defined valence as indicating anticipated 
satisfaction with, or desirability of, second- 
level outcomes. Both Connolly (1976) and 
Mitchell (1974) noted that many studies have 
anchored their valence scales with impor- 
tance, which potentially represents an alter- 
native construct. Connolly (1976) argued 
that unless the use of importance can be 
justified in variance-explained results, “There 
is a good argument for returning to the 
original conception of valence as anticipated 
satisfaction, or a close analog such as at- 
tractiveness, desirability, or anticipated util- 
ity” (p. 40). The second issue regarding 
valence has to do with the numerical anchors 
used. Again Mitchell (1974) has argued 
that theoretical purity requires that the an- 
chors range from positive to negative values 
instead of using just the positive values that 
are frequently reported. Both of these val- 
ence issues (importance vs. desirability and 
positive to negative vs. positive only) were 
examined in the present study, although di- 
rectional hypotheses were not specified a 
priori. 

The final issues considered in the present 
study pertain to the measurement of ex- 
pectancy. In their review Heneman and 
Schwab (1972) pointed out that most of the 
initial studies failed to measure expectancy 
at all or confounded expectancy with mea- 
sures of instrumentality. They urged that 
future research include unconfounded mea-
sures of expectancy. This plea has apparently
been heeded because a number of investiga- 
tions have now been reported that include 
expectancy measures unconfounded with in- 
strumentality. Moreover, Campbell and Prit- 
chard (1976) concluded that measures of 
expectancy, considered singularly, tend to 
be positively correlated with measures of ef- 
fort and performance. Thus, it was hypothe- 
sized that variance-explained values would 
be greater when the force measure included 
an expectancy term than when it did not. It 
was also hypothesized that more variance 
would be explained when this measure was 
not confounded with instrumentality. A re- 
lated issue was the measurement of expec- 
tancy in those studies that included an un- 
confounded measure. Vroom (1964) defined 
expectancy as the subjective probability that 
an outcome (e.g., performance) will follow
a specified level of effort. Thus, a theoretically
correct measure of expectancy would assess 
it in likelihood terms, although in a number 
of studies it has been measured in alterna- 
tive ways (e.g., having subjects compare the 
importance of personal effort relative to 
other potential determinants of performance; 
Schwab & Dyer, 1973). Following this the- 
ory we hypothesized that more variance 
would be explained when expectancy was 
measured in subjective-probability or likeli- 
hood terms. 


Method 
Dependent Variable and Population


The dependent variable was the amount of vari- 
ance explained in a measure of effort or performance 
by a measure of force.¹ These values were easily
obtained in studies that reported results in correla-
tional terms by computing coefficients of determina-
tion (r² or R²). The R² values were corrected for
number of independent variables using Nunnally’s 
(1967) correction formula. In two studies the results 
were not directly presented in correlational terms, 
but it was possible to derive the correlation from 
the information presented (Lawler, 1966; Turney, 
1974). 
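
The conversion from reported correlations to variance-explained values can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the shrinkage adjustment is the common adjusted-R² form, assumed here as a stand-in for Nunnally's (1967) correction, whose exact form is not reproduced in the text. Flooring the adjusted value at zero is likewise a choice made for this sketch.

```python
def variance_explained(r):
    """Coefficient of determination for a simple correlation r."""
    return r * r


def adjusted_r_squared(r2, n_subjects, n_predictors):
    """Shrink a multiple R^2 for the number of predictors.

    Assumed form (not taken from the article):
        1 - (1 - R^2) * (N - 1) / (N - k - 1)
    The result is floored at zero so it remains a proportion.
    """
    denominator = n_subjects - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need more subjects than predictors plus one")
    shrunk = 1.0 - (1.0 - r2) * (n_subjects - 1) / denominator
    return max(0.0, shrunk)
```

For a simple correlation the first function suffices; the second applies only when a study reported a multiple R² based on several predictors.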

A number of decisions about choice of the de- 
pendent variable were made. Many studies reported 
a variety of analyses; for example, performance 
might be correlated with a number of alternative 
force formulations. In these instances, data were
used only from the model that most closely approxi-
mated the multiplicative force model specified
earlier.² In addition, we included only the force
measure that used the total number of outcomes 
assessed in the study. In the four studies that used 
cross-lagged correlation analysis (Kopelman & 
Thompson, 1976; Lawler, 1966; Lawler & Suttle, 
1973; Sheridan, Slocum, & Richards, 1974), only 
the predictive results (force at time 1, effort or 
performance at time 2) were used. In all instances 
only one model of force was included from each 
study. However, an observation corresponding to 
each relationship between the force measure and 
alternative measures of effort or performance was 
included from those studies that had multiple mea- 
sures of effort or performance. 

A total of 32 published studies (using effort or 
performance measures as criteria) were found in 
which the results were reported in such a manner 
that a variance-explained value was presented or 
could be derived. Using the decision rules specified 
previously, a total of 160 observations were ex- 
tracted from the 32 studies. For analysis purposes 
we viewed this N = 160 as the population of vari-
ance-explained values in between-subjects studies.


Independent Variables 


The studies were reviewed to obtain the necessary 
independent variable information. Since the infor- 
mation was relatively straightforward, little am- 
biguity occurred in reviewing the studies and coding 
the data. The following independent variables were 
used in the present study: (a) whether effort or 
performance was measured, (b) whether the effort/ 
performance measure was self-reported or externally 
assessed, (c) whether the externally assessed measure 
was based on objective data (e.g., sales volume and 
productivity) or subjective appraisal, (d) number 
of second-level outcomes (trichotomized into cate- 
gories of approximately equal numbers of observa- 
tions consisting of 1-9 outcomes, 10-15 outcomes and 
16 or more outcomes), (e) whether valence was 
scaled positive to negative or only positive, (f) 
whether the verbal anchor for valence was impor- 
tance or desirability, (g) whether an expectancy 


¹ The dependent variable used in the present
study resulted in the exclusion of certain well- 
known expectancy studies (e.g., Georgopolous, Ma- 
honey, & Jones, 1957; Porter & Lawler, 1968) 
because their results could not be cast into a vari- 
ance-explained format. Also, negative force-per- 
formance (or force-effort) relationships were coded 
zero in the analysis. 

² One exception to the decision rule was made
for the Oliver (1974) study. He did not report re- 
sults for the multiplicative model alone. Thus, the 
variance-explained estimate recorded for his study 
consisted of the multiplicative and additive models 
combined. 

³ In coding the verbal valence anchors, Graen's
(1969) essential-unnecessary categories were coded


Table 1
Frequency Distribution of Independent
Variables

Variable                            f

Dependent variable
  Productivity                     28
  Self-report effort               29
  Self-report performance          15
  Other-report effort              13
  Other-report performance         75
Number of outcomes
  ≤9                               46
  10-15                            61
  ≥16                              53
Valence characteristic
  Importance
    Negative-positive               1
    Positive only                  58
  Desirability
    Negative-positive              72
    Positive only                  29
Expectancy characteristic
  Unconfounded
    Likelihood estimate            27
    Other                          69
  Confounded                       51
  None                             13

Note. N = 160.


measure was included in the force measure, (h) 
whether the expectancy measure was unconfounded 
with instrumentality, and (i) whether expectancy 
was measured as a likelihood estimate. 


Analysis 


All independent variables were dummy coded 
(Cohen, 1968). The frequency with which each 
category appeared in the studies reviewed is shown 
in Table 1. Table 1 shows, for example, that 28 of 
the observations had an externally assessed produc- 
tivity measure, that 29 had a self-report measure of 
effort, and so forth. Frequencies within each general
category (dependent variable, number of outcomes,
valence characteristics, and expectancy character-
istics) sum to the total (N = 160) because they are
made up of mutually exclusive and exhaustive sub-
categories. The mean variance explained was calcu-
lated for each of the categories shown in Table 1.


as important-unimportant. Coded as desirable-un- 
desirable were good-bad (Hackman & Porter, 1968; 
Lied & Pritchard, 1976; Matsui & Terai, 1975), at- 
tractive-unattractive (Pritchard & Sanders, 1973), 
and preferences among pairs of outcomes (Sheridan 
et al., 1974).

In addition, the variance-explained values were
regressed on various combinations of the dummy 
categories. This analysis was performed to assess 
how much of the variability in the results of pre- 
vious research could be accounted for by the pro- 
cedural characteristics of the studies. All multiple 
coefficients of determination reported in subsequent 
tables are significant (p < .05). However, significance 
levels are not reported in the tables because we view 
this analysis as describing the population of between- 
subjects, variance-explained estimates of effort and 
performance. Some inferential implications of this 
study are considered in the discussion. 
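
The analysis described above can be illustrated with a short sketch. When a single categorical characteristic is dummy coded and used as the only predictor, the fitted values of the regression are simply the category means, so R² equals one minus the within-category sum of squares divided by the total sum of squares. The function names are hypothetical and the code is not the authors' procedure; it only shows the mechanics.

```python
from collections import defaultdict


def dummy_code(labels):
    """Return 0/1 rows and the category order; the first category is the reference."""
    categories = sorted(set(labels))
    rows = [[1 if lab == cat else 0 for cat in categories[1:]] for lab in labels]
    return rows, categories


def r_squared_for_categories(labels, values):
    """R^2 from regressing values on dummy codes for one categorical variable."""
    groups = defaultdict(list)
    for lab, val in zip(labels, values):
        groups[lab].append(val)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("values have no variability to explain")
    # Fitted values are the category means, so the residual SS is the within SS.
    ss_within = sum(
        (v - sum(group) / len(group)) ** 2
        for group in groups.values()
        for v in group
    )
    return 1.0 - ss_within / ss_total
```

With several characteristics entered at once, as in the final equation reported later, a general least-squares routine would be needed; this sketch covers only the one-variable case.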


Results 


The first analysis involved an examination 
of the variance-explained values from the 160 
observations in terms of the characteristics 
of the variables used to measure effort and 
performance. Table 2 reports the average 
variance explained between measures of force 
and the five classifications of the dependent 
variables employed in the studies reviewed. 
As hypothesized, force and self-report mea- 
sures of effort and performance are more 
highly related than are force and others’ 
assessment of effort or performance. For ex- 
ample, measures of force account for 10% of 
the variance on the average in self-report 
measures of performance and for 7% in per- 
formance measures assessed by others. Table 
2 also shows, as hypothesized, that quantita- 
tive measures of productivity are more pre- 
dictable than others’ ratings or rankings of 
performance or effort. On the other hand, 
effort was less predictable than performance 
only for others’ ratings and rankings. Overall, 
method of categorizing the measures of effort 
and performance accounted for 8% of the 
variability in the variance-explained values. 


Table 2
Variance Explained as a Function of Type of
Effort or Performance Measure

                              M variance
Dependent variable            explained

Productivity                     .13
Self-report effort               .13
Self-report performance          .10
Other-report effort              .03
Other-report performance         .07
R²                               .08


Table 3
Variance Explained as a Function of
Number of Outcomes

Number of                     M variance
outcomes                      explained

≤9                               .08
10-15                            .14
≥16                              .05
R²                               .12


Table 3 shows the variance that force 
measures account for in measures of perform- 
ance and effort as a function of the number 
of outcomes assessed. As hypothesized, high- 
est average variance is explained in studies 
with an intermediate number of outcomes 
(14%), somewhat less variance is explained 
in studies with 9 or fewer outcomes (8%), and
the least variance is explained in studies with 
16 or more outcomes (5%). Twelve percent 
of the variability in the results of the studies 
reviewed is accounted for by this categoriza- 
tion of the number of outcomes. 

Preliminary analysis found that the nu- 
merical (negative to positive vs. positive only) 
and verbal (desirability vs. importance) 
anchoring procedures for valence measures 
were not independent (see Table 1). As a con- 
sequence, four categories were established for 
valence, as shown in Table 4. It can be seen 
that desirability scalings on the average have 
yielded higher variance-explained estimates 
than have importance scalings. Thus, Vroom’s
(1964) original definition receives some em-
pirical support in terms of the verbal anchors.
On the other hand, Table 4 shows that posi-
tive-only anchors, especially when combined
with desirability anchors, have resulted in the
highest average variance explained. All told,
the scaling of valence accounts for 10% of
the variability in the results obtained in the
expectancy research reviewed here.


Table 4
Variance Explained as a Function of
Valence Characteristics

                              M variance
Characteristic                explained

Importance
  Negative-positive              .05
  Positive only                  .08
Desirability
  Negative-positive              .07
  Positive only                  .16
R²                               .10

Table 5 shows average variance explained 
as a function of the measurement of expec- 
tancy. Contrary to the hypothesis, greatest 
average variance is explained in studies that 
did not include a measure of expectancy 
(12%) or that confounded this measure with
instrumentality (14%). Moreover, slightly
less variance is explained in studies that used
a likelihood estimate (5%) than in those that
used alternative procedures (6%) among
studies that did include an unconfounded
measure. Twelve percent of the variability in 
the results of the studies reviewed is ac- 
counted for by the categorization of the pro- 
cedures used to assess expectancy. 

A final equation was generated by regres- 
sing the variance explained in previous ex- 
pectancy studies on all of the independent 
variables simultaneously. The signs of the 
regression coefficients in this equation were 
the same as in the equations generated to
obtain R² values in Tables 2-5. The R² for
this last equation was .42. Thus, 42% of the
variability in the results of the studies re- 
viewed is accounted for by the categoriza- 
tions of the dependent variables and force 
measures.


Discussion 


At the outset it is important to recognize 
that our analysis was necessarily constrained 
by the measures and procedures used in the 
studies reported in the literature. Thus, as an
example, Heneman and Schwab (1972) called 
for comparisons of results obtained using 
additive versus multiplicative combinations 
of force measures. Schmidt's (1973) criticism 
of multiplicative analyses aside, the present 
study was forced to consider multiplicative 
models because of those studies included in 
this review, only the Dyer and Weyrauch 
(1975), Oliver (1974), Pritchard and San- 
ders (1973), and Schwab and Dyer (1973) 
studies reported additive (or additive plus
interactive) combinations of all force com-
ponents in addition to the multiplicative mode
of combination.


Table 5
Variance Explained as a Function of
Expectancy Characteristics

                              M variance
Characteristic                explained

Unconfounded measure
  Likelihood estimate            .05
  Other                          .06
Confounded measure               .14
None                             .12
R²                               .12

An additional issue that could have been 
investigated, but was not, is the distinction 
between models that contrast results obtained 
using so-called intrinsic versus extrinsic out- 
comes. House et al. (1974), Wahba and 
House (1974), and Mitchell (1974) have all 
urged that such distinctions be made, and, 
indeed, a number of studies have purportedly 
done so (e.g., Mitchell & Albright, 1972;
Oliver, 1974). However, such distinctions 
seem arbitrary in view of Dyer and Parker’s 
(1975) demonstration that one social scien- 
tist’s extrinsic outcome is another’s intrinsic 
outcome and vice versa. As a consequence, 
we considered only the models in each study 
that included all outcomes. 

Nevertheless, the issues that were investi- 
gated in the present review accounted for a 
substantial portion of the variance in the 
results of between-subjects expectancy theory 
studies designed to predict effort or perform- 
ance. This was accomplished by categorizing 
several characteristics regarding the opera- 
tionalization of the dependent variable and of 
force to perform. We found that self-report 
measures were more highly related to mea- 
sures of force than were measures provided 
by other evaluators. This finding has been 
observed by other reviewers and probably 
reflects spurious method covariation. Ob- 
jective measures of performance were also 
associated with greater variance explained 
than were measures obtained from other 
evaluators. There are at least two possible 
explanations for the higher predictability of 
objective measures. One possibility is that
quantitative measures are more reliable than
measures provided through an appraisal pro- 
cess. An alternative explanation has to do 
with the possible boundary conditions of the 
theory. Dachler and Mobley (1973) have 
suggested that the theory may only be pre- 
dictive in situations in which outcomes are 
objectively linked to behaviors. It may be 
that in situations in which performance is 
objectively measured, the organization is 
more likely to use contingent reward systems 
(particularly those involving monetary re- 
wards). 

The composition of the force measure used 
was also related to the results obtained by 
previous researchers. Studies that used 10-15 
second-level outcomes obtained stronger rela- 
tionships between force and performance or 
effort than did studies that used either fewer 
or more outcomes. Additionally, studies that 
scaled valence only positively and that used 
desirability-undesirability verbal anchors re- 
sulted in more variance explained than al- 
ternative formulations of valence. Studies that 
did not measure expectancy at all or that 
confounded expectancy with instrumentality 
measures obtained stronger results than those 
that measured expectancy in a more theo-
retically correct fashion.

It is obvious from these results that maxi- 
mum variance explained in between-subjects 
predictions of performance or effort has not 
been obtained by making force measures ad- 
here to the theory. Indeed, every finding re- 
garding the measurement of force could be 
interpreted as contrary to the theory except 
for the verbal anchoring of valence. Models 
have yielded the strongest results without a 
theoretically appropriate expectancy measure, 
with a moderate number of second-level out- 
comes, and with valence scaled only in a 
positive direction. 

It is tempting to infer from these findings 
the likely results that would be obtained if 
one were to conduct a between-subjects study 
of performance or effort based on expectancy 
theory. The major probable constraint on 
inference, however, stems from the fact that 
multiple observations were taken from many 
of the studies reviewed. This clustering of 
observations within studies results in the 
probable underestimation of the standard
error of estimates and hence in the overesti-
mation of the corresponding F values (Kish,
1957). Unfortunately, since the noninde- 
pendence of dependent values within clusters, 
if any, is confounded with the impact of the 
independent variables investigated, we know 
of no way to identify the magnitude of the 
problem and still retain all 160 observations. 
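
The direction of the clustering problem can be illustrated with the standard design-effect approximation, 1 + (m − 1)ρ, where m is the average number of observations per study and ρ is the intraclass correlation. This is a sketch under loud assumptions: the approximation presumes roughly equal cluster sizes, the function names are hypothetical, and any value of ρ supplied is purely illustrative, since the review itself reports no way to estimate it.

```python
def design_effect(avg_cluster_size, intraclass_corr):
    """Approximate variance inflation due to clustering: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * intraclass_corr


def effective_sample_size(n_obs, n_clusters, intraclass_corr):
    """Number of independent observations the clustered sample is worth."""
    avg_cluster_size = n_obs / n_clusters
    return n_obs / design_effect(avg_cluster_size, intraclass_corr)


# 160 observations drawn from 32 studies, with a hypothetical rho of .30.
illustrative_n = effective_sample_size(160, 32, 0.30)
```

If ρ were zero the 160 observations would count in full; any positive ρ shrinks the effective sample, which is why the nominal F values are likely overstated.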

We attempted to obtain some information 
regarding the appropriateness of generaliza- 
tion by replicating the analyses performed on 
the population in two independent subsamples 
drawn from the population. These samples 
were obtained by randomly choosing one ob- 
servation per study, subject to the constraint 
that an observation could appear only once 
in the two samples. This procedure resulted 
in subsamples of n = 31 and n = 27 (six
studies had only one observation). Variance 
explained was used as a dependent variable, 
as were two transformations aimed at gen- 
erating a dependent distribution approximat- 
ing normality. The first was Fisher’s r to z 
transformation, and the second was a trans- 
formation derived according to procedures 
suggested by Hinkley (1977). 
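
The first of these transformations is easy to state in code. The sketch below assumes that a variance-explained value is first converted back to a nonnegative correlation by taking its square root; that step and the function names are assumptions made for illustration. The Hinkley (1977) procedure is not shown because its details are not given in the text.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation, z = atanh(r), for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def inverse_fisher_z(z):
    """Convert a z value back to a correlation."""
    return math.tanh(z)


def z_from_variance_explained(r2):
    """Assumed step: take the nonnegative root of r^2, then transform."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("variance explained must be in [0, 1)")
    return fisher_z(math.sqrt(r2))
```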

Generally speaking, the direction of results 
from these analyses was similar to the direc- 
tion of results obtained on the population of 
observations. Self-report dependent variables
were more predictable than others’ reports in 
both samples, as in the population. Produc- 
tivity was more predictable than others’ re- 
ports in one sample, but the two were about 
equally predictable in the other. The inter- 
mediate number of outcomes was associated 
with highest average variance explained in 
one sample. In both samples, as in the popu- 
lation, lowest variance explained on the 
average occurred in the category with 16 or 
more outcomes. Also as in the population,
valence scaled only positively resulted in 
greater variance explained in both samples. 
Moreover, in both samples the average vari-
ance explained was greater in studies that
did not measure expectancy or that con- 
founded this measure with instrumentality. 

However, none of the coefficients of deter- 
mination generated on the variance-explained 
values or on the transformed values were
statistically significant (p < .05) in either
sample. This lack of significance is probably
due to the small sample sizes that were nec-
essary to achieve independence of observa-
tions, as well as due to the large numbers of
independent variables (relative to sample
size). Thus, inferences drawn about the prob-
able outcomes of future research findings
from the results obtained here must be made
cautiously.

Despite these qualifications, there is a
nagging suspicion that expectancy theory
overintellectualizes the cognitive processes
people go through when choosing alternative
actions (at least insofar as choosing a level
of performance or effort is concerned). The
results of the present review are consistent
with this suspicion. At the very least, whether
for theoretical or measurement reasons, our
[...]
force have not aided prediction in between-
subjects investigations.


The Solzhenitsyn Finger Test:
A Significance Test for Spontaneous Recovery


Edmund B. Coleman 
University of Texas at El Paso 


Spontaneous recovery is generally considered to be an unreliable phenomenon, 
but this is because of the fact that an inappropriate significance test has been 
used in past research. The research hypothesis states that spontaneous recovery 
is a monotonic function of time. Time is an ordered continuous variable; it 
flows steadily in one direction. The interaction test that has been traditional, 
however, is only appropriate for an unordered discrete variable. To illustrate 
this point, the study most frequently cited as evidence against spontaneous 
recovery was retested with significance tests appropriate for an ordered variable 
(specifically, the correlation coefficients tau and rho), and the findings were 
shown to be evidence for, not against, a recovery effect. 


Spontaneous recovery in verbal learning is 
not widely considered to be a reliable phe- 
nomenon. In his general text on verbal learn- 
ing, Jung's (1968) first sentence about the 
topic is, “Evidence regarding spontaneous 
recovery has been weak" (p. 118). Saltz 
(1971) wrote, “None of the studies . . . has 
reported evidence supporting the spontane- 
ous recovery theory of proactive inhibition" 
(p. 218). Much the same evaluation is found 
in other texts and in reviews (Hall, 1971, p. 
492; Hulse, Deese, & Egeth, 1975, p. 353; 
Keppel, 1968, p. 185). 

But after an overall review of the research, 
Brown (1976) reached a different conclusion 
and suggested that the problem may be “that 
a more sensitive statistical or analytical mea- 
sure may be required” (p. 336). The pur- 
pose of this article is to state Brown’s sug- 
gestion in stronger language. The spontane- 
ous recovery design needs a correct analysis; 
the interaction test that has been used for 
over 20 years does not fit the research hy-
pothesis.

The most frequently cited study of spon-
taneous recovery (Koppenaal, 1963) can
serve to illustrate the design and traditional
analysis. Koppenaal used an A-B, A-C para- 
digm; that is, his subjects first learned an 
A-B list (shiny-bitter) and then an A-C list 
(shiny-pretty). There was also a control 
group who learned a single list. Figure 1 
presents his findings, but the plot has been 
simplified by omitting data unrelated to the 
present argument. The traditional test for 
spontaneous recovery has been to test the 
mean square of the interaction between lists 
and time. Koppenaal’s nonsignificant F of 
1.30 and similar findings by Slamecka 
(1966) have been widely cited as evidence 
against spontaneous recovery. But given 
Brown’s (1976) analysis of the overall evi-
dence, the negative evaluation of spontane-
ous recovery findings by those in the field
becomes unconvincing. A reexamination of 
the statistical logic appears in order. 




The Solzhenitsyn Finger Test 


In August 1914 (1971/1972), Solzhenitsyn
has his alter ego demonstrate to General
Samsonov that the Russian front line is
dangerously overextended with what could
be called the Solzhenitsyn Finger Test, an
important advance in instrumentation over
the Interocular Traumatic Test (Edwards,
Lindman, & Savage, 1963). Colonel Voro-
tyntsev, using his fingers as dividers, swings
off six spans on the war map, showing Sam-
sonov that his front-line troops are 6 full
marching days ahead of staff headquarters.

Figure 1. Number of correct responses for List 1
and the control list. The data are from Koppenaal
(1963).

Let us apply Colonel Vorotyntsev's finger 
test to the curves of Figure 1. The research 
hypothesis is that as time passes, the sup-
pressed associations in List 1 spontaneously
recover, thus causing its curve to converge
toward the curve of the control list. Using 
your fingers as dividers, measure the differ- 
ence between List 1 and the control list for 
the 1-minute interval. It will be about 3.2 
responses. Now do the same for the 20-min- 
ute interval. It will be smaller—about 2.8 
responses. If you continue for the longer and 
longer intervals, you will get smaller and 
smaller differences.

The charade has been pushed dangerously
close to the point of insult. The point is that
time is an ordered continuous variable; it
only flows in one direction. Thus, the re-
search hypothesis predicts an ordered de-
crease in differences; the significance test
that has been in use for over 20 years, how-
ever, is unordered. Researchers have been
testing the interaction between lists and time,
and this interaction is based on the un-
ordered variance between the differences.
The interaction mean square for Koppenaal's
curves would have been exactly the same if 
the largest difference (3.2) had occurred at 
the second interval, the third interval, or 
at any other interval including the one for 
1 week. 
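
The order-blindness of the interaction term can be checked numerically. The Python sketch below is an illustration only, not a reanalysis of Koppenaal's data. It uses the seven between-curve differences given later in the text and assumes a 2 x 7 layout with equal cell sizes, in which the lists-by-time interaction sum of squares is proportional to the squared deviations of the differences from their mean. The function name `interaction_ss` and the cell size `n_per_cell` are hypothetical placeholders.

```python
from itertools import permutations

# Differences between List 1 and the control list at the seven intervals,
# as reported in the text.
differences = [3.2, 2.7, 2.15, 1.55, 2.2, 1.6, 0.5]


def interaction_ss(diffs, n_per_cell=1):
    """Lists x time interaction sum of squares for a 2 x k layout.

    With two lists and equal cell sizes this equals
    (n_per_cell / 2) * sum((d_j - mean_d) ** 2).
    n_per_cell is a hypothetical placeholder, not a value from the study.
    """
    mean_d = sum(diffs) / len(diffs)
    return n_per_cell / 2 * sum((d - mean_d) ** 2 for d in diffs)


observed = interaction_ss(differences)

# Move the largest difference (3.2) from the first interval to the last.
reordered = differences[1:] + differences[:1]

# The statistic depends only on the set of values, not on their order,
# so every one of the 7! orderings gives the same sum of squares.
same_for_every_order = all(
    abs(interaction_ss(list(p)) - observed) < 1e-9
    for p in permutations(differences)
)

print(round(observed, 4), round(interaction_ss(reordered), 4), same_for_every_order)
```

Because the sum of squares is unchanged under every reordering, a test built on it cannot distinguish a steady convergence of the curves from the same differences scattered haphazardly over time.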

There are a number of powerful tests that 
fit an ordered hypothesis like the one for 
spontaneous recovery: the difference be- 
tween correlation coefficients, the difference 
between slope coefficients, the linear com- 
ponent of the differences between the curves, 
and the like. Applying the more powerful 
tests to Koppenaal’s data, however, would 
require estimation of certain terms and ex- 
tensive justification of these estimations. A 
more economically presented test could be 
based on Kendall’s (1962) tau. Since the 
test would use the degrees of freedom for 
the seven intervals instead of those for Kop- 
penaal's 168 subjects, it would be some- 
what overconservative; but perhaps that is 
a virtue in the present context. 

For the seven intervals, the differences 
between the two curves are 3.2, 2.7, 2.15, 
1.55, 2.2, 1.6, and .5. The research hypothe- 
sis (that as List 1 spontaneously recovers 
over time, the differences will become pro- 
gressively smaller) can be expressed as 21 
predictions: The difference between the 
curves at the 1-minute interval will be 
larger than the following six differences (all 
6 predictions are correct). The difference at 
the 20-minute interval will be larger than 
the following five differences (all 5 predic- 
tions are correct). The difference at the 90- 
minute interval will be larger than the fol- 
lowing four (3 predictions are correct and 1 
is incorrect), and so on, as per Kendall 
(1962). Out of the 21 predictions, 18 con- 
form to the research hypothesis and 3 do 
not. The ratio of 18 to 3 can be evaluated by 
Kendall's table of S values (S = 18 - 3 = 15) 
and is significant (p < .015, one-tailed). 
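
The tally of pairwise predictions can be reproduced mechanically. The Python sketch below is offered as an illustration rather than as Kendall's own procedure. It counts the predictions of the form "the earlier difference is larger than the later one" that hold and fail, forms S as their difference, and then enumerates all orderings of the seven values to obtain an exact one-tailed probability. That probability should fall close to the tabled value. The function name `kendall_s` is hypothetical, and the enumeration assumes no tied values.

```python
from itertools import combinations, permutations

# Differences between the two curves at the seven intervals (from the text).
differences = [3.2, 2.7, 2.15, 1.55, 2.2, 1.6, 0.5]


def kendall_s(values):
    """Return (correct, incorrect, S) for the prediction 'earlier > later'."""
    pairs = list(combinations(values, 2))  # (earlier, later) for every pair
    correct = sum(1 for earlier, later in pairs if earlier > later)
    incorrect = sum(1 for earlier, later in pairs if earlier < later)
    return correct, incorrect, correct - incorrect


correct, incorrect, s_observed = kendall_s(differences)

# Exact one-tailed p: the share of all orderings of the seven values whose
# S is at least as large as the observed S.
orderings = list(permutations(differences))
p_one_tailed = sum(kendall_s(o)[2] >= s_observed for o in orderings) / len(orderings)

print(correct, incorrect, s_observed, round(p_one_tailed, 4))
```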

An approach with the same underlying 
logic is to correlate the seven differences 
with their intervals (ρ = .82, p < .025). If 
one uses a more powerful test that replaces 
the degrees of freedom for the seven intervals 
with the degrees of freedom for Koppenaal’s 
(1963) 168 subjects, the significance level 
will become more extreme. In short, the 
finding most frequently cited as evidence 
against spontaneous recovery is, on the con- 
trary, strong evidence in its favor. 
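
The correlational version of the test can be sketched in the same way. The text does not name the coefficient, so the Python example below assumes a Spearman rank correlation between the order of the intervals and the size of the differences. The helper names `ranks` and `spearman_rho` are hypothetical. Under that assumption the coefficient is negative, because the differences shrink as the interval lengthens, and its magnitude can be compared with the value reported above.

```python
# Differences between the two curves at the seven intervals (from the text).
differences = [3.2, 2.7, 2.15, 1.55, 2.2, 1.6, 0.5]
interval_order = list(range(1, len(differences) + 1))  # 1 = shortest interval


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(interval_order, differences)
print(round(rho, 2))
```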

In his Table 1, Brown (1976) listed 21 
studies of spontaneous recovery. All 21 
showed a trend toward recovery, but 8 of 
them (including Koppenaal's) reported a 
nonsignificant interaction. If the studies were 
retested with an ordered test, it is likely that 
several more of the negative reports would 
turn out to be evidence in favor of spon- 
taneous recovery. 

It is important to note that Type I errors 
are possible when the interaction is used to 
test an ordered function. In the case of 
spontaneous recovery, it is true, the op- 
posite error is more likely to have been 
made. Because of the excessive weakness of 
the inappropriate test, which in turn led to 
a few incorrect negative evaluations, several 
characteristics of spontaneous recovery that 
are cornerstones of its theoretical relevance 
have not been vigorously pursued as re- 
search topics, for example, recovery over 
long intervals, recovery of specific suppres- 
sions, recovery of bonds not taught in labo- 
ratory lists, and others. Given the warranted 
power of an ordered test, perhaps such char- 
acteristics will appear in future research. 


On the Nature of Taste Qualities 


The concept of four basic tastes developed historically on the basis of a number 
of criteria. Modern evidence, largely electrophysiological, has led some inves- 
tigators to reject the concept of basic tastes in favor of a taste continuum or 
multidimensional space. The present article reviews data that support the valid- 
ity of the basic tastes. It is concluded that this concept, as well as the separate 
question of taste as an analytic or synthetic sense, is compatible with either of 
the two major positions on sensory coding in taste, labelled line theory and 
pattern theory. Taste may profitably be considered to comprise four distinct 
but interacting sensory modalities or submodalities analogous to the (other) 
skin senses. 


Every introductory psychology text in- 
forms its readers that there are four taste 
qualities: sweet, salty, sour, and bitter. Such 
a statement may reflect the views of a ma- 
jority of workers in taste at the present time, 
but it glosses over a long historical develop- 
ment and considerable current ferment (Mc- 
Burney, 1974). On the one hand, those in the 
psychophysical tradition tend to accept the 
consensus of four taste qualities because of its 
obvious convenience and predictive usefulness. 
Those who are more physiologically oriented, 
however, tend to reject the notion of four taste 
qualities as naive. For example, Uttal (1973) 
wrote, “The psychophysical evidence con- 
tinually seems to evoke the use of the four 
basic taste words. Yet . . . it is moot whether 
this must be interpreted as a reflection of the 
underlying biological mechanisms or of the 
evolved language of gustatory experience” (p. 
603). Similarly, Schiffman and Erickson 
(1971) commented with respect to their psy- 
chophysical model of taste, “We would like 
to emphasize that the present model does 
not require, or support, the idea of taste 
primaries. That is, although the ordering of 
stimuli may, with some imagination, be seen 
as resulting in four rather indistinct stimulus 
groups . . . we do not suggest here that . . . 
any of these stimuli may be usefully de- 
scribed by primary tastes” (p. 631). 

It is our purpose in this article to review 
the history of the development of the concept 
of taste qualities and to discuss the theoreti- 
cal issues involved in relating the psycho- 
physical and physiological approaches to un- 
derstanding sensory coding in taste. We argue 
that the four taste qualities together exhaust 
the qualities of taste experience. We propose 
further that the four taste qualities, far from 
being arbitrary, may profitably be thought of 
as representing separate but interacting sen- 
sory systems in the same way that we think 
of the skin senses. Although the focus of the 
article is on the status of the qualities of 
taste, it is necessary to discuss the related 
issues of the nature of the stimulus dimen- 
sion(s) for taste and the physiological mech- 
anisms that code the dimension(s). The two 
major positions of sensory coding in taste 
are known as the labelled line theory and the 
across-fiber pattern theory. The labelled line 
theory holds that the quality information is 
carried by the fiber that is most responsive 
to a particular stimulus, whereas the pattern 
theory holds that the information consists in 
the relative firing rate of two or more fibers. 
Furthermore, the question of whether taste is 
analytic or synthetic in the way it responds 
to mixtures has become an issue in the de- 
bate over gustatory sensory coding. We argue 
that the question of the status of the four 
taste qualities is independent of the resolu- 
tion of both the analytic-synthetic debate 
and the labelled line versus pattern theory 
debate. 
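
The difference between the two coding positions described above can be made concrete with a toy example. The Python sketch below is purely illustrative: the fiber names and firing rates are invented, and the two functions are hypothetical simplifications rather than models proposed by any of the authors cited. The labelled line reading takes the most responsive fiber as the signal for quality, whereas the pattern reading takes the relative firing rates across all fibers.

```python
# Hypothetical firing rates (impulses per second) of three broadly tuned
# fibers. Every number here is invented for illustration only.
responses = {
    "weak NaCl": {"fiber_A": 10.0, "fiber_B": 4.0, "fiber_C": 2.0},
    "strong NaCl": {"fiber_A": 40.0, "fiber_B": 16.0, "fiber_C": 8.0},
    "weak HCl": {"fiber_A": 4.0, "fiber_B": 10.0, "fiber_C": 6.0},
}


def labelled_line_code(rates):
    """Labelled line reading: quality is carried by the most responsive fiber."""
    return max(rates, key=rates.get)


def across_fiber_pattern(rates):
    """Pattern reading: quality is carried by relative firing rates."""
    total = sum(rates.values())
    return {fiber: rate / total for fiber, rate in rates.items()}


for stimulus, rates in responses.items():
    print(stimulus, labelled_line_code(rates), across_fiber_pattern(rates))
```

In this toy case both readings assign the weak and strong NaCl solutions to the same quality and separate them from HCl, which illustrates why a limited set of qualities is not by itself evidence for one coding scheme over the other.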

It is necessary at this point to define taste, 
for the purpose of this article, as those sensa- 
tions mediated by the taste buds. As such, the 
sense of taste is part of a perceptual system 
that involves all of the chemically sensitive 
nerves and end organs of the oral and nasal 
cavities that aid in the investigation of the 
chemical environment (cf. Gibson, 1966). 
Thus, taste is distinguished from flavor, 
which comprises other sensations mediated 
by olfaction, touch, temperature, and even 
vision. The distinction between taste and 
flavor has formed the basis of the vast bulk 
of the work in taste and is crucial to the 
understanding of the logical status of the 
taste qualities. 


Relation of Psychophysical Data to 
Physiological Data 


A fundamental problem in the analysis of 
a sensory system is to relate physiological 
and psychophysical data, that is, to develop 
what Brindley (1960) called linking hypoth- 
eses: "If a physiological hypothesis, i.e., a 
hypothesis about function that is stated in 
physical, chemical and anatomical terms, is 
to imply a given result for a sensory experi- 
ment, the background of theory assumed in 
conjunction with it must be enlarged to in- 
clude hypotheses containing psychological 
terms as well as physiochemical and anatomi- 
cal" (p. 145) ones. 

The problem in taste research is that there 
is, at present, little agreement on the status 
of the psychological terms that are presum- 
ably to be explained by the physiological data 
by the way of linking hypotheses. Although 
some may wish not to be constrained by the 
categories of human taste experience, it is 
certainly true that terms like sweet and salty, 
which are unquestionably derived from hu- 
man experience, are used in taste physiology 
in ways that appear definitive. Are the taste 
qualities primaries and, if so, in what sense 
of the word? Are they arbitrary terms bor- 
rowed in a naive way from our language? Or, 
are they simply arbitrary locations in a multi- 
dimensional perceptual space? 


Uses of the Term Primary 


The psychological distinctiveness of sweet, 
salty, sour, and bitter has tempted many to 
label these qualities primary. The term pri- 
mary appears to have been borrowed from 
vision, and its application to any other sen- 
sory system unavoidably invites comparison 
with color perception. The term can be a 
source of confusion because it is used in a 
number of different ways. It is used vari- 
ously to refer to (a) sensations that are 
psychologically (phenomenologically) distinct 
and that cannot be analyzed introspectively 
into two or more categories, such as the color 
yellow; (b) stimuli that may be mixed to 
produce sensations that are normally associ- 
ated with a third stimulus, as with the visual 
mixture primaries (it should be noted that 
the primaries in this usage are not unique in 
the case of vision; there are a large number 
of possible mixture primaries); (c) stimuli 
that are the best exemplars of some sensa- 
tion, such as NaCl for salty or 585 nano- 
meters for yellow; (d) receptors, neurons, or 
neural mechanisms that are especially sensi- 
tive to a particular stimulus. This last usage 
often implies a one-to-one correspondence 
between two or more of the meanings of the 
term primary, for example, primary stimulus = 
primary neural element = primary sensation. 
It will be recognized that this sort of iso- 
morphism does not hold in the case of vision. 
The mixture primaries need not be psycho- 
logically primary; the receptor primaries are 
not the same as the sensation primaries, and 
the neural primaries are still different. It is 
little wonder that workers in vision tend to 
eschew the use of the term primary alto- 
gether. 

The related problems of primaries and psy- 
chophysical linking hypotheses are illustrated 
by Frey’s (cited in Melzack & Wall, 1962) 
theory of cutaneous sensory coding. As Mel- 
zack and Wall pointed out, Frey's theory in- 
volves a number of assumptions: (a) that 
there is a one-to-one correspondence between 
dimensions of sensory experience and types 
of receivers in the brain; (b) that there are 
distinct afferent nerves; (c) that there are 
distinct receptors in the skin; and (d) that 
there are distinct stimulus dimensions that 
subserve the dimensions of sensory experi- 
ence. The same assumptions are often made 
in discussions of taste primaries. However, 
the naive notion of a direct correspondence be- 
tween levels of processing need not hold for 
the notion of psychological primaries (the 
first usage of the term mentioned earlier) to 
be valid. We prefer the term basic taste to 
primary taste because it is used unambigu- 
ously to refer to the sensation. 


Problems of Gustatory Sensory Coding 


There are at least three separate problems 
to solve before it can be said that we have 
achieved a theory of sensory coding in taste: 
(a) the nature of the sensations, (b) the na- 
ture of the stimulus dimension(s), and (c) 
the physiological mechanisms that subserve 
the coding of the sensations. Each of these is 
discussed below. As one will see, the analysis 
of taste sensation has had a longer historical 
development than that of the other two prob- 
lems. Our knowledge of the physiological 
mechanisms involved in the coding process 
and of the nature of the stimulus dimensions 
is far from complete. Therefore, one is forced 
to start with the sensory analysis and work 
backward to the stimulus to achieve a theory 
of sensory coding. This process involves what 
Boynton and Onley (1962) called a converse 
linking hypothesis, that is, inferring stimulus 
or neural events from sensation rather than 
the reverse. As Boynton and Onley pointed 
out, it is much more tenuous to conclude that 
a particular converse linking hypothesis is 
true, because many different stimuli can give 
rise to the same response, whereas the same 
stimulus (or the same neural activity) should 
always give rise to the same sensation. 

Much of the psychophysical and physiologi- 
cal taste research has involved the converse 
linking hypothesis either implicitly or ex- 
plicitly. Thus, the position taken with respect 
to the nature of taste sensations has an im- 
portant practical impact on the approach to 
the investigation of stimulus dimensions and 
physiological mechanisms. For example, the 
systematic search for similarities among a 
large group of stimuli that all give rise to the 
same taste quality rests on the assumption 
that there are a limited number (e.g., four) 
of discrete taste qualities. Birch (1976), for 
example, reviewed the correlation between 
the molecular structure of sugars and the in- 
tensity of their sweet taste. And similarly, 
Dastoli and Price (1966) extracted “sweet- 
sensitive” proteins from the bovine tongue. 
The choice of exemplars of salty, sour, sweet, 
and bitter stimuli in almost all psychophysi- 
cal and physiological research usually rests 
on the assumption that these four qualities 
are the taste sensations. On the other hand, 
by assuming the continuous nature of taste 
sensations, Schiffman and Erickson (1971) 
found that taste stimuli formed a multidi- 
mensional space that they believed to be in- 
compatible with four basic qualities. 


Historical Development of Taste Categories 


The origins of the current views on the 
nature of taste qualities may be found in 
the historical development of the psychologi- 
cal sensation categories and their relationship 
to the taste system as it was known prior to 
electrophysiological recording. Our summary 
of some of the historical aspects of the taste 
system is based largely on Bartoshuk’s 
(1978) review of the history of taste. 

The naming and categorization of taste 
sensations date back to the Greeks. Aristotle 
(384-322 B.C.), for example, listed seven 
basic tastes: sweet, bitter, sour, salty, as- 
tringent, pungent, and harsh. Between the 
time of the Greeks and the 19th century, the 
number of taste sensations described as basic 
varied among the seven named by Aristotle, 
the eleven by Haller (in 1786), an unlimited 
number by Rudolphi (in 1823), the six by 
Wundt (in 1880), the five by Öhrwall (in 
1891), and the four by Kiesow (in 1896). 
During the 1800s the investigation of the 
anatomy of the taste system prompted the 
elimination of some previously named basic 
tastes as actually being the result of olfaction 
or touch (Bartoshuk, 1978; Öhrwall, 1891; 
Skramlik, 1926). Thus, by the early 1900s 
salty, sour, sweet, and bitter were generally 
agreed upon as basic tastes with alkaline or 
insipid often also included. 


Classical Bases for the Four Taste Qualities 


The basic taste qualities were arrived at 
largely by introspective analysis, but other 
evidence was used to support their existence. 
One justification for the four qualities con- 
sidered was the adaptiveness of the associa- 
tion between the various taste qualities and 
the nutritive or poisonous characteristics of 
chemical substances. For example, salt for 
some animals is a vitally important and 
scarce nutrient; for others the problem may 
be to avoid the osmotic consequences of high 
salinity (Dethier, 1977). Sweet is correlated 
with high calorie substances, bitter with 
poisons, and sour with corrosive substances. 
In fact, analysis of taste stimuli suggests 
broad and reasonably powerful predictive 
validity for the taste qualities, particularly 
between sourness and the H+ ion, but also 
between sweetness and carbohydrates, bitter- 
ness and alkaloids, and saltiness and alkali 
halide salts. Other evidence for four qualities 
was the work showing the differential sensi- 
tivity to the four qualities across the tongue 
(cf. Boring, 1942; Hanig, 1901); the selec- 
tive disappearance of quality sensitivity fol- 
lowing the application of topical anesthetics 
(Bartoshuk, 1978; Kiesow, 1894b); the 
selective modification of taste quality follow- 
ing gustatory contact with certain plant sub- 
stances, namely Gymnema sylvestre and Synsepalum 
dulcificum (Kiesow, 1894b; Shore, 1892; 
Skramlik, 1926); and the lack of synthetic 
effects of taste mixtures (Kiesow, 1894a, 
1896). 


Modality Versus Quality 


By the turn of the century the number of 
the basic tastes was generally agreed upon, 
but agreement did not exist as to the nature 
of these taste categories. The views of Helm- 
holtz (1879/1968) on perception in general 
and his distinction between sensory qualities 
and sensory modalities in particular had a 
great influence on those who were to begin 
the debate on the nature of basic tastes nearly 
a quarter of a century later. Helmholtz clas- 
sified sensations as belonging to separate 
modalities if there were no transitions be- 
tween them; for example, “blue, sweet, warm, 
and the pitch of tones" (p. 210) belong to 
separate modalities. The term quality was 
applied to different perceptions within the 
same sense; for example, blue and violet 
would be two qualities within color vision. 

The modality-quality distinction was ap- 
plied to taste, and the debate began. On one 
side were those who thought of salty, sour, 
sweet, and bitter as four qualities within one 
sense; the alternative view was that the four 
basic tastes actually represent four separate 
sensory modalities. Kiesow, a student of 
Wundt, was one of many who thought of the 
four basic tastes as qualities. He went fur- 
ther, however, and proposed a theory to re- 
late these qualities that was based on an 
analogy to the opponent-process theory of 
color vision proposed by Hering, with whom 
he was personally acquainted (Murchison, 
1930). Hering's theory was based on his ob- 
servations of the simultaneous contrast ex- 
hibited by the primary visual hues of red 
and yellow or blue and green. It was Kie- 
sow's view that the qualities salty, sour, 
sweet, and bitter are actually the four taste 
"primaries." However, he had only moderate 
success in demonstrating simultaneous and 
successive contrast in taste (Kiesow, 1894a, 
1896). Öhrwall (1891, 1901), on the other 
hand, argued that there was no "transition" 
to be found among any of the four basic 
tastes and that they were, therefore, four 
separate modalities. Contrary to Kiesow, he 
maintained that the phenomena of simulta- 
neous contrast and successive contrast did 
not exist in taste. Öhrwall criticized Kiesow's 
taste theory analogy with color vision as 
faulty, but Kiesow’s view that the four tastes 
are primary qualities prevailed. 

Research on the response of the taste sys- 
tem to a mixture of two or more sapid sub- 
stances does not support these analogies to 
color vision. First, no new qualities are pro- 
duced by a taste mixture, which strongly 
suggests that taste sensations are limited 
(Bartoshuk, 1975). That is, a mixture of 
NaCl and HCl will taste salty and sour but 
never sweet and/or bitter and/or something 
else. There is no gustatory analogue to 
orange. Second, it is not possible to produce 
a tasteless mixture by an appropriate com- 
bination of components, unless the compo- 
nents are themselves so weak as to be nearly 
tasteless. Thus, there is no gustatory ana- 
logue to gray. On these bases, taste has been 
considered an analytic sense rather than a 
synthetic sense. 

Most researchers preferred to think of 
salty, sour, sweet, and bitter as four qualities 
as opposed to the stronger view that they 
are four separate modalities. However, there 
was still disagreement among those support- 
ing the more popular quality view as to the 
relationship among those qualities. Applying 
the label primary to each quality and making 
analogies with theories of color vision domi- 
nated and continue to dominate most the- 
ories of taste. For example, in the early part 
of the 20th century, Henning (1927) sug- 
gested that the relationships between the four 
basic tastes could be represented by a hollow 
tetrahedron. By placing salty, sour, sweet, 
and bitter on the corners, all other tastes 
could be placed either along the continuum 
between two tastes (e.g., on the salty-bitter 
edge) or on the surface formed by three 
basic tastes. Thus, all tastes could be repre- 
sented as appropriate mixtures of two or 
three of the four "primaries." Representing 
a much less popular alternative to the pri- 
mary qualities-color vision analogy, Hahn 
(1934; Hahn, Kuckulies, & Bissar, 1940; 
Hahn, Kuckulies, & Taeger, 1938; Hahn & 
Ulbrich, 1948) argued that salty, sour, sweet, 
and bitter (and alkaline) are independent 
qualities (Bartoshuk, 1978). Hahn based 
his views on his work in adaptation and 
cross-adaptation and their effects on taste 
thresholds. 

Although there was disagreement on 
whether the taste system operates like color 
vision, descriptions of the relationships 
among different tastes rested on some varia- 
tion of Müller's influential “doctrine of spe- 
cific nerve energies” (cited in Boring, 1950). 
The general assumption was that there is a 
simple, if unspecified, correspondence be- 
tween some physical attribute of the stimu- 
lus, the operation of the nerve, and the 
sensation produced. With the development 
of techniques for recording the electrical ac- 
tivity of single nerve fibers, the search for 
neural correlates of the human taste experi- 
ence was broadened to other mammalian 
nervous systems. Although the historical psy- 
chophysical data suggested a search for spe- 
cific salty, sour, sweet, and bitter fibers, 
Pfaffmann’s (1941) finding that the four 
basic tastes were not mediated by specific 
fibers created a problem that is still with us 
today. 


Modern Search for Taste Qualities 


With the advent of the vacuum tube and 
the subsequent development of gustatory 
electrophysiology, it became apparent that 
no neat correspondence exists between taste 
qualities and individual taste nerve fibers 
(Pfaffmann, 1941). One of the results of this 
failure of the straightforward extension of 
the doctrine of specific nerve energies has 
been to intensify the sort of theorizing typi- 
fied by Kiesow (1894a, 1894b, 1896), which 
makes use of the analogy of the senses. It 
is inevitable that such comparisons be made, 
and, of course, they are beneficial to the ex- 
tent that the various senses display common 
mechanisms of neural functioning. As one 
sees below, the analogy of the senses has 
been used to raise a number of objections to 
the four taste qualities. 


Cultural Relativity Objection 


The first objection is the notion that the 
taste qualities are culture bound; that is, if 
one were to study other cultures, or to throw 
away the usual categories and start from 
scratch, one might end up with a completely 
different set of taste categories. Perhaps. 
But the best example we have of such a pos- 
sibility is the number of names for snow em- 
ployed by Eskimos. However, this example 
is not very pertinent because, with some 
training, any person could discriminate the 
bases for the distinctions among the names. 
Studies of non-Western cultures (Chamber- 
lain, 1903; Myers, 1904) find that various 
languages often lack specific names for one 
or more of the four taste qualities. Even 
though one or more of the four may be lack- 
ing, one does not seem to find cultures that 
make more distinctions than the four basic 
tastes that cannot be explained as describing 
flavor as opposed to taste, for example, as- 
tringent. Recently it has been found that 
Malay speakers use the same taste words as 
English speakers but tend to use more modi- 
fying adjectives (O'Mahony & Muhiudeen, 
1977). There is, of course, a parallel con- 
troversy in color vision (e.g., Bornstein, 
1973). Heider (1972) studied recognition 
memory and learning of associations to focal 
and nonfocal colors by New Guinea Dani, 
who lack names for hues in their language. 
(Focal colors are those that best exemplify 
common color names, e.g., red, orange, yel- 
low, etc.) She found that the Dani made 
fewer errors in recognition of focal com- 
pared to nonfocal colors and learned associa- 
tions to focal colors faster than to nonfocal 
colors. Heider also found that native speak- 
ers of all language families used shorter 
words or phrases to describe focal colors 
than to describe nonfocal colors. Heider’s 
experiments show that the effect of language 
on perceptual categorization has been over- 
stated, at least for hues. In the absence of 
similar work in taste, Heider’s data imply 
that taste names are unlikely to be influenced 
greatly by language. The weight of evidence 
favors the notion that the lack of compara- 
ble names for certain colors in various cul- 
tures has a basis in the relative lack of util- 
ity of these names in primitive cultures 
(Woodworth, 1910) rather than in a funda- 
mentally different way of perceiving. 


Multidimensional Space Objection 


Similar to the cultural relativity objection 
is the suggestion made by Frings (1948) 
that “the primary modalities of taste might 
. . . be simply points of familiarity in an 
unbroken series of stimulative values. . . 
The work of Pfaffmann on electrical impulses 
from tastebuds, although not fully explicable 
by this hypothesis, is certainly more nearly 
explained by it than by the hypothesis of 
primary modalities" (p. 32). This sugges- 
tion makes explicit the notion of a con- 
tinuum, although none has been identified. 
Erickson and his co-workers have followed in 
the direction suggested by Frings (cf. 
Doetsch, Ganchrow, Nelson, & Erickson, 
1969; Erickson, 1963, 1967, 1968; Erick- 
son, Doetsch, & Marshall, 1965; Erickson 
& Schiffman, 1975). What Erickson and his 
co-workers were looking for was a taste “wave- 
length" that would permit an isomorphic 
correspondence among the chemical, neural, 
and psychological processes. “Stimuli which 
are proximate in a multidimensional space 
on the basis of the similarity of the neural 
inputs [in terms of across-fiber patterns] 
are also similar in terms of psychophysical 
judgments" (Schiffman & Erickson, 1971, p. 
632). Erickson et al. (1965) attempted to 
find an isomorphism between the neural and 
the chemical domains. They asked, "Is it 
possible to derive the NRFs [neural response 
function, i.e., “some measure of the neural 
activity as a function of a stimulus dimen- 
sion" (p. 262)] for taste without knowledge 
of the relevant stimulus dimensions? Fur- 
ther, is it possible to discover the stimulus 
dimensions? Data presented . . . show that 
the NRFs and stimulus dimensions for taste 
may be determined" (pp. 248-249). But 
even though it might be argued for vision that 
a sort of isomorphism exists among the 
stimulus, receptor, neural, and psychological 
levels, it is a rather rubbery isomorphism 
that involves, among other things, a shift 
from three types of receptors to two types of 
opponent-process neural elements. It is true 
that the spectrum is preserved through all 
of these levels but not isomorphically with 
respect to the number of mechanisms in- 
volved. It seems better in the case of vision 
to postulate something other than a strict 
isomorphism as a linking hypothesis. 
Schiffman and Erickson (1971) have pro- 
posed a model for gustatory quality that is 
based on multidimensional scaling of taste 
stimuli. They presented a large number of 
solutions to subjects who scaled them for 
similarity and on a number of semantic di- 
mensions. They derived a taste space from 
this procedure that, they argued, does not 
support the notion of taste primaries. There 
are several problems in accepting this con- 
clusion: problems with the experimental 
method, problems with the multidimensional 
scaling, and problems in interpretation. 
First, and fundamental to the whole argu- 
ment of their article, they did not take ade- 
quate precautions to assure that they stimu- 
lated only the sense of taste, as it is usually 
defined for psychological purposes, namely, 
taste bud stimulation. Their subjects wore 
nose plugs “to reduce olfactory input,” but 
this is not sufficient to eliminate the sense of 
smell. A better procedure would have been 
to use the technique of forcing a gentle 
stream of air into the nostrils (Mozell, Smith, 
Smith, Sullivan, & Swender, 1969), thereby 
preventing the reflux of odorous air into the 
nasal cavity. This is a curious procedural 
deficiency, since the purpose of the experi- 
ment was to determine the dimensionality of 
a taste space. Any contribution from the 
sense of smell seriously distorts the taste 
space. This weakness is particularly signifi- 
cant considering that the history of taste 
qualities shows a trend toward a decrease in 
the number of taste qualities as a result of 
increasing care in eliminating sensations medi- 
ated by other sensory systems. It was such 
careful psychophysical work that ruled out 
the alkaline and metallic tastes many years 
ago. Skramlik (1926) reviewed work by Frey 
and Herlitzka that showed clearly that the 
alkaline and metallic tastes resulted from 
olfactory sensations and disappeared when 
these were excluded. Another standard 
method of determining whether a sensation 
quality can be attributed to taste is to place 
the substance on parts of the tongue, such 
as the center, that are known to be devoid 
of taste buds. Such observations may well 
have obviated discussion of other nontaste 
qualities like bitey, burning, and tingling. 
[...] review demonstrates clearly that 
earlier workers were keenly aware of 
these problems in the psychological analysis 
of taste qualities. 

The scaling problem in the Schiffman and 
Erickson (1971) experiment is inherent in 
the use of multidimensional scaling. Although 
multidimensional scaling has been used a 
great deal with considerable impact in cer- 
tain areas, it may fairly be said that its con- 
tribution to sensory processes has not been 
the discovery of unknown dimensions as much 
as the improvement of our understanding 
of the metric relationships among known 
dimensions. Further, multidimensional scal- 
ing is extremely sensitive to the choice of 
stimuli to be scaled. If most of the stimuli are 
very similar to one another, one sort of di- 
mensionality will result. If a single stimulus 
is added that is very different from all of 
the other stimuli in the sample, an entirely 
new dimension will emerge as a result. For 
this reason, scaling of a small subset of 
stimuli often gives fundamentally different 
results from the scaling of the entire set. 
This is particularly true in smell (e.g., Schiff- 
man, 1974) and taste (e.g., Gregson, 1966). 
In addition, different multidimensional scal- 
ing methods often give different results. This 
can be seen in the data of Schiffman and 
Erickson (1971), in which the scaling based 
on similarity led to three dimensions and 
the scaling based on the semantic differ- 
ential data gave two. Schiffman and Erickson 
chose to accept the similarity data for the 
space, but they interpreted the dimensions 
by means of semantic differential scales. On 
this basis they came up with the following 
three dimensions: molecular weight, hedonic, 
and deviation of pH from neutrality. Keep- 
ing in mind that the method of stimulus pre- 
sentation and the scaling technique are both 
weak, the interpretation of the space presents 
problems. First, one of the dimensions, he- 
donic, is a purely psychological dimension. 
The other two, pH and molecular weight, are 
purely physical. Second, these dimensions 
are not orthogonal; hedonic is correlated 
with molecular weight. In addition, there are 
some large inconsistencies in the locations 
of stimuli along the molecular weight di- 
mension. It is difficult to see how such a 
space could be useful in devising linking hypotheses
about the coding of taste qualities.
In any case, it is fair to say that the four 
taste qualities fall into four clusters in their 
space. It appears that the data would be at 
least as well described by the four traditional 
taste qualities, with those that fall outside of 
the space defined by the four qualities repre- 
senting, in part, extragustatory (e.g., smell) 
sensation. What Schiffman and Erickson have 
achieved is a description of a psychological 
taste space that neither supports nor denies 
the existence of four basic tastes. 
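
The sensitivity of multidimensional scaling to the choice of stimuli, noted above, can be illustrated with a small numerical sketch. The example uses classical (Torgerson) scaling, which is not necessarily the method used by Schiffman and Erickson; the stimulus coordinates, the function names, and the tolerance are hypothetical and chosen only for illustration.

```python
import numpy as np


def dissimilarities(points):
    """Euclidean distances between hypothetical stimuli (rows of points)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mds_dimensionality(dist, tol=1e-8):
    """Count the dimensions recovered by classical (Torgerson) scaling.

    The squared distances are double-centered, and the number of
    eigenvalues that are clearly positive is taken as the dimensionality.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigenvalues = np.linalg.eigvalsh(gram)
    return int((eigenvalues > tol * eigenvalues.max()).sum())


# Four similar stimuli that differ along one (made-up) attribute only.
similar = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# The same set plus a single stimulus that is unlike all the others.
with_outlier = similar + [(1.5, 5.0)]

dims_similar = mds_dimensionality(dissimilarities(similar))
dims_with_outlier = mds_dimensionality(dissimilarities(with_outlier))
print(dims_similar, dims_with_outlier)
```

Under these assumptions the four similar stimuli are fitted by a single dimension, and adding the one dissimilar stimulus introduces a second.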

To return to the problem of interpretation 
of the multidimensional scaling data, con- 
sider the multidimensional scaling of colors 
(cf. Schiffman & Erickson, 1971), which to 
date has accomplished the validation of the 
Munsell color space. Although this is an 
achievement in its own right, it in no way 
constitutes a theory of sensory coding, in and 
of itself, because it is simply a description 
of the psychophysical relationships among 
the colors in terms of the physical stimulus. 

Multidimensional scaling of stimuli pre- 
sented to different senses has been performed. 
For example, Wicker (1968) had subjects 
rate the similarities between pairs of tones, 
Munsell color chips, or tones and chips in an 
effort to study synesthetic relationships among 
these two stimulus domains. He found that 
a single multidimensional space accounted 
well for both the colors and the tones. Other 
intersensory interactions are well-known, as 
the literature on auditory-visual synesthesia 
demonstrates (e.g., Marks, 1975). Although
not all people experience synesthesia, and it
may seem somewhat artificial to some, nonsynesthetes
relate color dimensions to auditory
dimensions in the same way as synesthetes;
that is, subjects are very willing to relate 
sensations from systems that are undeniably 
distinct. Nafe (1927), using introspection, 
analyzed the qualities of the sensations medi- 
ated by the skin senses. He concluded “that 
the ‘qualities’ of felt experience . . . are 
analyzable patterns of experience . . . that 
such experiences vary in brightness, volume, 
density, and definiteness of outline” (p. 398).
Had Nafe been working 50 years later he 
might have done his experiment using multi- 
dimensional scaling. The results he did obtain 
using the best techniques of his day are consistent
with our conclusion that multidimensional
scaling does not necessarily yield the
underlying sensory dimensions. In fact, multidimensional
scaling is often done on very
diverse stimuli. The dimensions derived are 
not taken to represent any physical relation- 
ship among the stimuli. For example, in scal- 
ing animals, dimensions such as fierceness 
have been obtained (Henley, 1969). These 
are not interpreted as physical dimensions 
or taxonomic classifications (cf. Martindale 
& Hines, 1975). In addition, various interac- 
tions among the various senses are well- 
known. For example, cold and warm seem 
to be different sensory systems as far as their 
anatomy and physiology are concerned, even 
though their perceptual unity could be ar- 
gued. Nevertheless, they obviously interact 
to a great extent, and their interaction may 
even be responsible for the sensation of heat.

Therefore, little seems to be gained from 
the multidimensional analysis of taste ex- 
perience as far as the question of the nature 
of the taste qualities is concerned. It follows 
that an isomorphism between the stimulus 
and the sensation is not the only, or even
the best, conclusion to draw from multidimensional
scaling of sensation.


Recent Evidence for the Four Qualities 


Modern research has made much use of 
the four taste qualities, but more out of convenience
than out of conviction that they
are basic. For example, Bartoshuk (1974)
wrote, “In light of this [controversy over
whether these taste qualities are primaries],
some investigators have come to follow a
somewhat different strategy in studying taste
quality. Instead of concentrating on finding
all primaries, they have turned to studying
the functional properties of sweet, sour, salty
and bitter” (p. 279). Even so, evidence for
the limited number of taste categories comes
from recent psychophysical studies of the
effects of the following: cross-adaptation,
water taste, drugs, taste-altering substances
(Gymnema sylvestre and miracle fruit), mixtures,
and spatial and temporal properties. It
is true that taken alone these data have a
certain circularity about them as evidence
for the four basic tastes. The four taste qualities
were used as response categories and
may be seen as forcing the data in the direc- 
tion of evidence for basic tastes. It should 
be noted, however, that subjects in some of 
the experiments discussed below were given 
the opportunity to use additional categories 
to describe their taste sensations, but did not 
do so with any regularity (e.g., Bartoshuk, 
McBurney, & Pfaffmann, 1964). The point 
we wish to emphasize is that the four taste 
qualities are both necessary and sufficient to 
account for the qualitative and quantitative 
differences in the psychophysical data. 


Cross-Adaptation 


Psychophysical studies of the response of 
the taste system to one stimulus following 
adaptation to another support the limited 
number of taste sensations. It is true that 
this work on cross-adaptation could not have 
been done without having the subjects re- 
port on the taste qualities of the compounds. 
But it is clear that the effect of adaptation 
is to eliminate or substantially reduce the re- 
sponse to the quality to which the tongue has 
been adapted, whatever its name. Where the 
cross-adaptation was between stimuli sharing 
the same quality, the following results were 
reported: (a) Sucrose adaptation reliably 
reduced the sweetness of all sweet-tasting 
compounds tested (McBurney, 1972); (b) 
adaptation to NaCl reduced the saltiness of 
every salt tested (Smith & McBurney, 1969); 
(c) adaptation to citric acid reliably reduced 
the sourness of all other acids tested (Mc- 
Burney, Smith, & Shick, 1972); and (d) the 
results for bitter were not as clear-cut, since 
adaptation to quinine HCl reduced the bit- 
terness of some bitter compounds without 
affecting others (McBurney et al., 1972). 

The psychophysical data on cross-adaptation
have implications for the number of receptor
mechanisms that must operate in the
taste system. The effects of cross-adaptation
are found to a greater or lesser extent in all
four qualities, and there is little across-quality
adaptation, thus implying the operation
of separate receptor mechanisms. The data
do not prove the existence of specific receptor
sites, but they do imply that there are a 
limited number of receptor mechanisms. For 
example, it is likely that a single receptor 
mechanism codes the sweet taste, another 
codes saltiness, and a third codes sourness. 
The mechanism for bitterness, however, is 
more complex. 


Water Taste 


The water taste phenomenon, that is, the 
taste of water after adaptation to a com- 
pound, also supports the limited number of 
taste sensations. Each of the four qualities 
and no others have been produced as water 
tastes after suitable adaptation (Bartoshuk, 
1968; McBurney, 1969; McBurney & Bar- 
toshuk, 1972; McBurney & Shick, 1971). It 
should be noted that the water taste phe- 
nomenon cannot be explained by a simple op- 
ponent process. For example, adaptation to 
NaCl will cause water to have a bitter or 
sour-bitter taste; bitter and sour substances 
induce a sweet water taste; sweet substances 
produce a sour or bitter water taste; and 
salty water tastes have only been reliably pro- 
duced by urea and closely related compounds. 
Apparent cross-enhancement between qualities 
was found in some studies that used the 
cross-adaptation paradigm. However, the in- 
crease in the perceived intensity of a stimu- 
lus after adaptation was shown to be ex- 
plained by the water taste phenomenon (Mc- 
Burney & Bartoshuk, 1973). The increase 
was always in the quality normally pro- 
duced as a water taste rather than in the 
dominant quality of the compounds. 


Taste-Altering Substances 


The specificity of the effects of two taste- 
altering substances, found in Gymnema syl- 
vestre and Synsepalum dulcificum (miracle 
fruit), has provided information about the 
nature of taste sensations and possible taste
receptor mechanisms. The replication and 
extension of some of the historical work on 
the effects of chewing Gymnema sylvestre 
showed that only the sensation of sweet is 
abolished, including the sweet water taste 
(Bartoshuk, Dateo, Vandenbelt, Buttrick, &
Long, 1969). Since stimuli that elicit a single 
taste quality (sweet) are all influenced in 
the same way by Gymnema sylvestre, the 
implication is that the taste quality of sweet- 
ness is distinct from the other qualities. 
There is also the implication that there is a 
single receptor mechanism that codes the 
sweet taste and that the gymnemic acid op- 
erates by competing with sweet compounds 
for the sweet receptor sites. Miracle fruit is 
a berry that, when chewed, causes sour sub- 
stances to taste sweet. The effect of this 
taste modifier can be explained by analyzing 
its effect on the particular qualities of sour 
and sweet. K. Kurihara (1971) and K. Kurihara,
Y. Kurihara, and Beidler (1969) suggested
that the action of miracle fruit is a
result of the interaction on the taste cell 
membrane between the acid of a normally 
sour substance and the glycoprotein that is 
the active ingredient in the fruit. They sug- 
gested that the glycoprotein does not itself 
normally stimulate any receptor sites but 
that the presence of acid causes a change in 
the shape of the membrane in such a way 
as to cause contact between the glycoprotein 
and sweet sites. Thus, it is possible to under- 
stand the sweetening effect as the addition 
of a new taste (sweet) and the lack of increase
in total intensity as a case of mixture
suppression of sour by sweet (Bartoshuk,
1975).


Effect of Locus 


A reexamination of the variation in sensi- 
tivity over the tongue has produced evi- 
dence for quality specificity and some in- 
sight into the complexity of the receptor 
mechanisms that operate in the taste system.
Collings’ (1974) investigation of the effect 
of locus of stimulation on taste threshold 
generally confirms Hanig's (1901) data on 
the differential sensitivity of the tongue and 
supports the distinctiveness of the taste sen- 
sations. The exception she found was that 
sensitivity to bitter was greater at the front 
of the tongue than at any other tongue locus, 
not at the back as Hanig reported. 

Stevens (1969) argued that the exponent 
of the power function that relates sensation
magnitude to stimulus concentration is principally
determined by the characteristics of
the receptor system. (Even though fairly
consistent differences are found among the
exponents for the four qualities [Meiselman,
1972], we have not used this as evidence
for basic tastes. The exponent is sensitive to
several variables in addition to quality,
namely, method of stimulation; range of
stimuli; psychophysical method; presence,
value, and location of standard; and so on
[Stevens, 1975]. Furthermore, the exponents
have never been determined with the exactness
that would justify making a strong case
on this basis.) Evidence for the distinctness
of receptor mechanisms for taste qualities
comes from the fact that, in general, locus
of stimulation also has a differential effect on
the psychophysical function that relates sensation
to stimulus concentration, both between
qualities and within one quality (Collings,
1974). The differences Collings found
between qualities are compatible with the
existence of four distinct sensations, but the
differences she found across loci within qualities
imply a complex receptor mechanism
that is difficult to explain. Because Smith
(1971) showed that varying the number of
receptors stimulated (at the front of the
tongue) does not change the psychophysical
function for a given taste compound, Collings
concluded that the observed differences in
threshold were not simply the result of differences
in the number of particular receptors
present at each location.
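
The power function discussed above has the form ψ = k·c^n, so the exponent n is the slope of a straight line in log-log coordinates. The following sketch estimates it by least squares; the concentrations, the constant k, and the exponent used to generate the data are hypothetical and are not taken from any of the studies cited.

```python
import numpy as np


def fit_power_function(concentration, magnitude):
    """Fit magnitude = k * concentration**n by a line in log-log space."""
    log_c = np.log(np.asarray(concentration, dtype=float))
    log_m = np.log(np.asarray(magnitude, dtype=float))
    exponent, log_k = np.polyfit(log_c, log_m, 1)
    return float(exponent), float(np.exp(log_k))


# Hypothetical, noise-free magnitude estimates generated with n = 1.3, k = 10.
concentrations = np.array([0.01, 0.03, 0.1, 0.3, 1.0])
magnitudes = 10.0 * concentrations ** 1.3

exponent, k = fit_power_function(concentrations, magnitudes)
print(round(exponent, 3), round(k, 3))
```

With real data the fitted exponent would also reflect the procedural variables listed above (range of stimuli, psychophysical method, and so on), which is why it is not treated here as evidence for basic tastes.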


Temporal Properties 


Another aspect of taste that has been
studied is the temporal properties of the
gustatory system (cf. McBurney, 1976).
Early research (1914-1955) on the temporal
response of the taste system falls into
four areas: reaction time (Bujas, 1935;
Piéron, 1914), growth of sensation (Bujas
& Ostojcic, 1939), relationship between time
and intensity (Bujas, 1934; Hara, 1955),
and rate of adaptation (Bujas, 1953; Hahn,
1949). The first three areas are closely related,
and the early research indicates that
the temporal response of the system varies
according to the quality (e.g., salty, sour,
sweet, or bitter) of the stimulus. In particu- 
lar, reaction times to a stimulus were found 
to be a hyperbolic function of intensity and 
varied among qualities in increasing order 
from salty, sour, and sweet to bitter. A simi- 
lar ordering was found to hold for growth of 
sensation. Results concerning rate of adapta- 
tion are not so clear. Problems with the 
methods used to measure rate of adaptation 
and questions concerning the existence of 
complete adaptation prompted the search for 
another technique for studying temporal 
properties. 

The most recent psychophysical method of 
investigating the nature of taste qualities is 
the application of the techniques of linear 
systems analysis to the gustatory system. 
Work such as this has been going on for 
some time in vision (e.g., Sekuler, 1974). In 
our laboratory we attempted to investigate 
the temporal properties of the taste system 
(McBurney, 1976). Briefly, linear systems 
techniques can be used to study the input- 
output relationships of a linear system. By 
finding the ratio of a sinusoidally varying in- 
put to the output across a range of fre- 
quencies, a transfer function can be calculated 
that is characteristic of the temporal prop- 
erties of the system. Extension of this tech- 
nique to the gustatory system was made by 
presenting the tongue with a solution of
sinusoidally varying concentration. The output
studied was the threshold for several
frequencies of stimulus presentation. A temporal-modulation
sensitivity function was obtained
for each of the four qualities. Our results
agree with the earlier data in that the
ordering of the sensitivities of the four qual- 
ities as represented by the sensitivity func- 
tions was as follows: salty, sweet, sour, and 
bitter. These functions differentiated among 
qualities and therefore imply that the func- 
tions represent temporal characteristics of 
four separate systems. 
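
The logic of the linear systems approach can be sketched for a system whose properties are known in advance. The example below drives a first-order low-pass system (an arbitrary stand-in, not a model of the gustatory system) with sinusoidal inputs and takes the ratio of output to input amplitude at each frequency; the time constant, the frequencies, and the function names are assumptions made for illustration.

```python
import math


def measured_gain(frequency_hz, tau=1.0, dt=0.001, duration=40.0):
    """Ratio of output to input amplitude for a simulated low-pass system.

    The system dy/dt = (x - y) / tau is integrated with Euler steps while
    x is a unit-amplitude sinusoid. The output amplitude is read from the
    second half of the run, after the initial transient has died away.
    """
    steps = int(round(duration / dt))
    y = 0.0
    late_outputs = []
    for i in range(steps):
        x = math.sin(2.0 * math.pi * frequency_hz * i * dt)
        y += dt * (x - y) / tau
        if i >= steps // 2:
            late_outputs.append(y)
    return (max(late_outputs) - min(late_outputs)) / 2.0


def theoretical_gain(frequency_hz, tau=1.0):
    """Gain of the ideal first-order low-pass system at one frequency."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * frequency_hz * tau) ** 2)


frequencies = [0.05, 0.2, 1.0]
gains = [measured_gain(f) for f in frequencies]
for f, g in zip(frequencies, gains):
    print(f, round(g, 3), round(theoretical_gain(f), 3))
```

In the experiment described above the measured output was the threshold at each frequency rather than an output amplitude, so the sketch illustrates only the general principle of characterizing a system by its response across frequencies.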


Inner Psychophysics of Taste 


Because the approach of this article has 
been psychophysical, it is appropriate here
to point out a distinction made over 100
years ago by the founder of psychophysics, 
G. T. Fechner—that between outer and inner 
psychophysics (Howes & Boring, 1966). Outer 
psychophysics is the familiar enterprise of 
relating sensation to external stimulus. Inner 
psychophysics is the relationship of sensa- 
tion to the physiology of the nervous system. 
Of course in Fechner’s day this was only a 
far-off dream, but we have made some prog- 
ress along these lines in recent years. The 
purpose of this section is to review the inner 
psychophysics of taste, the relationship of 
the physiological data on neural mechanisms 
(receptor sites, cells, and fibers) to the ex- 
istence of taste qualities. We discuss the 
implications these data may have for linking 
hypotheses concerning the neural coding of 
taste quality. 


Receptor Sites 


The nature of the receptor site(s) for 
taste is still poorly understood. However, 
Beidler (1971) has determined that NaCl 
and sucrose each bind to one receptor site 
and that KCl, NH4Cl, and other salts that
do not have pure salty tastes bind to more 
than one site. Such a finding is consistent 
with the position that the four qualities are 
basic tastes. If the taste of NaCl were but one 
of many tastes located along a single taste 
dimension, then one might expect every sub- 
stance to stimulate a separate site or expect 
NaCl or sucrose, for example, to stimulate 
more than one. 


Neural Coding of Quality 


Pfaffmann, in his classic study (1941), 
and all subsequent investigators that used 
vertebrate subjects found that single taste 
neurons typically respond to stimuli repre- 
sentative of more than one taste quality. As 
a result, Pfaffmann (1959a, 1959b) proposed 
that the sensory code for taste quality is 
carried by the relative firing rate of two or 
more fibers. This hypothesis linking the non- 
specific neurons to specific sensations has 
been called the across-fiber pattern theory. 
Zotterman (1959), on the other hand, emphasized
that each fiber typically has a stimulus
to which it responds best and argued
that the activity in each fiber signals the 
quality of its best stimulus. This position 
has come to be known as the labelled line 
theory. The difference between the two hy- 
potheses has not yet been resolved, since the 
weight of experimental evidence does not 
clearly favor the across-fiber pattern or the 
labelled line. 
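
The two linking hypotheses can be contrasted with a toy example. The firing rates below are invented for illustration (four hypothetical fibers, each broadly tuned across four stimuli) and do not come from any recording; the function names are likewise assumptions. Under the labelled line reading, each fiber is named for its best stimulus; under the across-fiber pattern reading, two stimuli are similar to the extent that their profiles across all fibers are correlated.

```python
import numpy as np

STIMULI = ["sucrose", "NaCl", "HCl", "quinine"]

# Rows are hypothetical fibers; columns follow STIMULI (impulses per second).
RATES = np.array([
    [40.0, 12.0,  5.0,  2.0],
    [10.0, 60.0, 25.0,  8.0],
    [ 3.0, 20.0, 45.0, 15.0],
    [ 1.0,  6.0, 18.0, 30.0],
])


def best_stimulus_labels(rates):
    """Labelled line reading: name each fiber after its best stimulus."""
    return [STIMULI[i] for i in rates.argmax(axis=1)]


def labelled_line_decode(rates, stimulus):
    """Report the label of the fiber that fires most to the stimulus."""
    column = rates[:, STIMULI.index(stimulus)]
    return best_stimulus_labels(rates)[int(column.argmax())]


def pattern_correlation(rates, first, second):
    """Across-fiber pattern reading: correlate two stimulus profiles."""
    a = rates[:, STIMULI.index(first)]
    b = rates[:, STIMULI.index(second)]
    return float(np.corrcoef(a, b)[0, 1])


labels = best_stimulus_labels(RATES)
decoded = labelled_line_decode(RATES, "NaCl")
r_salt_sour = pattern_correlation(RATES, "NaCl", "HCl")
r_salt_sweet = pattern_correlation(RATES, "NaCl", "sucrose")
print(labels, decoded, round(r_salt_sour, 2), round(r_salt_sweet, 2))
```

Both readings can be applied to the same broadly tuned fibers, which is in keeping with the point that the evidence does not clearly favor one hypothesis over the other.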

For example, Frank (1973) recorded from 
the chorda tympani of the hamster and found 
the same sort of broad tuning that was found 
by Pfaffmann in the rat. However, she dis- 
covered that if the stimuli were arranged in 
the order sweet, salty, sour, bitter, there was 
always one maximum of responding, with 
responsiveness decreasing monotonically on 
either side of the maximum. Thus, instead
of a responsiveness of the neurons to various
qualities that appeared to be random,
there was some orderliness for the first time 
since the beginning of electrophysiological 
study of taste neurons. Frank pointed out 
that the order of stimuli that produces the 
simplification of the profiles happens to be 
from acceptance to rejection. This is a curi- 
ous finding and may simply be fortuitous. 
However, Frank's results have led to a re- 
newed interest in the idea that the sensory 
code may be contained in a single neuron, or 
in the labelled line, as it has come to be called. 


Comparison of Psychological and 
Psychophysical Data Relative 
to Coding 


Direct comparisons of neural and psycho- 
physical responses in the human are obviously 
limited to special circumstances, but such 
studies have been done. Electrophysiological 
responses of the exposed chorda tympani 
nerve of otosclerotic patients were found to 
be closely correlated with their magnitude 
estimates of the intensity of various sweet 
stimuli. Also, the neural as well as the psy- 
chophysical response to sweet stimuli was 
abolished following the application of Gym- 
nema sylvestre. In addition, it was found 
that the time required for complete adapta- 
tion to NaCl agreed with the psychophysical 
reports (Borg, Diamant, Oakley, Strom, & 
Zotterman, 1967; Diamant, Oakley, Strom,
Wells, & Zotterman, 1965). 

The electrophysiological investigation of 
the effects of cross-adaptation has produced 
data that are remarkably similar to those 
from the psychophysical studies mentioned 
above. Smith and Frank (1972) studied 
cross-adaptation between salts in the rat
chorda tympani and found that the transient 
response to a salt was affected by adaptation 
to other salts. They compared the amount 
of cross-adaptation they obtained with that 
obtained by Smith and McBurney (1969) for 
those pairs of stimuli that were common to 
the two experiments. The degree of cross- 
adaptation observed was very similar in the 
two studies. In addition, it was found in 
both studies that salts with common cations 
cross-adapted more than did salts with different
cations. The implication of these two
studies is the same, namely, that there is a 
common neural mechanism for saltiness. As 
discussed earlier, this does not necessarily 
imply that separate receptor sites or fiber 
types exist. In fact, the data of Smith and 
Frank (1972) are very similar to the corre- 
lations obtained by Erickson et al. (1965) 
from the responses of individual rat chorda 
tympani fibers to pairs of salts and are thus 
equally compatible with an across-fiber pat- 
tern theory of quality coding. 

The most determined attempt to extend 
Pfaffmann's across-fiber pattern theory has 
been made by Erickson. From the relative responsiveness
of each neuron to pairs of salts,
Erickson et al. (1965) derived what they
termed a neural response function (NRF) for
each neuron that described its sensitivity to
many salts and that allowed them to order the
salts as if they fell along a continuous dimension.
They reasoned that the NRFs for gustatory
neurons were similar to the relative activity
of visual neurons in response to the
visual spectrum. The crucial assumption of 
Erickson’s approach to taste coding is the 
existence of a continuous stimulus dimension, 
and all of Erickson’s hypotheses about sen- 
sory coding are based on this assumption. It 
is because of this assumption that the NRF 
work suffers from the same weakness as the 
multidimensional scaling work described 
earlier. Specifically, it relies on a converse
linking hypothesis, in this case a converse 
physiological-physical linking hypothesis. 
Erickson attempted to find the stimulus di- 
mension from an analysis of the relationships 
among the physiological responses. Such an 
approach will lead to a solution, but other 
interpretations of the results of the NRF 
work are possible. For example, one could do 
the physiological equivalent of the multidimensional
scaling of cutaneous stimuli discussed
above as a thought experiment. This
time consider establishing NRFs for many 
stimuli presented to the chorda tympani 
nerve, which subserves touch and tempera- 
ture as well as taste. It is likely that NRFs 
could be established for these stimuli and 
that one could develop some sort of space 
based on them. However, it would not dem- 
onstrate the existence of a physical stimulus 
dimension that underlies all of the various 
stimuli. 


The Analytic-Synthetic Distinction and 
Coding Theory 


Erickson's work was a very creative ex- 
tension of Pfaffmann's approach to taste cod- 
ing, and one of its most important contribu- 
tions was to make explicit a number of hy- 
potheses about sensory coding. However, it is 
clear that there are difficulties created by 
the assumption of a single stimulus dimension,
as, for example, in Erickson's explicit
hypothesis about the coding of taste mix- 
tures. Erickson (1968) claimed that syn- 
thetic systems are coded by across-fiber pat- 
terns and analytic systems by specific fibers. 
On this basis, taste must be a synthetic sys- 
tem in spite of the fact that the qualities of 
the components in a mixture are not fused or 
otherwise lost and that no new qualities are 
produced. It is true that color vision is both 
synthetic and has a pattern theory of coding.
But, audition is clearly analytic, and the 
auditory nerve displays fairly broad tuning 
at the level of the first-order neuron (Kiang 
& Moxon, 1974). Relatively specific firing is 
found only at more central levels. Therefore, 
the correlation Erickson proposed does not 
seem to hold. Erickson suggested that the
mixture of two stimuli in a sense that requires 
a pattern theory will result in a new pattern
that has a different maximum than the two 
original patterns and hence in a new (syn- 
thetic) sensation. One can, in fact, match a 
stimulus like KCl, which is salty and bitter, 
with a mixture of NaCl and quinine, and a 
subject may not be able to distinguish such a 
mixture from KCl. It should be noted, however,
that KCl tastes salty and bitter and not
something else. The “missing orange” seems a
strong argument against taste being synthetic.
But, the question of whether KCl is
itself a mixture to begin with is actually
begged. Erickson has suggested that KCl may
be a mixture taste: The taste of NaCl itself
could be considered a mixture taste because
the stimulus solution is composed of two
ionic components (Na+ and Cl-) that might
be located at two different points along the
stimulus dimension (Erickson & Schiffman,
1975). This suggestion cannot be dealt with
here in detail except to say that Beidler’s
(1965, 1970, 1971) theory of sensory transduction
in taste, which is the most widely
accepted theory, considers the cation to be
responsible for stimulation and the anion to
be inhibitory. Further, sour, sweet, and bitter
tastes cannot be similarly considered mixture
tastes in this context because the stimuli
are either nonionic (sweet and bitter) or
the stimulation is accepted to be due to the
cation (acids). This seems to create a problem
for the assumption of a continuous stimulus
dimension.

Two tests of the analytic-synthetic ques- 
tion have recently been reported. Erickson 
(1977) tested reaction times of subjects who 
had to choose between high and low concentrations
of NaCl when MgCl2 also varied in
concentration and vice versa. He found that 
variation in the second compound had an 
effect on reaction time, being facilitating 
when the two covaried and inhibiting when 
they varied independently. However, other 
interpretations are possible. Total intensity 
of the mixture would be a relevant cue in 
the covarying condition and would be a dis- 
tractor in the independently varying condi- 
tion. Further, ignoring the overall intensity 
of the mixture, one has two qualities that 
vary independently in intensity. There are a 
number of studies that suggest that an analytic
sense would not necessarily code these
two qualities independently. Hamlin, Stone,
and Moskowitz (1955) had subjects sort cards
according to shape of symbol. Performance
was slower when the symbols varied in color 
than when they were always black. Egeth 
(1967) and Well (1971) reviewed other simi- 
lar experiments. Although performance has 
not always been found to suffer in such ex- 
periments, the case made by such a design 
for the synthetic nature of taste is extremely 
tenuous. Nowlis and Frank (1977) tested 
the synthetic-analytic question by using the 
conditioned aversion paradigm (Garcia, Han- 
kins, & Rusiniak, 1974) with rats. Rats that 
have been poisoned after drinking a solution 
will avoid that solution in a later test. If they 
are poisoned after drinking two solutions, 
one of which is familiar and one novel, they 
will avoid only the novel solution. In a very
clever design, Nowlis and Frank made their
rats sick on a mixture of NaCl and sucrose. 
When the sucrose was the familiar stimulus, 
they later avoided the NaCl as much as they 
avoided the mixture and did not avoid the 
sucrose. Results were similar when NaCl was 
the familiar stimulus. When neither was 
familiar the rats avoided both sucrose and 
NaCl. This is strong evidence that the rats 
sorted out the two components of the mixture 
as separate and independent tastes. Thus, 
Erickson's (1977) hypothesis linking across- 
fiber patterning to synthetic systems does not 
seem to hold. The evidence argues against
taste being a synthetic sense. 


Summary


In summary, support for basic tastes in the 
gustatory system is found in psychophysical 
work, both historical and modern. First, the 
history of thinking on taste has shown a 
narrowing of the number of taste qualities 
to the present four on the basis of careful 
introspective work. Classical support for the 
four taste qualities also included the associ- 
ation between taste quality and the nutritive 
or poisonous nature of the chemical, the cor- 
relation between chemical structure and 
taste, the differential sensitivity of the four 
qualities across the tongue, the effects of 
topical anesthetics, and the lack of synthetic
effects of mixtures. Recent support for the 
four taste qualities comes from work on 
cross-adaptation, water taste, taste-altering 
substances, the effects of locus on the thres- 
hold and on the growth of sensation with 
stimulus intensity, and the temporal proper- 
ties of the taste system. 

Electrophysiological work on taste coding 
began by looking for nerve fibers that would 
correlate with the four taste qualities. When 
that appeared to fail, some investigators felt 
compelled to abandon the concept of the four 
qualities as basic. However, attempts to infer 
the dimensionality of the stimulus and solve 
the sensory code for taste without making 
use of the four qualities have not been pro- 
ductive. Either of the two major positions on 
sensory coding in taste, labelled line or pat- 
tern theory, appears to us to be equally 
compatible with the existence of basic tastes.
In addition, the question of the analytic or 
synthetic nature of taste experience seems 
independent of both problems, namely, basic 
tastes versus a multidimensional taste space 
and the labeled line versus the pattern 
theory. 


What Is a Sense? 


Implicit in much of the discussion so far 
has been the idea that taste is a single sense 
or sensory system. To say that the four quali- 
ties exist raises again the question of whether 
they might well be considered separate sen- 
sory systems. Often the four qualities have 
been called submodalities, implying more dis- 
tinctiveness among them than is implied by 


1 After this article was written, an article by 
Dethier (1978) appeared that discusses the question 
of taste qualities from the comparative point of 
view. Although he does not reject the four taste 
qualities for man, Dethier feels that they constitute a
straightjacket when applied to other animals. We
freely accept that many other animals have different
taste worlds. However, as argued in the present
article, we believe that the behavioral and physio-
logical work on mammals supports the substantial
similarity of taste mechanisms in man and other
mammals. Our main point, that taste qualities are
not arbitrary conventions, is fully compatible with
Dethier’s position that there may be either more or
fewer than four taste qualities in other organisms. 
the primary color sensations but less than is
implied by the difference between, say, red 
and F sharp. Can one imagine a smooth 
transition between sweet and salty or salty 
and bitter? We deny this and argue that 
they meet Helmholtz’s (1879/1968) criterion 
for separate senses. But we must admit that
there is room for difference of opinion. How- 
ever, that the four taste qualities share the 
same receptors, neurons, and neural projec- 
tions seems not to be especially important 
when one considers that the skin senses have 
a similar degree of commonality among them. 
We feel it may be more useful to consider 
taste as comprising four senses: salty, sour, 
sweet, and bitter. However, this is not to say 
that the four taste modalities are as distinct 
as vision is from audition. Clearly vision and 
audition are different sensory systems be- 
cause they not only meet Helmholtz's cri- 
terion but also because they have separate 
receptors, neural pathways, and projections 
in the sensory cortex of the brain. And the 
skin senses, although together considered to 
make up the cutaneous system, are separate 
from each other largely because of the psy- 
chological distinctiveness of the various sen- 
sations, even though there seem to be sepa- 
rate receptors in some cases. 

Thus, the listing of senses must remain 
somewhat arbitrary. The "special" senses, 
vision and audition, are relatively but not 
completely distinct (synesthesia), and the 
taste senses are related but not unitary. 

If one must make analogies among the
senses, and one must, then we argue that
the proper senses with which one should com-
pare taste are the skin senses. After all, the
sense of taste shares the same receptor sur-
face, nerves, neurons, and central projections
as do the other skin senses. When one makes
these analogies, one sees that the notion of
four basic qualities or submodalities of taste
has a great deal of empirical evidence to
recommend it and avoids the pitfalls into
which the analogy with vision has led us.
Tests of Significance in Stepwise Regression

Tests of significance of the sample squared multiple correlation (R²) in step-
wise multiple regression have not been possible because its distribution is un-
known. The present study used Monte Carlo simulation and least squares
smoothing to construct tables of the upper 95th and 99th percentage points
of the sample R² distribution in forward selection. A survey of published
psychological research that used stepwise regression found a substantial infla- 
tion of reported significance levels when compared to the tabled values. Rec- 
ommendations are given for use of these tables in evaluating results from 
forward selection and other stepwise methods. 


Stepwise regression has a controversial role 
in statistical data analysis. Since the intro- 
duction of various automated techniques for 
selecting the “best” subset of a set of pre-
dictor variables in a multiple regression, re-
searchers have been warned about their in-
discriminate use (e.g., Kupper, Stewart, &
Williams, 1976; Brandt, Note 1). The pri-
mary reason for this caution is that for any
subset selection procedure based on inspec-
tion of the sample data, the usual F statistic
for testing the significance of the multiple 
correlation is biased (Pope & Webster, 1972). 
Unfortunately, the most widely used com- 
puter programs print this statistic at each 
step without any warning that it does not 
have the F distribution under automated 
stepwise selection (Armor & Couch, 1972;
Barr, Goodnight, Sall, & Helwig, 1976; Dixon, 
1975; Nie, Hull, Jenkins, Steinbrenner, & 
Bent, 1975). Researchers encouraged by a 
significant multiple correlation from a step-
wise analysis are often surprised to find how
much it shrinks under cross-validation 
(Schmitt, Coyle, & Rauschenberger, 1977). 
For one solution to this problem, the present 
study used Monte Carlo methods to provide 
tables of percentage points of the distribu- 
tion of the sample stepwise squared multiple 
correlation (R²) under the null hypothesis
that the population multiple correlation is 
zero. The tabled values can be substituted 
directly for those on the computer printout 
when forward selection is used, and they can 
serve as approximations when other subset 
selection methods are used. 


Distribution of the Sample R² in
Stepwise Regression 


Given a sample of size n from k + 1 multi-
variate normal variates, the sample R² be-
tween the first and the remaining k variates
under the null hypothesis has the beta dis-
tribution. In this case, the transformation

$$F = \frac{R^2 (n - k - 1)}{(1 - R^2)\,k} \tag{1}$$

has the F distribution with degrees of freedom
k and n − k − 1. These distributions apply
whether the values of the predictors are
fixed or vary across samples, as long as the
null hypothesis is true.
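
As an illustration only, Equation 1 and its inverse can be written as two small functions. The function names are hypothetical and are not part of the original study; the sketch assumes k predictors fixed in advance, so that the ordinary F distribution applies.

```python
def r2_to_f(r2, n, k):
    """Equation 1: convert a sample R^2 to F with k and n - k - 1 df."""
    return r2 * (n - k - 1) / ((1.0 - r2) * k)


def f_to_r2(f, n, k):
    """Inverse of Equation 1: the R^2 that corresponds to a given F."""
    return k * f / (k * f + (n - k - 1))


# Example: R^2 = .26 with k = 4 predictors and n = 35 cases
# corresponds to an F of roughly 2.6 on 4 and 30 degrees of freedom.
f_value = r2_to_f(0.26, n=35, k=4)
```

Because the two functions are exact inverses, a tabled critical F can be converted to a critical R² and back without loss.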

Complications arise when k predictors are
chosen from m on the basis of the sample
data. Two straightforward cases occur, how-
ever: when k = m and when k = 1. If k =
m, the above distributions obviously apply.
If k = 1, and if the predictor that maximizes
the sample R² is chosen, then the sample R²
has an independent beta extreme-value dis-
tribution. For this case, the F statistic in
Equation 1 may be used with critical value

$$\alpha^* = 1 - (1 - \alpha)^{1/m}, \tag{2}$$

where α is the family critical level.
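
The adjusted critical level of Equation 2 is a one-line computation. The sketch below assumes the exponent is 1/m, with m the number of candidate predictors, which is what follows from treating the m single-predictor R² values as independent; the function name is hypothetical.

```python
def per_test_level(alpha, m):
    """Equation 2 (exponent assumed to be 1/m): critical level for the
    largest of m independent single-predictor tests, chosen so that the
    family critical level equals alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# With a family level of .05 and 20 candidate predictors, the single
# best predictor must be tested at a level of about .0026.
adjusted = per_test_level(0.05, 20)
```

Raising 1 minus the adjusted level to the power m recovers 1 minus the family level, which is a convenient check on the computation.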

Beyond these two cases, no exact distribu-
tions are known. When the subset of size k
(1 < k < m) is chosen to maximize the sam-
ple R², the distribution of the sample R² is
an extreme-value distribution of a set of
dependent beta variates. This approaches the
independent extreme-value distribution
asymptotically when m is large and k is small.
When the predictors are correlated, this
asymptotic convergence is slower, since de-
pendencies among the sample R² values for
all $\binom{m}{k}$ subsets are greater. Most stepwise
algorithms do not necessarily maximize the
sample R², however, and the distributions in
these cases would be even more complicated
whether or not the predictors were mutually
independent.

Diehr and Hoflin (1974) simulated the
sample R² distribution for the best R² subset
among $\binom{m}{k}$ given independence among the
predictors. This distribution is particularly
important because its percentage points pro-
vide an upper bound on sample R² values
from any subset selection method on inde-
pendent or correlated predictors. For each
m = 2–8, k = 1–m, and n = 10, 25, 50, 100,
and 200, they computed 100 R² values. Each
value was obtained by computing all possible
regressions among $\binom{m}{k}$ and selecting the one
with the largest R². The computing time for
this task limited the number of replications
and parameter values, but they were able to
provide a function that approximates their
Monte Carlo results:

R²(k, m, n, α) = (1 − 2%), (3)

with w and о determined iteratively from the
known values of R²(1, m, n, α) and R²(m,
m, n, α). Furnival and Wilson (1974) devel-
oped a rapid “leaps and bounds” algorithm
for computing the best subset R² that can be
used for extending Diehr and Hoflin's results.

Rencher and Fu-Ceayong (Note 2) used 
Monte Carlo simulation to compute upper 
percentage points of the sample R² distribu-
tion in stepwise selection and elimination
(Draper & Smith, 1966, p. 171). They gen- 
erated both uncorrelated and correlated pre- 
dictors for selection. As in the Diehr and 
Hoflin (1974) study, their results were ap- 
proximations from a relatively small number 
of replications (200-400), but they included 
a wider range of parameter values. 

The present study used a simple algorithm 
for choosing subsets: forward selection 
(Draper & Smith, 1966, p. 169). This method 
is faster than other selection procedures and 
exceeded many of them in a Monte Carlo 
study that involved several cross-validation 
criteria (Dempster, Schatzoff, & Wermuth, 
1977). Furthermore, forward selection is the 
standard or most basic option in most widely 
used statistical programs for stepwise regres- 
sion. 


Method 


Because of the computing time needed for estimat- 
ing each point in the distributions by Monte Carlo 
simulation (between 10 sec and 5 min.), a four- 
stage procedure was used: (a) initial approximation 
of values in order to select suitable nodes for Monte 
Carlo estimation, (b) Monte Carlo simulation of 
the selected nodes, (c) smoothing of the Monte Carlo 
estimates by graphical and least squares methods, 
and (d) testing fitted values by new Monte Carlo 
simulation. Since most of the tabled values were 
not initially estimated by Monte Carlo, a check on 
the smoothing process itself was possible in the last
stage.


Approximation 


The iterative function (Equation 3) was used to
generate values of R²(k, m, n, α) for k = 1–m; m =
2–30; n − m − 1 = 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 150, and 200; and α = .05 and .01. From these
values, suitable nodes were selected for fitting nomo-
graphs in the two tables to be constructed. These
nodes were the known values at k = 1 and k = m
for n − m − 1 = 10, 20, 50, 100, and 200, plus the
subsets (k, m) = (2, 3), (3, 4), (2, 7), (4, 7), (3, 12),
(7, 12), (3, 20), (7, 20), and (12, 20) at the same
degrees of freedom.
Table 1
Upper 95th Percentage Points of Distribution of Sample Squared Multiple
Correlation in Forward Selection

Note. Decimals are omitted; m = number of predictors; k = number of predictors selected; n = sample size.


Data Generation 


Two FORTRAN subroutines were written for the
Monte Carlo simulation. The first generated a
sample correlation matrix directly from a standard-
ized Wishart distribution given m and n and using
an algorithm from Odell and Feiveson (1966). The
second subroutine used the abbreviated Gauss-
Doolittle method with forward selection to compute
a sample R² value (Draper & Smith, 1966, p. 178).
Sample R² values were sorted and noted at the 95th
and 99th percentage points after 500 replications.
Additional replications in blocks of 100 were con-
tinued in each run until a stopping rule was satis-
fied. The stopping rule was that the upper 99th
percentile value be bounded on either side by a value
differing from it by less than .01. Computations
were done in double precision on an IBM 370/198
using the FORTRAN H extended compiler and the sort
(KBOIAD) and normal random number (FA034)
routines from the Harwell Subroutine Library
(1973).
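
The logic of the simulation can be sketched in a few lines of Python. This is not the FORTRAN used in the study: the sketch generates raw normal data rather than sampling a correlation matrix from a Wishart distribution, it replaces the abbreviated Gauss-Doolittle method with a direct partialling step, and it uses a fixed number of replications instead of the stopping rule. All function names, the seed, and the replication count are assumptions made for illustration.

```python
import random


def correlation_matrix(data):
    """Sample correlation matrix of the columns of `data` (a list of rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cross = [[sum(r[i] * r[j] for r in centered) for j in range(p)]
             for i in range(p)]
    sd = [cross[j][j] ** 0.5 for j in range(p)]
    return [[cross[i][j] / (sd[i] * sd[j]) for j in range(p)]
            for i in range(p)]


def forward_selection_r2(corr, k):
    """R^2 after k forward-selection steps; variable 0 is the criterion."""
    a = [row[:] for row in corr]          # work on a copy
    remaining = list(range(1, len(a)))    # candidate predictors
    for _ in range(k):
        # Enter the predictor that most reduces the criterion's residual variance.
        best = max(remaining, key=lambda j: a[j][0] ** 2 / a[j][j])
        remaining.remove(best)
        keep = [0] + remaining
        pivot = a[best][best]
        col = {i: a[i][best] for i in keep}
        # Partial the entered predictor out of every remaining variable.
        for i in keep:
            for j in keep:
                a[i][j] -= col[i] * col[j] / pivot
    return 1.0 - a[0][0]


def null_percentile(m, k, n, pct=0.95, reps=400, seed=1):
    """Monte Carlo percentage point of R^2 when all population correlations are zero."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        data = [[rng.gauss(0.0, 1.0) for _ in range(m + 1)] for _ in range(n)]
        values.append(forward_selection_r2(correlation_matrix(data), k))
    values.sort()
    return values[min(reps - 1, int(pct * reps))]
```

Calling `null_percentile(20, 4, 35)` and `null_percentile(4, 4, 35)` gives rough estimates of the 95th percentage points for selecting 4 of 20 predictors and for using all 4 of 4 predictors with n = 35. With so few replications the estimates are noisy and should not be read as substitutes for the smoothed table values.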
Table 2
Upper 99th Percentage Points of Distribution of Sample Squared Multiple
Correlation in Forward Selection

Note. Decimals are omitted; m = number of predictors; k = number of predictors selected; n = sample size.


Curve Fitting


The nodes from the Monte Carlo simulation and
the known values at k = 1 and k = m were used to
fit nomographs for n − m − 1 = 10–200, m =
[illegible], and k = 1–m. Initial estimates of the entries
in Tables 1 and 2 were read off these nomographs.
Most of these values were not estimated directly
by Monte Carlo.


To smooth further the results from the nomo- 
graphs, the 656 entries in each table (including the 
known values) were predicted by multiple regres- 
sion. Twenty linear and nonlinear terms plus their 
interactions were constructed from approximations 
contained in Diehr and Hoflin (1974), Zirphile 
(1975), and Kendall and Stuart (1969, p. 330). The 
standard error of the estimate in predicting the 95th 
percentile table values from the best 15 terms in 
the equation was .0054; for the 99th percentile
table, it was .0047. The entries in Tables 1 and 2 
were taken from the regression estimates. 


Check on Accuracy 


As a check on the accuracy of the table values, 
simulations were run to predict 20 points that had 
not previously been estimated by Monte Carlo. 
Runs were continued in blocks of 1,000 replications 
until a stopping rule was met. The stopping rule 
was that two cumulative runs result in a successive 
difference of less than .01 in both the 95th and 99th 
percentile values. Of the 40 values tested in the 
combined tables, 17 differed from the Monte Carlo 
value by .01, and 23 agreed exactly with the printed 
accuracy. No significant differences in errors were 
found across tables. 


Results 


Histograms of the simulated data at se- 
lected parameter values resembled beta dis- 
tributions, although no beta-type function 
could be found to reproduce accurately the 
upper tail values. Tables 1 and 2 give the 
95th and 99th percentage points of these dis- 
tributions. 


Use of Tables 


The tables have been constructed to cover 
a full range of practical parameter values. 
Linear interpolation works quite well for m 
and n. For interpolations on k, however,
graphical plotting of the table values provides 
more accurate estimates. Extrapolation may 
be used moderately, since the values near the 
margins of the table change slowly. The 
known values when m and n are large, for 
example, can be extrapolated accurately on 
fine graph paper with a flexible drafting curve. 


Discussion 


The tables clearly illustrate the inflation
of the sample R² in stepwise regression that
has frequently been noted by statisticians.
For example, R²(4, 4, 35, .05) = .26, whereas
R²(4, 20, 35, .05) = .51. This inflation has
not always been noted by researchers, how-
ever. A computer-assisted search for articles
in psychology using stepwise regression from
1969 to 1977 located 71 articles. Out of these
articles, 66 forward selection analyses re-
ported as significant by the usual F tests were
found. Of these 66 analyses, 19 were not
significant by Table 1.
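
The size of the discrepancy can be seen in a small worked comparison. The two critical values below are the ones quoted above for k = 4 and n = 35; the observed R² of .40 is a hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
def compare_tests(r2_observed, usual_critical_r2, tabled_critical_r2):
    """Return (significant by usual F test, significant by tabled value)."""
    return (r2_observed > usual_critical_r2,
            r2_observed > tabled_critical_r2)


# Critical values quoted in the text for k = 4, n = 35, alpha = .05:
usual = 0.26    # R^2(4, 4, 35, .05): four predictors fixed in advance
tabled = 0.51   # R^2(4, 20, 35, .05): four predictors selected from 20

# A hypothetical observed R^2 of .40 from forward selection.
result = compare_tests(0.40, usual, tabled)   # (True, False)
```

An analysis of this kind would be printed as significant by the usual F test but would not reach the tabled .05 value, which is the pattern found in 19 of the 66 published analyses.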

The extent of this artifact may have con-
tributed to the poor reputation of subset se-
lection methods in multiple regression through
failures to replicate published research. This
situation is ironic because of the clear evi-
dence of the substantial superiority of for-
ward selection over ordinary least squares in
a variety of prior distributions. Forward se-
lection can be almost as effective as ridge
regression in minimizing prediction and beta
weight errors with highly correlated pre-
dictors (Dempster et al., 1977). Further-
more, forward selection offers a more par-
simonious model than does ridge regression
because only k predictors out of m are in-
cluded in the equation.

Three questions remain, however, regard- 
ing the application of these tables: (a) Can 
the tables be used for other subset selection 
methods? (b) Can they be used when pre- 
dictors are intercorrelated? (c) Do they 
apply when k is not known prior to the
analysis? 

In answer to the first question, the tabled
values may be compared to results from
Monte Carlo studies of two other subset se-
lection methods: best subset (Diehr & Hoflin,
1974) and stepwise selection and elimination
(Rencher & Fu-Ceayong, Note 2). Diehr
and Hoflin’s approximation, given in Equa-
tion 3, yields values higher than those in
Tables 1 and 2. The differences range from
.01 when m = 3 to .10 when m = 20. This
discrepancy is due partly to the approxima-
tion and partly to the fact that the sample
R² percentage point for the best subset case
is an upper bound for all subset R² per-
centage points at the same parameter values.
Rencher and Fu-Ceayong’s results fit the
values within both tables closely. For 15
comparable parameter values in both tables,
the largest discrepancy was .02, with most
values less than .01. This fit indicates that
the tables should be appropriate for stepwise
selection and elimination. Although the vari-
ous other suboptimal selection methods used
in most computer programs occasionally re-
sult in different subsets of size k from $\binom{m}{k}$,
the distribution of the sample R² in these
cases may nevertheless fit these tables, par-
ticularly when m is large and k is small.

The question of intercorrelated predictors 
requires further Monte Carlo simulation for 
an answer. Rencher and Fu-Ceayong investi- 
gated this problem using stepwise regression 
on random predictors intercorrelated in vari- 
ous ways. The upper percentage points of the 
sample R² distributions from correlated pre-
dictors were only slightly lower than those
for independent predictors in all cases. The 
results for forward selection should be similar. 
In any case, loss of power should not be 
substantial when these tables are used for 
correlated predictors, provided k is small rela-
tive to m.

Finally, when k is unknown prior to the
analysis, stopping rules must be applied
(Bendel & Afifi, 1977). In this case, k is a
random variable instead of a fixed constant.
Further Monte Carlo research is needed to
identify the effect of stopping rules on the
sample R² distribution. The most common
stopping rule for forward selection is to con-
tinue stepping until the sample partial cor-
relation is “nonsignificant” by a standard F
test (Draper & Smith, 1966, p. 71). As an
alternative, simultaneous inference may be
used for subset selection to control the Type
I family error rate (Aitkin, 1974). This
method is conservative, however, like most
simultaneous test procedures; for a given
critical level, it will eliminate fewer subsets
containing “unimportant” predictors than
will forward selection with a sequential stop-
ping rule.

Problems remain regarding tests of signifi-
cance of coefficients in stepwise regression.
In lieu of such tests, cross-validation should
be performed, even though the standard er-
rors of the coefficients may be smaller than
those in the corresponding ordinary least
squares equation on m variables. Research-
ers should carefully consider the advantages
and disadvantages of various subset selection
methods and other biased estimation meth-
ods for analyzing particular data sets (for
reviews, see Hocking, 1976, and Jennrich,
1977). Users of standard, automated step-
wise computer programs, however, should
choose forward selection, ignore the tests of
significance printed at each step, and consult
Tables 1 and 2 to evaluate the significance
of the final equation they select.
Suppression of Responding During Signaled and
Unsignaled Shock 


Norman Hymowitz 
New Jersey Medical School 
College of Medicine and Dentistry of New Jersey 


The empirical basis for Seligman's safety signal hypothesis derives largely from 
studies of response suppression during signaled and unsignaled shock and from 
studies of animals’ preference for signaled over unsignaled shock. Recently, the 
literature on preference for signaled over unsignaled shock has been the subject 
of serious criticism and controversy, thus weakening the empirical foundations 
of the safety signal hypothesis. The present article reviews the literature on 
response suppression to determine if this, too, has been the subject of contro- 
versy and criticism. To the contrary, the suppression literature provides strong 
support for the safety signal hypothesis and also reports data that are com- 
patible with much of the choice literature. This agreement between the two 
tests of the safety signal hypothesis increases confidence in the reliability of 
the data and the adequacy of the hypothesis. Despite this agreement, emerging
data on response suppression during signaled and unsignaled shock suggest that,
at best, the safety signal hypothesis emphasizes only one of the many deter-
minants of differential response suppression.


Seligman (1968) reported that food-main- 
tained responding was more readily sup- 
pressed by a given intensity of electric shock 
when shock delivery was unsignaled than 
when it was signaled. Furthermore, animals 
exposed to unsignaled electric-shock delivery 
developed significantly more gastric ulcers 
than did animals exposed to comparable sig- 
naled shock. Seligman interpreted these find- 
ings in terms of a safety signal hypothesis. 
According to this hypothesis, the presence 
and absence of the preshock signal or condi-
tioned stimulus (CS) specify shock (“dan-
ger”) and shock-free (“safety”) occasions,
respectively. During signaled shock, the ani- 
mals spend most of the session in safety. 
With unsignaled shock, most of the session 
is spent in danger. Presumably, this differ- 
ential exposure to danger and safety is re- 
sponsible for differential response suppression 
and ulceration during signaled and unsig-
naled shock.

Since Seligman's report, the analysis of
the behavioral effects of signaled and unsig-
naled electric-shock delivery has proceeded
in two directions. On the one hand, a host of
researchers have attempted to systematically
replicate Seligman's (1968) original findings
with response suppression as the data of in-
terest. As noted by Seligman and Binik
(1977), such studies have uniformly con-
firmed Seligman's finding of more response
suppression during unsignaled than during
signaled shock. A second avenue of investiga- 
tion involved choice of signaled over unsig- 
naled shock delivery as the dependent mea- 
sure. According to the safety signal hypothe- 
sis, animals ought to select signaled over 
unsignaled electric-shock delivery. Indeed, a 
host of investigators have suggested that this 
is the case (e.g., Badia & Culbertson, 1972; 
Badia, Culbertson, & Harsh, 1973; Harsh & 
Badia, 1975; Lockard, 1963; Perkins, Sey- 
mann, Levis, & Spencer, 1966). However, 
some negative findings (e.g., Biederman & 
Furedy, 1973, 1976b; Crabtree & Kruger, 
1975) and serious methodological criticisms
of the choice literature (e.g., Biederman &
Furedy, 1976a; Furedy, 1975) exist.

As noted by Biederman and Furedy 
(1976a), many of the earlier studies of ani-
mals' preference for signaled over unsignaled
shock used unscrambled grid shock (e.g., 
Lockard, 1963). Since animals may modify 
or avoid unscrambled shock by postural ad- 
justment, unambiguous evaluation of such 
studies is not possible. Moreover, Biederman 
and Furedy (1973) found preference for sig- 
naled over unsignaled shock when the shock 
was unscrambled but not when it was 
scrambled. 

Two studies reported preference for sig- 
naled shock over unsignaled shock when 
shock was delivered to the animals through 
electrodes attached directly to the animal's 
tail (Miller, Daniel, & Berk, 1974; Perkins 
et al., 1966). Since it is not possible to modify 
shock delivered through tail electrodes, the 
data seem to support the safety signal hy- 
pothesis. However, Biederman and Furedy 
(1976a, 1976b) pointed out that the tail 
electrode procedure was unreliable and that 
many of the animals were eliminated from 
the studies because of damage to the elec- 
trodes. In view of the unreliable procedure and 
possible sampling bias, the data can hardly
be viewed as strong support for the safety 
signal hypothesis. 

A more recent study by Miller, Marlin, 
and Berk (1977) that employed tail shock is 
less susceptible to serious criticism. Miller 
and his students perfected their apparatus 
and technique so that they were able to suc- 
cessfully and reliably deliver shock directly 
to the tail of freely moving animals (Berk, 
Marlin, & Miller, 1977). Five of the eight 
animals studied showed consistent prefer- 
ences for the side of the shuttle box associ- 
ated with signaled shock. The three other 
animals revealed an initial preference for 
signaled shock, but failed to maintain the 
preference during the course of several re- 
versal conditions in which the side of the 
shuttle box associated with signaled shock 
was changed. 

Perhaps the most thorough analyses of 
animals’ preference for signaled over unsig-
naled shock are those that were conducted
by Badia and his students (Badia & Culbert-
son, 1972; Badia et al., 1973; Harsh &
Badia, 1975). Badia used a changeover pro-
cedure to study the choice behavior of rats.
Typically, the animals were exposed to un-
signaled electric-shock delivery. Responses
on the changeover lever produced, for a brief
period of time, a correlated stimulus in the
presence of which shocks were preceded by
a signal. He also used scrambled grid shock.
Hence, his studies are not subject to the criti-
cisms mentioned earlier. However, Bieder-
man and Furedy (1976a, 1976b) criticized
Badia's studies on several other counts. Ac-
cording to Biederman and Furedy, Badia
typically confounded the correlated stimulus
and the CS; that is, the same response that
produced the correlated stimulus (light) also
produced the preshock signal or CS (tone).
Biederman and Furedy (1973) showed that
during shock animals pressed to produce a
light (correlated stimulus) whether or not
shock was preceded by the CS. They sug-
gested that changeover responding was main-
tained by stimulus change or photic rein-
forcement, not by signaled shock per se. Bied-
erman and Furedy (1976a, 1976b) also noted
that Badia’s studies were not balanced. Ani-
mals changed over from dark, unsignaled
shock to light, signaled shock. If given the
opportunity, they might have changed over
from light, signaled shock to dark, unsig-
naled shock.

It is not the purpose of the present article
to evaluate the pros and cons of the various
discrepancies and controversies in the choice
literature (see Badia & Harsh, 1977a, 1977b).
Polemics are no substitute for carefully
planned experiments. However, closer atten-
tion to the first avenue of investigation men-
tioned above, the suppression of responding
during signaled and unsignaled shock, may
advance the analysis of the behavioral effects
of signaled and unsignaled shock and also
may bear upon some of the discrepancies in
the choice literature. When, for example, a
variable is shown to influence choice of sig-
naled over unsignaled shock and differential
response suppression during signaled and un-
signaled shock in the same manner, more con-
fidence can be placed in the generality and
reliability of the data and in the hypothesis 
under test (Ghiselin, 1969). Thus, one pur- 
pose of the present review of the literature on 
response suppression during signaled and un- 
signaled shock is to determine, wherever pos- 
sible, whether variables purported to influence 
the choice of signaled shock also influence 
differential response suppression. Such an 
analysis should enhance our understanding 
of the behavioral effects of signaled and un- 
signaled shock and should suggest additional 
studies that bear critically on existing con- 
troversies in the choice literature. 

A second purpose of the present review is 
to evaluate further the utility of the safety 
signal hypothesis. It is important to note that 
the safety signal hypothesis originally was 
formulated on the basis of findings on differ- 
ential response suppression (Seligman, 1968). 
Can the hypothesis accommodate the data 
that have been generated since 1968? How 
general are Seligman's (1968) original find- 
ings? The following review of the literature 
provides answers to these questions. 

Naturally, all reviews of the literature 
must be selective. I did not include in the 
review the growing literature on the somatic 
effects of signaled and unsignaled shock. For 
the most part, the focus of research in this 
area has been upon producing somatic reac- 
tions with little attention given to the ani- 
mal’s behavior. Where appropriate, studies 
from related areas of aversive conditioning 
are cited and discussed. In particular, studies 
of choice of signaled over unsignaled shock 
are presented in detail. To facilitate the com-
parison of variables that influence choice and
suppression in the same manner, the ensuing
literature review is organized according to
the variables of which differential response
suppression is a function.


Experimental Design 


Two basic experimental designs have been 
employed to study differential suppression. 
With between-groups designs (e.g., Seligman, 
1968), the data of interest are comparisons 
between the rate of responding in the absence 
of the CS for one group of animals and the 
rate of responding during comparable un-
signaled shock delivery for another group.
For within-subject designs (e.g., MacDonald, 
1973) the rate of responding in the same 
animal is studied under signaled as well as 
unsignaled shock conditions (cf. Sidman, 
1960). 

MacDonald (1973; MacDonald & Baron, 
1973) studied rates of responding in the rat 
under a two-component multiple schedule in 
which each component consisted of a two- 
link chained schedule. Under this schedule, 
responding in one of the initial links of the 
chain schedules produced one of the two 
terminal links during which food reinforcers 
and either signaled or unsignaled shocks were 
presented. When signaled and unsignaled 
electric-shock delivery was scheduled in the 
separate terminal links, the rats responded 
at lower rates in the initial and terminal 
links associated with unsignaled shock. Hymo- 
witz (1976b, 1977b) studied responding 
within the same animal during multiple and 
mixed schedules of signaled and unsignaled 
electric-shock delivery. Much more response 
suppression occurred in the components of 
the multiple schedule that were associated 
with unsignaled shock than in the compo- 
nents associated with comparable signaled 
shock. Differential response suppression dur- 
ing signaled and unsignaled shock delivery 
was not obtained when the animals were ex- 
posed to the mixed schedule of shock delivery. 
The significance of the latter finding is dis- 
cussed in another section. 

Although the between-groups and within- 
subject designs yield compatible results, the 
within-subject design seems more suitable 
for parametric analyses. Each animal may 
differ with respect to the conditions necessary 
to reveal differential suppression (cf. Hymo- 
witz, 1977b). Factorial designs provide in- 
formation about parameter values, but do 
not overcome the differences in sensitivity of 
individual animals to experimental condi- 
tions. A case in point is the present con- 
troversy in the literature on the preference 
for signaled over unsignaled shock. 

Several researchers (Biederman & Furedy, 
1973, 1976a, 1976b; Crabtree & Kruger, 
1975; Furedy & Biederman, 1976) failed to 
find preferences in animals for signaled over
unsignaled electric-shock delivery. Other in-
vestigators (e.g., Badia & Culbertson, 1972;
Harsh & Badia, 1976) routinely report posi- 
tive results. One difference among these re- 
searchers is their selection of experimental 
designs. The former have typically employed 
between-subjects designs, whereas the latter 
have studied the behavior of individual ani- 
mals under a wide range of parameter values. 
At some parameter values, animals selected 
signaled shock; at others, they did not. Often, 
parameters such as shock intensity required 
adjustment and manipulation during the 
course of the study to successfully demon- 
strate preference. Although each of the re- 
searchers presented data that bear on our 
understanding of the preference-of-signaled- 
shock phenomenon, one wonders whether the 
difference in the nature of their findings is 
not in part because of their selection of ex- 
perimental designs. Within-subject designs 
are, by definition, better suited for the analy- 
sis of phenomena that require constant ad- 
justment of parameter values for individual 
animals. 

Recent studies (Hymowitz, 1977a, 1977b) 
showed that parametric considerations are of
paramount importance in the analysis of dif-
ferential suppression during signaled and un-
signaled shock. These studies are discussed in
more detail in later sections, but briefly their 
results showed (a) that animals differ with 
respect to shock intensities that yield differ- 
ential suppression and (b) that for the same 
animal, a given shock intensity may yield 
differential suppression of one response but 
not of another. The failure to adequately
take into account individual differences in
sensitivity to important controlling param-
eters may lead to seriously misleading con-
clusions.


Duration of the CS 


Does the duration of the CS in any way 
affect the differential suppression of respond- 
ing during signaled and unsignaled shock? 
No study has systematically attempted to an-
swer this question. It is known that CS dura-
tions of 60 sec (e.g., Seligman, 1968), 10 sec
(Hymowitz, 1977a), and 5 sec (e.g., Hymo-
witz, 1976b) readily yield differential sup-
pression during signaled and unsignaled
shock. In preference situations, it does not
seem to matter whether CSs of 3, 5, 10, 20,
or 30 sec are employed. Animals readily
select the signaled over the unsignaled shock
condition (French, Palestino, & Leeb, 1972).

Very likely, the effects of the CS duration 
on differential response suppression and on 
preference for signaled over unsignaled shock 
depend on the relative duration of the CS com- 
pared to non-CS occasions. Such a finding 
would not be inconsistent with Seligman's 
(1968) safety signal hypothesis. In studies of 
conditioned suppression (Stein, Sidman, & 
Brady, 1958), the suppression of responding 
during the CS depended on the relative dura- 
tions of CS and non-CS occasions. In brief, 
when the CS duration was short, relative to 
between-stimuli durations, responding tended 
to be more suppressed during the CS than on 
occasions when the duration of the CS was rel- 
atively long. Indeed, Harsh and Badia (1976) 
employed a CS duration of 30 sec and 
showed that rats failed to prefer signaled 
over unsignaled shock when the shock was 
delivered on the average of once every 45 
sec. Preference was observed within the same 
animals when shock was delivered less fre-
quently. Perhaps the 45-sec shock delivery
schedule would have yielded preference for
signaled shock if a shorter CS had been em-
ployed.


Perkins et al. (1966), on the other hand,
failed to find preference for signaled over un-
signaled shock when the CS was only .5 sec,
although shock was delivered on the average 
of once every 5 min. They did report prefer- 
ences for signaled shock with CS durations 
of 3 and 18 sec. These findings are com- 
patible with Perkins’ (1968) preparedness 
hypothesis. According to this hypothesis, one 
important reason why animals select signaled 
over unsignaled shock (or suppress respond- 
ing more during unsignaled than during sig- 
naled shock) is that the CS enables animals 
to prepare for the impending shock. Pre- 
sumably, .5 sec is not enough time for such 
preparation. It would be highly informative 
to determine if differential response suppres-
sion during signaled and unsignaled shock
occurs with a brief .5-sec duration CS. An
answer to this question would bear critically
on the preparedness hypothesis.


Shock Parameters: Intensity, Duration, 
and Frequency 


Shock Intensity 


There is little doubt that the suppression 
of food-maintained responding generally is 
directly related to shock intensity. The higher 
the intensity, the greater is the suppression 
(cf. Azrin & Holz, 1966; Church, 1969; 
D'Amato, 1970). Similarly, the suppression 
of responding in the presence of the CS in 
the conditioned suppression paradigm also is 
a function of shock intensity (cf. Kamin, 
1965). 

Seligman and Meyer (1970) studied the 
effects of shock intensity on responding dur- 
ing signaled and unsignaled shock. Using a 
between-groups design and two shock inten- 
sities, .60 and 1.00 mA, they showed (a) that 
for both shock intensities, more response sup- 
pression occurred in the groups of rats ex- 
posed to unsignaled shock; (b) that respond- 
ing during unsignaled, mild (.60 mA) shock 
recovered considerably over sessions; and (c) 
that little if any recovery during unsignaled 
1.0 mA shock occurred. 

Hymowitz (1977b) extended these find- 
ings. He conducted a within-subject analysis 
of shock intensity in which each animal was 
systematically subjected to a wide range of
shock intensities. Lever pressing was studied
first during 20-25 sessions of multiple fixed-
ratio 10 fixed-ratio 10 (multiple FR 10
FR 10) (Rat 1) and multiple FR 20 FR 20
(Rats 2 and 3) food delivery schedules. A
multiple variable-time 120-sec variable-time
120-sec (multiple VT 120-sec VT 120-sec)
schedule of shock delivery was then super-
imposed upon the food schedule. In one com-
ponent of the multiple schedule, a 5-sec tone
preceded each shock (signaled shock). In the
other component, the preshock stimulus was
not presented (unsignaled shock). Blocks of
5-10 successive food-and-shock sessions at a
constant shock intensity alternated with
blocks of 3-5 successive food-alone sessions.
Each component was 6 min. in duration, and
each daily session terminated after eight com-
ponents.
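
The session structure just described can be sketched in code. What follows is a minimal illustration, not the author's procedure: the strict alternation of components, the exponentially distributed intershock intervals, and all names are assumptions introduced here. Only the 6-min components, the eight-component session, the VT 120-sec mean, and the 5-sec tone come from the text.

```python
import random

COMPONENT_SEC = 6 * 60   # each component lasted 6 min
N_COMPONENTS = 8         # a session ended after eight components
MEAN_VT_SEC = 120        # VT 120-sec shock schedule in both components
TONE_SEC = 5             # a 5-sec tone preceded each signaled shock


def build_session(seed=0):
    """Return (component, kind, tone_onset, shock_time) tuples for one session.

    Assumptions (not stated in the source): components alternate strictly,
    starting with the signaled one, and intershock intervals are exponential
    draws with a mean of MEAN_VT_SEC.
    """
    rng = random.Random(seed)
    events = []
    for comp in range(N_COMPONENTS):
        kind = "signaled" if comp % 2 == 0 else "unsignaled"
        start = comp * COMPONENT_SEC
        end = start + COMPONENT_SEC
        t = start + rng.expovariate(1 / MEAN_VT_SEC)
        while t < end:
            # The tone is truncated if a shock falls within 5 sec of the
            # start of a component (another simplifying assumption).
            tone = max(start, t - TONE_SEC) if kind == "signaled" else None
            events.append((comp, kind, tone, t))
            t += rng.expovariate(1 / MEAN_VT_SEC)
    return events


if __name__ == "__main__":
    for event in build_session(seed=1)[:5]:
        print(event)
```

With a mean intershock interval of 120 sec and 6-min components, such a session would contain roughly three shocks per component on average.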

The major findings of this study may be 
summarized as follows: (a) Each animal 
differed with respect to the intensities of 
shock that led to differential response suppres- 
sion; (b) relatively mild intensities of shock 
had little effect on responding in the com- 
ponents of the multiple schedule associated 
with signaled and unsignaled shock, inter- 
mediate intensities disrupted responding pri- 
marily in the component associated with un- 
signaled shock, and more severe intensities 
disrupted responding in both components; 
and (c) continued testing under signaled 
and unsignaled shock delivery affected the 
minimum intensity of shock required to pro- 
duce differential suppression. Originally, .30 
mA shock failed to yield differential response 
suppression in two of the animals. Following 
additional testing under higher intensities of 
shock, signaled and unsignaled shock inten- 
sities of .20 mA and .30 mA readily led to 
more response suppression during unsignaled 
than during signaled shock in each of the 
animals. 

It should be noted that Hymowitz’s 
(1977b) suppression data are in fairly close 
agreement with choice data reported by 
Harsh and Badia (1976). In Badia’s (e.g., 
Badia & Culbertson, 1972; Harsh & Badia, 
1976) studies, the animals were exposed 
to unsignaled response-independent electric- 
shock delivery. Responses on a changeover 
lever produced for a brief period of time a 
correlated stimulus in the presence of which 
a brief tone preceded shock delivery. Harsh 
and Badia (1975) reported that changeover 
responding from unsignaled to signaled shock 
was a function of shock intensity. At inten- 
sities of shock of less than .60 mA, little 
changeover responding occurred initially. As 
the intensity increased, the animals spent 
more time in the changeover condition. More- 
over, when the intensity was reduced to .40 
mA, changeover responses still occurred at a 
high rate. 


Duration of Shock 


Like shock intensity, the duration of shock
is directly related to the suppression of re-
sponding. For a given shock intensity, the
longer the duration, the greater is the response
suppression (e.g., Church, Raymond, &
Beauchamp, 1967). Only MacDonald (1973)
and MacDonald and Baron (1973) em-
ployed shock durations of longer than .50
sec. They used a duration of 2.00 sec and
found more suppression in the links of a
multiple chain schedule associated with un-
signaled shock than in the links associated
with signaled shock. It is unfortunate that
more systematic data on the effects of shock 
duration on differential suppression are not 
available. Studies of choice of signaled over 
unsignaled shock (Badia et al., 1973) sug- 
gest that changeover from unsignaled to sig- 
naled shock is a function of shock duration. 
At durations of signaled shock beyond 2.00 
sec, animals ceased changeover responding 
from briefer unsignaled shock. 

It is interesting to note that studies that 
failed to find preferences for signaled over 
unsignaled scrambled shock delivery used 
shock durations of 5.00 sec (Biederman & 
Furedy, 1973, 1976b; Furedy & Biederman, 
1976) and 2.00 sec (Crabtree & Kruger, 
1975). Studies that showed preference for
signaled over unsignaled scrambled shock de-
livery (e.g., Badia & Culbertson, 1972;
Hymowitz, 1973c) used brief .50-sec shock
durations. Parametric studies of the effects 
of shock duration on response suppression 
during signaled and unsignaled shock de- 
livery may better describe the relationship 
between shock duration and responding dur- 
ing signaled and unsignaled shock. 


Shock Frequency 


The frequency of shock delivery, like the 
duration and intensity of shock, generally is 
directly related to response suppression. The 
more frequent the shock, the greater is the 
suppression of responding (cf. Azrin & Holz, 
1966). The frequencies of shock delivery 
under which differential response suppression 
during signaled and unsignaled shock de- 
livery has been found range from an average 
of one shock every 11 min. (Davis, Mem- 
mott, & Hurwitz, 1976) to one every 4 min. 
(MacDonald & Baron, 1973), 2 min. (Hym-
owitz, 1976a), 1 min. (Hymowitz, 1973a),
and 45 sec (Imada & Okamura, 1975). Al-
though shock frequency undoubtedly in-
fluences differential suppression, it is clear
that differential response suppression occurs 
under a wide range of shock frequencies. 

It is of some interest that preference for 
signaled over unsignaled shock is affected by 
the frequency of shock delivery. If shock de-
livery is too frequent (i.e., once every 45
sec), little if any preference for signaled
shock is found (Harsh & Badia, 1976). In 
terms of Seligman’s (1968) safety signal hy- 
pothesis, the failure to obtain preference at 
relatively short intershock intervals suggests 
that some minimum period of shock-free time 
is required to maintain changeover responses. 

Again, an analysis of the effects of rela- 
tively dense and lean schedules of shock de- 
livery on differential response suppression 
would be informative. Such an analysis might 
accompany an analysis of shock intensity; 
that is, for each shock frequency, one would 
determine the range of shock intensities that 
(a) mildly affects responding during signaled 
and unsignaled shock, (b) suppresses respond- 
ing primarily during unsignaled shock, and 
(c) severely suppresses responding under sig- 
naled as well as unsignaled shock. Very likely, 
the shock intensities required to produce 
these effects would vary as a function of 
shock frequency. It is an empirical question, 
however, whether some minimum period of 
safety is necessary to produce differential re- 
sponse suppression during signaled and un- 
signaled shock. 


Response-Independent and Response- 
Dependent Electric-Shock Delivery 


Most of the studies of the suppressive ef- 
fects of signaled and unsignaled shock em- 
ployed response-independent shock delivery. 
Two studies (Hymowitz, 1976b; MacDonald, 
1973) used response-dependent shock de- 
livery schedules. Although it is not possible 
to discuss the relative contribution of the 
contingency between shock and responding 
to differential response suppression on the 
basis of the available literature, it is clear
that each mode of shock presentation is con-
ducive to differential suppression.
It is surprising that more data on this
variable are not available. Although there is
some question as to whether responding is
suppressed more by response-dependent than
by response-independent shock delivery (e.g.,
Azrin, 1956; Camp, Raymond, & Church,
1967; Church, 1969; Church, Wooten, &
Matthews, 1970; Hoffman & Fleshler, 1965;
Orme-Johnson & Yarczower, 1974; Rachlin
& Herrnstein, 1969), it is generally accepted
that responding in the absence of the CS in
conditioned suppression studies is more af-
fected by response-independent than by re-
sponse-dependent shock that is delivered in
the presence of the CS (Hoffman & Fleshler,
1965; Hunt & Brady, 1955; Orme-Johnson
& Yarczower, 1974). Such differences be-
tween the two modes of shock delivery might
be expected to influence the differential sup-
pression of responding during signaled and
unsignaled shock.


Schedule of Electric-Shock Delivery 


Only one study (Hymowitz, 1973a) com- 
pared the rate of responding during signaled 
and unsignaled shock delivery during differ- 
ent schedules of shock presentation. For both 
fixed-interval and variable-interval shock de- 
livery schedules, more suppression of base- 
line responding was found for animals exposed 
to unsignaled than to signaled shock. There 
was little if any difference in suppression be-
tween the signaled fixed- and variable-inter-
val conditions.

When the percentage of occasions on which
the CS preceded shock was decreased from
100% to 50%, the importance of the sched-
ule of shock delivery was apparent. Much
more suppression of baseline responding was
found in the 50% variable-interval than in
the 50% fixed-interval condition. For the
variable-interval shock schedules, as much
or more suppression occurred in the 50% as
in the unsignaled shock condition (0%).
For the fixed-interval shock delivery schedule,
the most suppression occurred during unsig-
naled shock, the least during 100% signaled
shock, and an intermediate amount during
50% signaled shock. These findings are dis-
cussed further in the section entitled Predict-
ability of Shock. Considering the importance
of the schedule of shock delivery for the sup-
pression of responding in other situations
(e.g., Azrin, 1956; Camp, Raymond, &
Church, 1966; Ferraro, 1967; Morse & Kel- 
leher, 1966), it is not surprising that the 
schedule interacts with the predictability of 
shock to influence the course and degree of 
response suppression. 


Appetitive Factors: Food Schedule and
Food Deprivation 


Schedule of Food Delivery 


The schedule and frequency of food de-
livery influence the manner in which respond-
ing is suppressed by noxious events (e.g.,
Azrin & Holz, 1966). Relatively little is 
known about their influence on differential 
response suppression during signaled and un- 
signaled shock, although it is expected that 
the frequency and schedule of food delivery 
will interact with the shock delivery condi- 
tions. 

In terms of the generality of the effects of 
signaled and unsignaled shock delivery, it is 
noteworthy that Hymowitz found differential 
suppression when food pellets were delivered 
under variable-interval (Hymowitz, 1976b), 
fixed-ratio (Hymowitz, 1977b), and fixed-in- 
terval (Hymowitz, 1977a) schedules. Imada 
and Okamura (1975) studied the effects of 
signaled and unsignaled shock delivery on 
operant licking in water-deprived animals. 
Licking was reinforced on a continuous rein- 
forcement schedule. More suppression of lick- 
ing was found during unsignaled than during 
signaled shock conditions. These data cer- 
tainly enhance the generality of Seligman’s 
(1968) original findings obtained with vari- 
able-interval food delivery schedules. 

Although direct comparisons among the 
different food schedules used in the above- 
mentioned studies are not possible, it is note- 
worthy that the manner in which responding 
is affected by signaled and unsignaled shock 
is influenced by the schedule of food delivery. 
With fixed-ratio food delivery, for example, 
shock led to increases in the length of the 
postpellet pause, but had little effect on the 
running rate until much more intense shocks 
were delivered (Hymowitz, 1977b). For a
given shock intensity, the pausing was much
more apparent in the component of the mul- 
tiple schedule associated with unsignaled 
shock. At some intensities, little if any paus- 
ing occurred during signaled shock, whereas 
marked pausing occurred during unsignaled 
shock. 


Food Deprivation 


As noted by Millenson and de Villiers 
(1972), the contribution of food deprivation 
to response suppression, though very im- 
portant, is a relatively neglected area of re- 
search. This is somewhat surprising because 
food deprivation plays an important role in 
motivational analyses of punishment (Estes, 
1969) and conditioned suppression (Millen- 
son & de Villiers, 1972). Moreover, there is 
little doubt that this variable interacts with 
the schedule and frequency of food delivery 
to determine the resistance of responding to 
the suppressive effects of electric-shock de- 
livery. Future analyses of response suppres- 
sion during signaled and unsignaled shock 
ought to include this variable in the design of 
the experiment. At present, virtually no data
on the effects of food deprivation on differ- 
ential suppression during signaled and unsig- 
naled shock have been published, although 
the suppression of responding in the presence 
of the CS is influenced by food deprivation 
(Millenson & de Villiers, 1972). 


Dependent Variable 


Myers (1971) noted that too much of our 
knowledge about response suppression de- 
rives from studies in which operant lever 
pressing served as the dependent variable. He 
questioned the generality of the lever-press 
findings to other response systems. Studies of 
autoshaping, learned taste aversions, species- 
specific defense reactions, and schedule-in- 
duced behaviors (cf. Bolles, 1972; Seligman, 
1970) produced data that seriously question 
the generality of principles of learning de- 
rived primarily from analyses of key pecking 
in pigeons and lever pressing in rats. One 
may similarly question the generality of the 
data on signaled and unsignaled shock, since 
most investigators employed schedule-con-
trolled lever pressing as the dependent vari-
able. Some data recently collected with sched-
ule-induced water intake as the dependent
variable serve to extend the generality of the
lever-press findings (Hymowitz, 1977a).

Schedule-induced water intake was first
reported by Falk (1961) and refers to post-
pellet water intake that occurs when food-
deprived rats, mice, pigeons, or monkeys are
exposed to any one of a number of inter-
mittent schedules of food delivery (Falk,
1971; Segal, 1972; Staddon, 1975, 1977a,
1977b). Induced water intake may be dis-
tinguished from normal regulatory drinking.
With schedule-induced drinking, the animals
are not water deprived, and the amount of
water consumed may be far in excess of
physiological requirements (polydipsia). In-
duced drinking may also be distinguished
from paradigms in which licking serves as
an operant on which the delivery of food is
dependent. Schedule-induced licking follows
pellet delivery and persists even though lick-
ing may postpone the delivery of the food
pellet (Falk, 1971).

An adequate explanation of schedule-in-
duced water intake must take into account
that drinking is but one of a number of re-
sponses that animals may engage in follow-
ing pellet ingestion. Other schedule-induced
behaviors are schedule-induced wheel run-
ning, pica, escape, air licking, and attack
(Falk, 1971). A common feature of these be-
haviors is that they occur during those occa-
sions when the probability of food delivery
is low. Thus, they may be related to aversive
aspects of intermittent positive-reinforcement
schedules. Falk (1971) also noted that the
induced behaviors seem not to be related to
the task at hand, the procurement of food
pellets. He termed these behaviors adjunc-
tive behaviors and likened them to displace-
ment activities more commonly studied by
ethologists. Staddon (1975, 1977a, 1977b)
distinguished between interim behaviors, such
as schedule-induced water intake and groom-
ing, which occur during occasions when food
pellets are not available, and terminal be-
haviors, such as lever pressing, which occur
when food pellets are available. Interactions
between the two behaviors may provide the
underlying structure for schedule perform-
ances observed in the operant chamber.

Although the exact nature of schedule-in- 
duced water intake eludes researchers at 
present, schedule-induced water intake may 
serve as a useful dependent variable for the 
analysis of the behavioral effects of signaled 
and unsignaled electric-shock delivery. Sched- 
ule-induced licking shares with schedule-con- 
trolled lever pressing many features of a de- 
sirable dependent variable. It is a highly 
quantifiable sample of behavior that remains 
quite stable from session to session. More- 
over, it also is sensitive to variables such as 
the frequency of food delivery (Falk, 1967;
Freed & Hymowitz, 1972), the magnitude of
the food pellet (Falk, 1967; Freed & Hymo-
witz, 1972), body-weight loss (Freed &
Hymowitz, 1972), shifts in the rate of rein-
forcement (Jacquet, 1972), and the delivery
of electric shock (e.g., Hymowitz & Freed,
1974). Comparisons of the effects that vari-
ous conditions of signaled and unsignaled
shock have upon the suppression of schedule-
controlled and schedule-induced behavior may
provide useful insight into the role that the
nature of the response has in the suppression
of behavior in general.

Recent studies (Bond, Blackman, & Scru- 
ton, 1973; Dunham, 1971; Freed, Hymowitz, 
& Fazzaro, 1974; Hymowitz, 1973b, 1976a; 
Hymowitz & Freed, 1974) showed that
schedule-induced licking was suppressed by
shock in very much the same manner as lever
pressing. However, the intensity of shock re-
quired to suppress induced licking was lower
than the intensity required to suppress lever
pressing (Bond et al., 1973; Hymowitz &
Freed, 1974). Subsequent studies (Hymo-
witz, 1976a) showed that the greater sensi-
tivity of induced licking to shock was not
simply due to the possibility that shock was
more often delivered while the animals were
licking than while they were pressing. Sched-
ule-induced licking was similarly suppressed
by shock that was separated in time from
licking by 1, 5, 10, and 15 sec (Hymowitz,
1976a).

In a recent study, Hymowitz (1977a) em-
ployed schedule-induced licking, as well as
schedule-controlled lever pressing, as a de-
pendent variable. Food was delivered under
a fixed-interval 40-sec schedule. Following 
the acquisition of stable levels of schedule- 
induced and schedule-controlled behavior, the 
animals were exposed on separate occasions 
to signaled and unsignaled electric-shock de- 
livery. Electric shock was delivered according 
to a variable-time 70-sec schedule. During 
signaled shock, a 10-sec signal light preceded 
each shock. Blocks of successive food-and- 
shock sessions at a constant shock intensity 
alternated with blocks of successive food- 
alone sessions. 

The data produced by this study were con- 
sistent with the author's previous findings 
(Hymowitz, 1977b). For both lever pressing 
and schedule-induced licking, differential re-
sponse suppression during signaled and un-
signaled shock was a function of shock in-
tensity. Depending on the intensity, pressing
and licking were unaffected during signaled and
unsignaled shock, suppressed primarily dur-
ing unsignaled shock, or equally suppressed
during signaled and unsignaled shock. More-
over, the intensity of shock required to sup-
press licking generally was lower than for
lever pressing. Hence, licking often ceased 
during signaled shock delivery, whereas lever 
pressing was maintained at a high rate. The 
finding that one behavior occurs at a high rate 
during signaled shock whereas another is 
totally suppressed has considerable theoreti- 
cal importance. It is not clear how such a 
finding can be explained by Perkins’ (1968) 
preparedness hypothesis or Seligman’s (1968) 
safety signal hypothesis. If the animal is pre- 
pared sufficiently to continue pressing or if it 
is safe to press, why is the animal not pre- 
pared sufficiently to continue licking, and 
why is it not safe to lick? 

Differential response suppression during sig- 
naled and unsignaled shock also has been re- 
ported with operant licking (e.g., Imada & 
Okamura, 1975). As noted previously, it is 
important to distinguish between operant and 
induced licking. In the former, the animals 
are water deprived and given limited access 
to water in the test situation. For schedule-
induced licking, the animals are not water de-
prived, and licking occurs as an adjunct (cf.
Falk, 1971; Segal, 1972) to ongoing food-
maintained responding. Typically, animals
lick after the ingestion of each food pellet 
(e.g., Falk, 1961). Clearly, Seligman's (1968) 
original findings pass the test of response gen- 
erality (cf. Sidman, 1960). However, the 
available data suggest the response studied 
may be an important determinant of sup- 
pression during signaled and unsignaled shock. 
Further studies of operant and adjunctive be- 
haviors appear warranted. In particular, it is 
of considerable interest to determine why 
schedule-induced licking is suppressed during 
signaled shock conditions in which schedule- 
controlled lever pressing is maintained at a 
high rate. The greater sensitivity of induced 
licking to shock may simply be due to the 
nature of the response (lever pressing versus 
licking), to the relative strength of schedule- 
induced and controlled behavior per se, or to 
the rate at which each behavior occurs. With 
respect to the latter possibility, Blackman 
(1977) noted that behaviors that occur at a 
high rate are more affected by a given in- 
tensity of shock than are behaviors that oc- 
cur at a low rate. He referred to this as a 
rate-dependency effect. Since licking gener- 
ally occurs at a higher rate than does lever 
pressing, this hypothesis certainly warrants 
further attention. It is noteworthy, however, 
that McKearney (1973) showed that the rate 
of licking was not a factor in determining the 
effects on induced licking of the drug meth- 
amphetamine. With schedule-controlled lick- 
ing, rate was a major determinant of the 
drug's effect (rate-dependency effect). 


Predictability of Shock 


Rescorla (1968) showed that CSs acquired 
their suppressive or aversive properties only 
when they reliably specified the advent of 
shock delivery. When shock occurred with 
the same probability in the absence and in 
the presence of the to-be-conditioned stimu- 
lus, little conditioning occurred. To what ex- 
tent is differential response suppression dur- 
ing signaled and unsignaled shock a function 
of the probability that a CS precedes each 
shock in the signaled shock condition? Hymo-
witz (1973a, Experiment 3) employed a 2 × 3
experimental design with sessions as re-
peated measures. The factors were the sched-
ule of shock delivery (fixed interval or vari-
able interval) and the percentage of occasions
on which a 5-sec CS preceded shock delivery:
100%, 50%, or 0%.

For the 100% signal condition, the pres-
ence and absence of the signal reliably indi-
cated shock and shock-free occasions. For
both schedules of shock delivery, the least
amount of response suppression occurred in
this condition. Responding virtually ceased
during the CS, but was maintained at a high
rate in its absence. For the 0% signal condi-
tion, the presence and absence of shock were
not specified. Although the fixed-interval
schedule may have provided temporal cues,
there was little if any difference in response
suppression for the groups of rats exposed
to either shock schedule. Both groups re-
vealed marked response suppression with lit-
tle recovery during the 10 test sessions.

Perhaps the most interesting groups were
those for which the CS preceded shock on
50% of the shock occasions. The presence of
the CS reliably indicated shock occasions, but
the absence of the CS did not reliably indi-
cate shock-free occasions. For variable-in-
terval shock delivery, as much suppression
occurred under this condition as under the
0% condition. For fixed-interval shock de-
livery, in which temporal factors may have
specified to some degree the presence and
absence of shock, an intermediate degree of
response suppression occurred. There was
more response suppression than in the 100%
condition, but much less than in the 0%
condition.
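
The asymmetry in the 50% condition can be made explicit with a little arithmetic. The following is a minimal sketch, not part of the original study; the function name and dictionary keys are illustrative. It assumes, as described above, that every CS is followed by shock and that only a given percentage of shocks is preceded by a CS.

```python
def signal_reliability(pct_signaled):
    """Summarize what the CS and its absence predict.

    Assumes every CS is followed by shock and that pct_signaled percent
    of all shocks are preceded by a CS (the 100%, 50%, and 0% conditions).
    """
    fraction = pct_signaled / 100
    return {
        # Probability of shock given that a CS was presented; undefined
        # (None) when no CS is ever presented.
        "p_shock_given_cs": 1.0 if fraction > 0 else None,
        # Fraction of shocks that arrive with no warning, that is, how
        # often the absence of the CS fails to mark a shock-free period.
        "fraction_unsignaled": 1 - fraction,
    }


for pct in (100, 50, 0):
    print(pct, signal_reliability(pct))
```

In the 50% condition the CS remains a perfect predictor of shock, yet half of the shocks arrive unannounced, so the absence of the CS no longer marks a shock-free period.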

Comparable findings were obtained in pref-
erence situations (Badia, Harsh, Coker, &
Abbott, 1976; Hymowitz, 1973c). Hymowitz
(1973c) used a free-operant shuttle box ar-
rangement in which 10 animals were exposed
to a VT 65-sec schedule of scrambled foot
shock (.40 mA). For five of the animals, one
side of the shuttle box was associated with
signaled shock (100%), and the other side
was associated with unsignaled shock (0%).
For five other animals, one side of the box
was associated with unsignaled shock (0%);
on the other side, the CS preceded shock on
50% of the shock occasions.

The animals showed a marked preference
for 100% signaled shock delivery over 0% 
or unsignaled shock. This was true of all five 
of the animals tested. However, a clear pref- 
erence for the 50% signal over the 0% signal 
condition was not found. A signal that reliably 
indicated the presence of shock but not the 
absence of shock was not preferred over the 
no-signal condition. 

Badia et al. (1976) used a changeover pro- 
cedure (from unsignaled to signaled shock 
delivery) to answer a related question. Is 
changeover responding controlled by stimuli 
that reliably indicate the presence of shock 
or by stimuli that reliably indicate the absence 
of shock? When the CS precedes shock 100% 
of the time, the CS necessarily serves both 
signaling functions. When the probability of 
the CS preceding shock is varied, the signal 
function of the CS also is altered, depending 
on the manner in which the CS and the shock 
are programmed. 

In one condition, all of the shocks were 
preceded by a CS. However, the probability 
of a CS being followed by shock varied from 
1.0 to .02. Thus, at some CS-shock probabil- 
ities, the presence of the CS did not reliably 
indicate shock occasions, although the absence 
of the CS reliably indicated shock-free occasions.
In this condition, the animals consistently
changed over from unsignaled to
signaled shock at each probability level; that
is, animals readily selected the condition in
which the CS dependably specified the absence
of shock but not necessarily the presence of
shock.

In a second condition, all of the CSs were 
followed by shock. Some shocks were not 
preceded by a CS, depending on the desired
probability value. This is the same manner
of programming the CS and the shock that
was used by Hymowitz (1973a, 1973c). As
the dependability of the CS as an indicator
of the absence of shock decreased, the animals'
preference for the CS condition decreased,
even though the CS still reliably
specified the presence of shock. These data
are in close agreement with the preference
data reported by Hymowitz (1973c) and
with the suppression data for the group of
animals exposed to the variable-interval shock
delivery schedule (Hymowitz, 1973a, Experiment 3).
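
The two ways of programming the CS and the shock that Badia et al. (1976) compared can be made concrete by computing two conditional probabilities from event counts. The Python sketch below is illustrative only: the function names and the event counts are hypothetical and are not taken from any of the studies reviewed. It assumes that "reliably indicates" means a conditional probability of exactly 1.

```python
from fractions import Fraction


def signal_reliability(cs_shock, cs_only, shock_only):
    """Return (P(shock | CS), P(CS | shock)) from hypothetical event counts.

    cs_shock   -- CS presentations that were followed by shock
    cs_only    -- CS presentations that were not followed by shock
    shock_only -- shocks that were not preceded by a CS
    """
    p_shock_given_cs = Fraction(cs_shock, cs_shock + cs_only)
    p_cs_given_shock = Fraction(cs_shock, cs_shock + shock_only)
    return p_shock_given_cs, p_cs_given_shock


def describe(cs_shock, cs_only, shock_only):
    """Say which of the two signaling functions the CS serves dependably."""
    p_shock_given_cs, p_cs_given_shock = signal_reliability(
        cs_shock, cs_only, shock_only
    )
    return {
        # Every CS is followed by shock, so the CS marks shock occasions.
        "cs_reliably_signals_shock": p_shock_given_cs == 1,
        # Every shock is preceded by a CS, so no CS means no shock.
        "no_cs_reliably_signals_safety": p_cs_given_shock == 1,
    }


# First condition: all shocks preceded by a CS, only some CSs followed by shock.
first = describe(cs_shock=10, cs_only=40, shock_only=0)

# Second condition: all CSs followed by shock, half of the shocks unsignaled.
second = describe(cs_shock=10, cs_only=0, shock_only=10)

print(first)
print(second)
```

In the first arrangement only the absence of the CS is informative (it marks shock-free time); in the second only its presence is. When the CS precedes shock 100% of the time and is always followed by shock, both probabilities equal 1 and the two functions cannot be separated.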

One other study also examined differential 
associations between the CS and shock (Na- 
geishi & Imada, 1974). They presented an 
average of three shocks per session. For one 
group of animals, all of the shocks were pre- 
ceded by a CS (100%); for another, two of 
the shocks were preceded by a CS (66%); 
for another, one of the shocks was preceded 
by a CS (33%); in a final group, none of
the shocks were preceded by a CS (0%).

Nageishi and Imada (1974) reported that 
the degree of response suppression was directly 
related to the degree of shock predictability, 
the least suppression occurring in the 100% 
condition and the most in the 0% condition. 
In their study, the presence of the CS reliably
indicated the availability of shock in
the 66% and 33% conditions. However, the 
absence of the CS did not reliably indicate 
the absence of shock. Based on the preference 
data cited earlier (Badia et al., 1976; Hym- 
owitz, 1973c), Nageishi and Imada's (1974) 
findings are somewhat surprising. The prefer- 
ence data suggest that the utility of the CS 
derives from its specification of shock-free, 
not shock, occasions. 

Nageishi and Imada's (1974) findings are 
in close agreement with Hymowitz's (1973a) 
findings for fixed-interval shock delivery but 
not for variable-interval shock delivery. Al- 
though procedural and methodological differ- 
ences between the studies prohibit direct com- 
parisons, one cannot help but wonder whether 
the small number of shocks (three per ses- 
sion) delivered in Nageishi and Imada's study 
in some way accounts for some of the differ- 
ences. For example, the delivery of the third 
shock may have signaled shock-free occasions. 

Clearly, the percentage of occasions on 
which the CS precedes shock is an important 
determinant of response suppression during 
signaled and unsignaled shock delivery. More- 
over, the manner in which the CS and shock 
are programmed is also an important factor. 
To gain a better understanding of the rela- 
tionship between the predictability of shock 
and differential response suppression, further 
studies are required. One useful approach 
would be to investigate the differential suppression
of schedule-controlled and schedule-induced
responding, perhaps within the same
animal, under conditions in which the per- 
centage of occasions on which the CS precedes 
shock is varied. The CS and shock also should 
be programmed so that the CS may depend- 
ably specify shock-free occasions or depend- 
ably specify shock occasions, but not both 
(cf. Badia et al., 1976). 


Discriminability of Shock-Free Occasions 


There are two important theoretical view- 
points from which to view data on signaled 
and unsignaled shock. According to Perkins 
(1968), the importance of the CS derives 
from the fact that it enables the animal to 
prepare for shock. According to Seligman 
(1968; see also Seligman, Maier, & Solomon, 
1971), the CS is important because it specifies 
occasions in which shock is not available.
Such shock-free occasions may allow the ani- 
mal to relax (cf. Denny, 1971) instead of re- 
maining in a chronic state of fear. The rein- 
forcing effects of shock-free occasions have 
been demonstrated quite eloquently in studies 
of avoidance and shock-frequency reduction 
(Hineline, 1977). 

Recent studies by Hymowitz (1976b,
1977b), in which animals were studied under 
mixed as well as multiple schedules of sig- 
naled and unsignaled shock, provide consid- 
erable support for the safety signal analysis. 
Multiple schedules contain discriminative 
stimuli that distinguish one component from 
the other. Mixed schedules do not. During
multiple schedules of signaled and unsignaled 
electric-shock delivery, the animals may read- 
ily discriminate that component in which the 
absence of the CS is associated with the ab- 
sence of shock. During mixed schedules, this 
discrimination is highly unlikely. Although 
both schedules are the same in that the CS 
is reliably associated with the presence of 
shock, they differ in that only during the 
multiple schedule is the absence of the CS 
clearly associated with the absence of shock. 

To the extent that the preparedness value 
of the CS is the determining factor, one 
would predict differential response suppres- 
sion during signaled and unsignaled shock for 
each schedule. To the extent that the discrimination
of shock-free time is necessary
for differential responding, differential response
suppression is predicted for the multiple
schedule but not for the mixed schedule.
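
The contrasting predictions of the two accounts can be laid out as a small decision table. The Python sketch below is only a restatement of the argument above under simplifying assumptions: the function names and string labels are hypothetical, and each hypothesis is reduced to a single yes-or-no prediction.

```python
def absence_of_cs_signals_safety(schedule):
    """Only the multiple schedule has component stimuli, so only there can
    the absence of the CS be discriminated as shock-free time."""
    return schedule == "multiple"


def predicts_differential_suppression(hypothesis, schedule):
    """Return True if the hypothesis predicts less suppression during
    signaled than during unsignaled shock under the given schedule."""
    if schedule not in ("multiple", "mixed"):
        raise ValueError(f"unknown schedule: {schedule}")
    if hypothesis == "preparedness":
        # The CS reliably precedes shock under both schedules, so the
        # animal can prepare for shock in either case.
        return True
    if hypothesis == "safety_signal":
        # The benefit depends on discriminating shock-free occasions.
        return absence_of_cs_signals_safety(schedule)
    raise ValueError(f"unknown hypothesis: {hypothesis}")


predictions = {
    (h, s): predicts_differential_suppression(h, s)
    for h in ("preparedness", "safety_signal")
    for s in ("multiple", "mixed")
}

for (hypothesis, schedule), differs in sorted(predictions.items()):
    print(f"{hypothesis:14s} {schedule:9s} differential suppression: {differs}")
```

The two accounts disagree only for the mixed schedule, which is why that schedule provides the critical comparison.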

The data (Hymowitz, 1976b, 1977b) 
clearly favor the safety signal hypothesis. Al- 
though differential suppression was found 
for the multiple schedule of signaled and un- 
signaled shock, responding during the com- 
ponents of the mixed schedule of signaled and 
unsignaled shock was not differentially sup- 
pressed. Responding was similarly suppressed 
during signaled and unsignaled shock de- 
livery. These findings, as well as those of 
Hymowitz (1973a, 1973c) and Badia et al. 
(1976) referred to previously, strongly sug- 
gest that the discrimination of shock-free oc- 
casions, rather than shock occasions, is one 
of the key determinants of differential re- 
sponding during signaled and unsignaled shock 
(see also Arabian & Desiderato, 1975). 


Discussion 


As noted previously, studies of response sup- 
pression during signaled and unsignaled elec- 
tric-shock delivery uniformly support Selig- 
man's (1968) original report of more response 
suppression during unsignaled than during 
signaled shock. Moreover, the generality of 
Seligman's (1968) original findings was successfully
extended to include operant and
adjunctive licking, a variety of shock delivery 
schedules, a wide range of shock parameters, 
and several different schedules of food de- 
livery. In general, the findings on differential 
response suppression also are consistent with 
Seligman’s (1968) safety signal hypothesis. 
In particular, studies that show that the predictability
of shock-free occasions, not shock
occasions, is a key determinant of differential
responding (Hymowitz, 1976b, 1977a) lend
strong support to the safety signal hypothesis.

It is also important to note that it is highly 
unlikely that the findings on differential response
suppression are, in some way, due to
procedural or methodological artifacts. All of
the studies mentioned in this review employed
scrambled electric-shock delivery.
Hence, modification of shock by postural ad- 
justment was minimized. Some of the studies 
(Hymowitz, 1973a, 1976b) used response-dependent
shock, and the response lever was
included in the shock circuit. With such a 
procedure, it was nearly impossible for the 
animal to escape shock once it was produced. 
Yet, much more response suppression was 
found during unsignaled than during signaled 
shock. In addition, Hymowitz (1976b, 1977b) 
showed that responding was suppressed as 
much during the signaled as during the un- 
signaled shock component of mixed schedules 
of shock delivery. If the animal had used the 
preshock stimulus as a discriminative cue for 
an avoidance-escape response, one would have 
expected less suppression in the signaled com- 
ponent of the mixed schedule. To the con- 
trary, the animals seemed not to benefit from 
signaled shock delivery unless the absence of 
the signal was reliably associated with the 
absence of shock. Such findings are highly 
compatible with the safety signal hypothesis. 

The literature review also revealed a number
of variables that seem to influence differential
response suppression during signaled
and unsignaled shock and choice of
signaled over unsignaled shock in the same
manner. Both depend on the discriminability
of shock-free occasions and the intensity of
shock. The agreement between studies of
choice and studies of response suppression
is so great in some instances that the confidence
one may place in the reliability of the
data is markedly enhanced. Despite the
serious criticisms leveled at much of the
choice literature, the good fit between the
choice and suppression literature enhances
our confidence in the findings that animals do
prefer signaled over unsignaled shock and
that the variables that control the choice of
signaled shock also determine the differential
suppression of responding during signaled
and unsignaled shock.

Although the data generated during the 
past decade are generally consistent with the
safety signal hypothesis, some of the more
recent findings suggest the need for possible
revision or extension of the hypothesis. It is
not clear, for example, why differential response
suppression should occur for any given
animal at one intensity of shock or for one
kind of behavior and not at another intensity
or for another behavior. The discriminability
of shock-free occasions plays an important 
role in determining differential responding, 
but it is not the only variable of importance. 
As in most other analyses of response sup- 
pression, the resistance of responding to the 
suppressive effect of shock depends on the 
interplay of a host of food-related and shock- 
related behaviors. As additional analyses of 
the variables that influence differential re- 
sponse suppression become available, it may 
be possible to fit the safety signal hypothesis 
within a broader theoretical context. Safety 
and danger are relative terms, lacking in pre- 
cision and clarity. As additional data emerge, 
it may be more fruitful to analyze the sup- 
pression of responding during signaled and 
unsignaled shock in terms of interactions 
among the signaling of shock, the motiva- 
tional state of the organism, the nature of the 
food and shock schedules, the strength of the 
response under study, and the past experi- 
mental history of the organism. Viewed in 
this manner, safety and danger are but two 
of the many variables that determine the sup- 
pression of responding. Thus, the time may 
be rapidly approaching when the safety sig- 
nal hypothesis will lose its appeal as an ex- 
planatory mechanism. However, as shown in 
the present literature review, the safety sig- 
nal hypothesis has served us well. It has 
proved to be a useful organizing principle for 
a considerable body of experimental litera- 
ture and has laid the foundation for our cur- 
rent knowledge of the effects of signaled and 
unsignaled aversive events on behavior. 


Infant Crying as an Elicitor of Parental Behavior:
An Examination of Two Models 


Ann D. Murray 
Macquarie University, Sydney, Australia 


Two models of the compelling nature of the infant cry and its effectiveness in 
eliciting caregiving behavior are examined. The first model is that of the cry as 
a releaser of parental behavior. It is suggested that a good fit between the 
available data and this model depends upon broadening the classical definition 
of the releaser concept to include motivational factors in a manner advocated 
by some of the modern ethologists. A model of the cry as an activator of 
motives of an egoistic or altruistic nature is also examined. This model, based 
on Hoffman's theory of altruistic motivation, contributes to an understanding 
of both the compelling releaserlike effects of the cry as well as the wide varia- 
tions observed within and between cultures in the nature and extent of caregiver 
responsiveness. It is further argued that altruistic behavior toward crying infants
in particular must be viewed within the specific context of ontogenetic
processes that enhance the attractiveness of the young for adult caregivers.


It is part of the folklore that the cry of a 
young infant is a compelling stimulus. There 
is an urgency associated with the cry that
makes a response to it obligatory (Ostwald, 
1963). In addition to its purely motivational 
qualities, the cry evokes intense emotional 
reactions that can be either of a constructive 
or of a destructive nature. It is capable of 
evoking strong feelings of concern and pro- 
tectiveness on the one hand or of extreme 
hostility on the other (Ostwald, 1963). The 
action taken in response to the cry can likewise
consist of nurturant acts or murderous
ones (Stone, Smith, & Murphy, 1973, p.
1002).

In the theoretical and empirical analysis 
that follows, two conceptualizations of the 
mechanisms by which the cry has its power- 
ful impact are examined. First, the question 
of whether the cry can be considered a releaser
of parental behavior is discussed with
reference to claims for its evolutionary sig- 
nificance, the physical nature of the stimu- 
lus, its signal value, parental responses to it, 
and possible receptor mechanisms. Subse- 
quently, a model is presented that views the 
cry as a graded signal that activates motives 
of either an altruistic or an egoistic nature.
Last, some factors influencing the ontogeny 
of parental behavior are examined. For pur- 
poses of this analysis, the focus is restricted 
to crying in early infancy when the infant is 
not physically mobile and is totally dependent 
on a caregiver for the satisfaction of his or 
her needs. 


The Cry as a Releaser of 
Caregiving Behavior 


Evolutionary Significance of the Cry 


Within the framework of the attachment 
theory formulated by Bowlby (1969) and 
Ainsworth (1969), it has been suggested that 
the infant cry may serve as a releaser of care- 
giving behavior. Crying is considered by these 
theorists to be an attachment behavior that 
promotes proximity to or contact with the 
caregiver, usually the mother. Using an ethological
perspective, these theorists have hypothesized
that attachment behaviors such as
crying, smiling, and following originally
evolved to perform a protective function by 
bringing the infant into close proximity with 
the mother, who could then defend him or 
her against predators and other dangers. Al- 
though such close proximity between mother 
and infant is not necessary to ensure protec- 
tion in our present society, Bowlby and Ains- 
worth argued that babies are nevertheless 
genetically programmed to cry when out of 
contact or distressed and that their behavior 
is adapted to the prototype of a responsive 
caregiver. 

To support his thesis concerning the evolu- 
tionary adaptedness of close mother-infant 
proximity and sensitivity to crying, Bowlby 
cited indirect evidence. He argued that species- 
typical characteristics are adapted to the 
environment in which the species evolved. For 
Homo sapiens, it is especially difficult to de- 
termine the adaptive significance of charac- 
teristics because people have modified their 
environment so drastically from that in which 
they originated. The kinds of evidence used 
to support his theory include comparative 
studies of animals and anthropological studies 
of contemporary societies that exist in en- 
vironments that are least modified from the 
“environment of evolutionary adaptedness.”
Since the publication of Bowlby’s work in
1969, three recent studies have provided additional
ditional support for his theory. 

Konner (1972; Devore & Konner, 1974) 
studied the rearing patterns of a hunter-gatherer
society in Botswana. He argued that
98% of the evolutionary history of Homo 
sapiens took place in a hunter-gatherer econ- 
omy and that very little evolution has oc- 
curred since then. He hoped to be able to 
highlight the adaptive consequences of the 
rearing pattern he found with reference to 
the environmental selective pressures associ- 
ated with this preagricultural nomadic exis- 
tence. A major feature of the rearing pattern 
he found was that there was virtually con- 
tinuous contact between the mother and her 
infant. Infants were carried in a sling on the 
mother's side or on the hip. The feeding pattern
was also described as continual in that
infants were nursed at least twice an hour 
for a period of 30 sec to 10 min. Konner 
reported that infants rarely cried because 
mothers could anticipate hunger by interpreting
more subtle proximal cues such as bodily
movements, facial expressions, and so on. In 
fact, the cry was treated as an emergency 
signal and was responded to as such with an 
average latency of 6 sec. Note that this con- 
trasts sharply with the practice in Western 
cultures in which response is delayed for
5 to 30 min (Bernal, 1972) and young infants
cry an average of 1 to 2½ hours a day
(Bernal, 1972; Brazelton, 1962). According
to Konner, the adaptive consequences of the
hunter-gatherer rearing pattern were twofold.
Close proximity served as a protection
from predators and other dangers in early infancy
and fostered strong attachments to
adult models of hunting and gathering skills.

Although the comparative evidence presented
by Bowlby has been criticized for its
heavy reliance on studies using the rhesus
monkey (Dolhinow & Olson, Note 1), support
has been provided for Bowlby’s theory
by the findings of Blurton-Jones (1972) in
his survey of a wide variety of mammalian
species. Blurton-Jones asked whether human
infants and their mothers are adapted to rela- 
tively continuous contact (the pattern com- 
mon among hunter-gatherer societies and 
Old World monkeys and apes) or to rela- 
tively discontinuous contact as exemplified 
by the European and American style of rearing.
To answer this question he surveyed a
wide variety of mammals to determine what
anatomical, physiological, and behavioral features
correlated with the two rearing patterns
of caching and carrying. He then applied the
correlations found in animals to the human
mother and infant to predict whether humans
evolved as a caching or a carrying species. In
caching species such as the tree shrew, the 
young are left for long periods in hiding places 
while the mother forages for food, and feeding 
is at widely spaced intervals. In carrying 
species, the young ride on the mother, and
feeds are closely spaced. 

The evidence that human beings evolved 
as a caching species was not convincing because
(a) the young would have to remain
silent until the mother returned so as not to 
attract predators by crying, (b) cached young 
do not urinate or defecate without maternal 
stimulation as this too would attract predators, 
and (c) the young would have to possess 
mechanisms of thermoregulation. None of 
these conditions were satisfied for the human 
infant. 

The evidence for carrying, however, strongly 
favored continuous contact. Three arguments 
were advanced based on milk composition, 
sucking frequency, and the duration of feed- 
ing. The composition of the mother's milk 
correlated with the schedule of feeding such 
that in widely spaced feeders, there was a 
high protein and fat content, whereas in con- 
tinuous feeders, there was a low protein and 
fat content. For example, the tree shrew, 
which feeds every 48 hours, had a high pro-
tein and fat content compared with the higher 
nonhuman primates, which are continuous 
feeders. Human milk was found to be identi- 
cal in fat and protein content to that of the 
continuous-feeding anthropoid apes. Simi- 
larly, sucking frequency correlated with feed- 
ing frequency and milk composition. Animals 
that fed the least often sucked the fastest, and 
fast sucking was associated with a high protein
and fat content in the mother's milk.
The slow sucking rate in human infants is, 
therefore, adapted to a pattern of continuous 
contact. The third piece of evidence was that 
widely spaced feeders fed for a short dura- 
tion. For example, the duration of feeding 
for rabbits was 4 to 5 minutes once every 24 
hours. The time spent each day in feeding 
for human infants was found to be comparatively
long, suggesting adaptation to the
continuous-feeding pattern.

As further support for the hypothesis that
human infants are adapted to the prototype
of a responsive mother, Bell and Ainsworth
(1972) have claimed that prompt maternal
responses to crying early in the first year
have adaptive outcomes for infants. They
found that responsiveness to crying in the
first half year led to a decrease in the frequency
and duration of crying in the second
half year. In addition, those infants whose
mothers had responded promptly to their
cries early on were less likely to use crying
in an instrumental manner at 1 year of age 
and were more likely to develop other social 
signals such as bodily gestures, facial expres- 
sions, and vocalizations to communicate with 
their mothers. According to Bell and Ains- 
worth, their findings indicate that crying is 
at first expressive and indiscriminate and 
that only toward the end of the first year 
does crying become “goal corrected” and used 
with the intent to influence others. 

If the infant cry is an adaptive species- 
characteristic behavior that evolved as an 
emergency signal, one might expect to find 
reciprocal mechanisms in caregivers that en- 
sure a response in kind to the cry. The con- 
sequences of an appropriate response for the 
infant relate to his or her chances of survival; 
an appropriate response is adaptive for the 
parent also because the infant's survival ul- 
timately contributes to the parent's reproduc- 
tive success. In cases such as this, in which 
contact between conspecifics is adaptive to 
both, the sender of the signal and the re- 
ceiver of the signal will be mutually adapted 
to each other (Eibl-Eibesfeldt, 1975). In this 
vein, it has been suggested that the infant 
cry and other attachment behaviors such as 
smiling may operate as sign stimuli that re- 
lease parental behavior (Bowlby, 1958). 


The Releaser Mechanism 


Since Lorenz's original observations of re- 
leasers in the 1930s, examples of this form 
of adaptive mechanism have been described 
in many species (Eibl-Eibesfeldt, 1975). A 
sign stimulus is a simple, conspicuous, and 
specific stimulus that acts figuratively as the 
key that unlocks a stereotyped motor action 
(often referred to as a fixed action pattern). 
The receiver of the key stimulus is said to 
possess an innate releasing mechanism (IRM), 
the hypothesized neural receptor center for 
receiving and filtering afferent impulses pro- 
duced by the key stimulus. Releasers have 
been found to regulate interactions between 
prey and predator, between sexual partners, 
between sexual rivals, and between parent 
and young in many species (Eibl-Eibesfeldt, 1975).

Although the infant cry is often casually
likened to a releaser, no formal assessment 
of whether it fits the model has ever been 
carried out. Much effort has been put into 
describing the physical characteristics of the 
stimulus, but little effort has been made to 
determine whether and/or which key features 
affect parental responses. Often the question 
of whether the cry acts as a releaser is linked 
with the issue of whether different cry types, 
for example, for pain versus hunger, are recog- 
nizable and how these cry types differ in 
physical characteristics. In the ensuing para- 
graphs, an attempt is made to assess the 
“goodness of fit” of the model of the cry as 
a releaser. 

Although it is recognized that the releaser 
concept as originally proposed by Lorenz 
(1937) has recently come under attack (cf. 
Hinde, 1974; Klopfer, 1974), the classical 
view of the IRM has been adopted here for 
heuristic purposes. The following list of the 
basic characteristics of releaser systems is 
derived from Eibl-Eibesfeldt (1975), who 
provides the modern expression of the Loren- 
zian ethological view. 

1. The basis for recognition of the key fea- 
tures of the stimulus and for the performance 
of a response is said to be innate in that the 
system is functional in inexperienced animals 
that have had no prior experience with the 
stimulus. 

2. Sign stimuli are made up of simple, conspicuous,
and specific cues. It is even possible
to trick an animal into responding to simplified
(artificial) models of the stimulus in an
unnaturalistic context.

3. Sign stimuli exhibit the phenomenon of 
heterogeneous summation. The same response
can be elicited by several different and inde- 
pendent cues that combine additively to in- 
crease the effectiveness of the sign stimulus. 

4. The optimal stimulus is often unrealistic. 
By exaggerating key features of the stimu- 
lus, a "supernormal" stimulus is produced 
that is more effective than a natural stimulus. 

5. The response to the stimulus is a stereo- 
typed behavioral response. 

6. This stimulus-response specificity is accomplished
through a neural receptor-effector
system in which afferent impulses from the
senses are linked with efferent impulses from
motor centers leading to the occurrence of 
fixed action patterns. 

Each of these characteristics is reviewed in 
relation to the available literature on infant 
crying. 


Innateness of Stimulus Recognition 

For all practical purposes, the criterion of
innateness cannot be examined for adults 
who, even if never exposed to an infant’s cry, 
cried as infants themselves and have had the 
experience of crying during their life spans. 
The only research that possibly bears on this 
issue are the studies of contagious crying in 
newborns conducted by Simner (1971) and 
Sagi and Hoffman (1976). Simner found that 
infants tested at about the age of 70 hours 
cried more in response to a tape recording of 
a newborn cry than they did to either the cry 
of a 5½-month-old infant or to two artificially
produced sounds (a computer-synthesized cry
designed to contain features similar to the
newborn cry, and a series of white noise
bursts included to control for the nonvocal
properties of the cry). Furthermore, a tape
recording of the infant's own cry produced
more crying than the cry of another newborn. 
Contagious crying was found to be greater 
among female than among male infants. Sim- 
ner speculated that imitation of sounds not 
very discrepant from those with which the 
infant was already familiar could account 
for the greater effectiveness of the two newborn
cries when compared with the cry of
the older infant and with the artificial sounds. 

Sagi and Hoffman (1976) replicated Simner's
(1971) results with younger infants
(average age was 34 hours) using only the 
newborn cry and the synthesized cry from 
Simner's tapes. The newborn cry elicited more 
crying than did the synthetic cry and a control
period of silence. The sex difference in
contagious crying was also replicated. The 
authors took issue with Simner's original cognitive
interpretation of these findings, because
the contagious crying they observed
appeared to be indicative of genuine distress
rather than just a vocal response in imitation
of a vocal stimulus. They interpreted their
results in terms of inference theory: Distress
cues from another person evoke associations
with the observer's own past distress and result
in a distressed state in the observer. This
interpretation rested on a classical-condition- 
ing paradigm. Because they felt that evidence 
for conditioning in 1-day-old infants was 
weak, Sagi and Hoffman suggested that the 
distress response to another's distress could 
be innate. 


The Cry as a Simple, Conspicuous Stimulus 


Studies of cry sounds have been not only 
numerous but also inspired by many di- 
verse interests on the part of investigators. 
These studies fall into two groupings: (a) 
detailed acoustical descriptions of cries and 
cry types inspired by the new technology of
the sound spectrograph and other instruments
(normative studies) and (b) research on the
ability to recognize cry types (signal value
studies).

Normative studies. Although there was 
one early attempt at an acoustical study of 
the infant cry (Fairbanks, 1942), most of the 
normative studies followed the invention of
the spectrograph. These normative studies 
have established the temporal pattern of the 
newborn cry; it can be described as an ex- 
piratory cry that lasts .6 to 1.4 sec and is 
followed by a brief silence (.2 sec); it is 
sometimes followed by a brief inspiratory
whistle (.1 to .2 sec) and by another brief
period of rest (.2 sec) before another ex- 
piratory cry begins (Sedlackova, 1964; Truby 
& Lind, 1965; Wolff, 1969). The fundamental 
frequency of the expiratory cry is 400 Hz on 
the average, with the inspiratory whistle hav- 
ing a somewhat higher (550-600 Hz) fre- 
quency (Truby & Lind, 1965; Wolff, 1969). 
There is often a rising-falling pattern of
frequency; for example, an expiratory cry of
.6 sec in length might begin with a fundamental
frequency of 300-350 Hz for .2 sec,
rise to 500 Hz for .2 sec, and fall to 300-200
Hz for .2 sec (Truby & Lind, 1965; Wolff,
1969). The loudness of the cry is about 80
dB when measured 12 inches (30.5 cm) from
the baby's mouth (Ringel & Kluppel, 1964).
There are wide individual differences among
babies in the characteristics of their cries,
but individual cry patterns have been found 
to be stable for each infant (Ringel & Klup- 
pel, 1964). 

Before the emphasis in cry studies shifted 
to the description of cry types associated with 
different causes or emotions, Truby and Lind 
(1965) distinguished three general types of 
expiratory cries based on their inspection of 
hundreds of sound spectrograms. These types 
appeared to be associated with the intensity 
or effort put into the cry. The cries they 
analyzed were obtained from 1- to 12-day-old 
infants who were all pinched to provide a 
standard stimulus to cry. The three cry types 
were phonation, dysphonation, and hyper- 
phonation. They called the basic cry pattern 
phonation because of its harmonic structure 
and the symmetry and smoothness of the 
spectrogram and intensity pattern. Phonated 
cries did not give the impression of great 
distress or discomfort. Dysphonated cries 
were effortful performances in which turbu- 
lence or noise caused by overloading at the 
larynx obscured the harmonics of the basic 
cry pattern. Hyperphonated cries were charac- 
terized by an abrupt shift from or to a very 
high pitch of up to 2000 Hz, were whistle- 
like, and were caused by strain and constric- 
tion of the vocal apparatus. Hyperphonation 
often occurred concurrently with dysphona- 
tion for the most vociferous response to dis-
comfort. Strain during egression was usually
followed by hyperphonation during ingres- 
sion—the inspiratory whistle described earlier. 

The major attempts to differentiate acousti- 
cally among cry types of newborns according 
to their cause or underlying emotion have 
been made by Wolff (1969) and Wasz- 
Hockert, Lind, Vuorenkoski, Partenen, and 
Valanne (1968). Wolff distinguished three 
major cry types: the hunger cry, the mad or 
angry cry, and the pain cry. The hunger cry 
had no causal relation with hunger accord- 
ing to Wolff but was really just a basic 
rhythmical pattern. The basic pattern that 
Wolff described is similar to the phonated 
cry in Truby and Lind's classification. The 
inspiratory whistle that Wolff found to oc- 
cur within the basic cry pattern can be likened 
to the hyperphonated cries of the latter in-
vestigators. Wolff's mad or angry cry was
characterized by turbulence due to an excess 
of air being forced through the vocal cords. 
This mad or angry cry was, then, synonymous 
with the dysphonated cry described by Truby 
and Lind. Wolff’s third cry type, the pain 
cry, was differentiated not on the basis of 
acoustical attributes, but on the basis of the 
temporal pattern of the first two or three ex- 
piratory cries following application of a pain- 
ful stimulus. Wolff's pain cries were recorded 
during the general hospital practice of prick- 
ing the neonate's heel to take a blood sample. 
This painful stimulus probably produced more 
discomfort than the pinch used by Truby 
and Lind. The cardinal features of Wolff's 
pain cry were (a) a sudden onset of loud 
crying (as opposed to a gradual buildup for 
the hunger or basic cry), (b) an initial long 
cry (as long as 4 sec compared with 1 sec 
or less for the basic cry), and (c) an extended 
period of breath holding after the initial cry 
(for as long as 7 sec). In Wolff's study, the 
pain cry settled down to the basic temporal 
pattern after two or three long expiratory 
cries. Except for pointing out the temporal 
pattern of the cry following a very painful 
experience, Wolff's formulation adds little to 
that provided by Truby and Lind. Wolff him- 
self disavowed that the basic cry had a causal 
connection with hunger, and the mad or angry 
cry can be described as just a more effortful 
performance indicative of greater discomfort, 
as in Truby and Lind's formulation. For con- 
venience in the following discussion of re- 
search concerning cry types, the labels used 
by the authors concerned, for example, 
"hunger" cry, have been retained, although 
the implied causal connections may be ques-
tionable.

Following what appeared to be a rather 
simple and appealing description of the basic 
cry pattern (phonation) with various super- 
imposed features indicative of effort (dys- 
phonation, hyperphonation, and length of the 
expiratory cry), Wasz-Hockert et al. (1968) 
performed a multivariate study of character- 
istics of cry types based on 11 attributes: 
length of the expiratory cry, pitch (minimum, 
general, and maximum), shift (cf. hyper- 
phonation), voice (voiced, voiceless, and half 
voiced; cf. phonation vs. dysphonation), mel-
ody types (rising-falling, rising, falling, flat,
and no melody), continuity of signal, glottal
plosives, vocal fry, nasality, tenseness, and
subharmonic break. Using examples of birth,
hunger, and pain cries, the authors applied a
multiple discriminant function analysis to
arrive at decision rules for classifying cries
into types. A rising-falling melody was
associated with hunger cries; any melody
other than rising-falling and with a length of
more than 1.5 sec was a pain cry, and melodies
other than rising-falling with lengths of less
than 1.5 sec were birth cries. Although shift
(hyperphonation) and voice (phonation vs.
dysphonation) did not contribute substantially
to the classification based on the multiple
discriminant function analysis, the authors
mentioned that shift occurred most frequently
with pain cries, that birth cries were usually
voiceless, and that hunger and pain cries did
not differ in voice. Given that so few of the
measures used were predictive and that the
major bases for differentiation could be said
to be the length of the signal and types of
phonation (because the scoring of melody
was confounded with voice, i.e., voiceless
sounds could not be scored for melody), the
Wasz-Hockert results seem to indicate that
the cries were not uniquely different accord-
ing to what caused them but rather differed
in intensity according to the degree of dis-
comfort experienced by the infant. One might
expect that a baby experiencing hunger would
be less distressed than one experiencing birth
or pain. The authors themselves noted that
hunger cries when left a long time unattended
began to resemble pain cries.
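
The decision rules attributed above to Wasz-Hockert et al. (1968) amount to a two-step lookup, which might be sketched as follows. The function name, the string labels, and the `threshold_sec` parameter are illustrative assumptions and not part of the original analysis. How a cry whose length falls exactly on the threshold should be classified is also an assumption, because the rules as summarized here do not cover that case.

```python
def classify_cry(melody: str, length_sec: float, threshold_sec: float) -> str:
    """Classify one expiratory cry from its melody type and its length.

    Hypothetical helper: melody is a label such as "rising-falling",
    "rising", "falling", or "flat"; threshold_sec is the length cutoff
    that separates pain cries from birth cries.
    """
    # Step 1: a rising-falling melody is treated as a hunger cry,
    # whatever its length.
    if melody == "rising-falling":
        return "hunger"
    # Step 2: for any other melody, length alone decides.
    if length_sec > threshold_sec:
        return "pain"
    # Assumption: a cry exactly at the threshold is grouped with birth cries.
    return "birth"


# Usage with a hypothetical cutoff of 1.5 sec.
labels = [
    classify_cry("rising-falling", 0.9, 1.5),
    classify_cry("falling", 2.6, 1.5),
    classify_cry("flat", 0.8, 1.5),
]
```

Only two attributes enter the sketch, which is consistent with the point made in the text that length of the signal and type of phonation carried most of the discrimination.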

Signal value studies. The studies con-
ducted to determine whether different cry
types are recognizable are fraught with meth-
odological differences that make conclusions
difficult to reach. Some studies have used
multiple-choice techniques, others have not.
Some have controlled for the durations of
the cry sequences, others have not. One study
used single expiratory cries, thus eliminating
information provided by the temporal pattern
of cries. The choices of cry types to be identi-
fied and the methods of evoking the cries
have differed. In some studies, investigators
have narrowed their selection of cries only to
cries that they say are "typical" of their type.


The landmark study of this type was con- 
ducted by Sherman (1927). Without the 
benefit of magnetic recording devices, he had 
to make live presentations of his stimuli. 
Groups of observers separated from the in- 
fants by a screen listened to cries elicited by 
four stimuli: hunger, sudden dropping, re- 
straint of the head and face, and sticking with 
a needle. Observers were exposed to each cry 
for approximately 10 to 15 sec. With no pre- 
determined cry-evoking categories provided, 
the 23 observers exhibited little agreement 
and mentioned a total of 12 emotions includ- 
ing hunger, pain, fear, colic, rage, and so 
forth. Sherman noted that when observers 
were allowed to listen for a longer period of 
time (2 to 3 min), they were able to dis- 
tinguish between cries caused by the applica- 
tion of an external stimulus and cries associ- 
ated with an internal organic condition (hun- 
ger or colic), as the latter tended to be more 
prolonged than the former.

The strongest criticisms of Sherman’s work 
have been voiced by Izard (1971) and Ek- 
man (Ekman, Friesen, & Ellsworth, 1972). 
According to Izard, the most remarkable thing 
about Sherman’s work is that no one ques- 
tioned why observers should be expected to 
differentiate among emotional reactions that 
all included distressful crying. Izard considers 
crying to be primarily the expression of one
emotion: distress-anguish. This emotion is
equivalent to what Darwin (1872/1965) de-
scribed as the emotion of suffering. Ekman
criticized Sherman for failing to distinguish
between judgments of emotions and judg-
ments of events and for treating responses of
rage and anger as separate and pain and hurt
as separate when these are synonyms. Fur-
thermore, Ekman shared Izard's concern that
all four stimuli may have elicited the same
emotion. Whereas Ekman suggested that the
predominant emotion expressed may have
been rage based on the frequency with which
observers nominated this emotion, Izard
argued that distress would have predominated.

The next series of studies was conducted
by Wasz-Hockert and his colleagues. In the
first study (Wasz-Hockert, Partenen, Vuoren-
koski, Michelsson, & Valanne, 1964), nurses
listened to six examples of “typical” cries of 
four types: birth, hunger, pain, and pleasure. 
The length of the cry samples varied from 5 
to 17 sec with a mean length of 12.3 sec. 
Nurses correctly identified an average of 16 
out of the 24 cries using a multiple-choice 
method in which they were told to select from 
among the four cry types. The poorest score 
of 11 was significantly different from a chance 
score of 6. 
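
The chance score of 6 follows from 24 cries with four equally likely answers. As a rough check on the claim that the poorest score of 11 exceeded chance, the sketch below computes a one-sided exact binomial tail probability. The choice of test is an assumption, since the text does not say which test the authors used, and the function names are illustrative.

```python
from math import comb


def chance_score(n_items: int, n_choices: int) -> float:
    """Expected number correct when every answer is a blind guess."""
    return n_items / n_choices


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent guesses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 24 cries (six examples of each of four types), four response options.
expected = chance_score(24, 4)
# Probability of scoring 11 or more by guessing alone.
p_value = binomial_upper_tail(11, 24, 1 / 4)
```

Under this assumed test the tail probability falls below the conventional .05 level, which is in line with the statement in the text.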

In the next study (Wasz-Hockert, Partenen, 
Vuorenkoski, Valanne, & Michelsson, 1964), 
these investigators used the same method to 
test males and females who were experienced 
and inexperienced caregivers. Experienced 
women included mothers, children's nurses, 
pediatricians, and midwives. Experienced
men included fathers and pediatricians. Inex- 
perienced men and women had not looked 
after infants from 0 to 2 years in age for as 
long as 2 weeks. Experienced women cor- 
rectly identified significantly more cries than 
inexperienced women. Experienced men did 
not significantly differ from inexperienced 
men. The lowest group mean was still sig-
nificantly different from a chance level. Al-
though the authors did not test for a sex differ-
ence, their data indicate that men were con-
siderably less accurate than women. Surpris-
ingly, fathers and pediatricians scored lower
than women with no experience. In addition, 
the easiest cry to identify was the pleasure 
cry, which was identified correctly almost 
100% of the time. The hardest to identify 
was the birth cry. 

In a later study with only women as sub- 
jects (Wasz-Hockert et al., 1968), a slightly 
different method was used. Six examples of 
each of the four cry types were randomly 
selected from the pool of expiratory cries used 
in the multiple discriminant function analysis 
described earlier. Each single expiratory cry 
was repeated seven times. Again, experienced 
women were found to perform more accu- 
rately than inexperienced women, and the 
lowest group score was higher than chance 
level. Again, the pleasure cry was easiest to 
identify and the birth cry the hardest. Birth 
was often confused with hunger and pain, and 
hunger and pain were often confused with
one another. The most poorly recognized 
cries, the authors claimed, were not repre- 
sentative of their type. 

Muller, Hollien, and Murry (1974) criti- 
cized the Wasz-Hockert studies on a number 
of grounds. In particular, they felt that the 
inclusion of birth and pleasure cries was in- 
appropriate and that Wasz-Hockert should 
have controlled for sample duration. They 
reported evidence that subjects could not cor- 
rectly identify cries evoked by pain stimula- 
tion (snapping a rubber band against the 
skin), auditory stimulation (clapping together 
wooden blocks), and hunger stimulation 
(withdrawal of feeding). Note that this last 
cry was evoked differently from previous 
studies in which naturally occurring hunger 
cries were recorded before a meal. Mothers of
the infants recorded and a group of mothers
of infants of comparable age listened to two
15-sec segments of crying for each stimulus.
Neither group of mothers was able to identify 
correctly the cry-evoking situation in a mul- 
tiple-choice situation. Muller et al. concluded 
that the acoustic characteristics of cries carry 
little perceptual information about the cry- 
evoking situation, that the cry generally acts 
only to alert the mother, and that judgments 
of the causes of crying in the home are based 
on additional environmental cues. 

Interest in whether cry types are identifi- 
able has stemmed from two sources. This issue 
is of importance not only to those who have 
been concerned with the ability to correctly 
identify emotional expressions of others but 
also to those who have been tracing the con- 
tinuity of development of infant sounds for 
communication from crying to babbling and 
finally to language (e.g., Irwin, 1948; Lynip, 
1951; Winitz, 1960). It is possible that this 
intense concern represents an overintellec- 
tualization of the problem. On the basis of the 
confusion in the literature over different cry 
types, their unique characteristics, and their 
identifiability, it is more likely that certain 
attributes relate to the intensity of discom- 
fort or distress felt and that any accuracy of 
identification according to evoking situation 
is due to a correlation between the intensity 
of the cries and observers’ prior notions 
about the intensity of negative emotion as-
sociated with particular cry-evoking situa-
tions. Hunger builds up slowly and is ordi-
narily not as distressing an experience as pain
for adults; therefore, raucous cries with sud-
den onsets are classified as due to pain and
melodious cries with slow buildups as due to
hunger. Indirect support for this hypothesis
could be derived from the fact that there is
no agreement in the literature on the causes
of crying (see Aldrich, Sung, & Knop, 1945a;
Aldrich, Sung, & Knop, 1945b; Brazelton,
1962; Illingsworth, 1955; Lakin, 1957; Len-
nane & Lennane, 1973; Stewart et al., 19
and Wolff, 1969 for lists and discussions of
possible causes of crying), and therefore one
would not expect there to be agreement on
the cry types associated with different causes.

Attempts to read into the cry more [...] dis-
tance maintained between parent and infant
in our culture. Because this distance is so
great as to preclude the use of more subtle
communications (e.g., body movement and
facial expression), great effort has gone into
interpreting the only distal signal available
to the infant: the cry. And further, because
psychological needs (e.g., for contact or cud-
dling) as opposed to physiological needs are
often not considered legitimate and deserving
of attention, there is a need on the part of
the parent to differentiate between "real"
cries and "fake" cries. Reading into cries dif-
ferentiated signal values with a view to eluci-
dating the beginnings of prespeech communi-
cative competence also belies the discontinuity
between crying and the development of com-
municative competence. For example, Bell
and Ainsworth (1972) found that excessive
crying was negatively related to the develop-
ment of positive social communication.
This point is further amplified in a later sec-
tion when evidence that there are two distinct
systems controlling the expression of volun-
tary and involuntary vocalizations in human
beings is reviewed.

The question I have addressed in the
present section is whether the cry is a simple,
conspicuous stimulus with key features im-
portant for stimulus recognition. It was
argued that the key features that emerge from
this review of the extensive literature relate
more to stimulus intensity than to stimulus
recognition. Whether there are simple features 
that are essential for recognition would have 
to be determined by systematically varying 
the acoustic parameters of the cry and noting 
which cues are essential for identification of 
the sound as an infant cry. In results men- 
tioned earlier, infants differentiated between 
a real newborn cry and a synthesized cry 
(Sagi & Hoffman, 1976; Simner, 1971). This 
may indicate, by the criterion that the or- 
ganism can be tricked into responding to a 
simple model of a releaser, either that the 
cry is not a releaser or that the essential key 
features were overlooked in synthesizing the 
cry. On the other hand, casual observation 
provides evidence that adults are in fact fre- 
quently tricked by the cry of a Siamese cat 
even when it is out of context, for example, 
when no infant is expected to be in the vicin- 
ity of the listener, who is not even a highly 
motivated parent. 


Principle of Heterogeneous Summation 


According to the principle of heterogeneous 
summation, separate and noninteracting cues
combine additively to produce stimulus recog- 
nition. For example, the male stickleback’s 
fighting response to a rival male is elicited 
either by the red spot on the male’s belly or 
by the head-down position of the rival male 
(Eibl-Eibesfeldt, 1975). However, when both 
cues are combined, the attack response is 
more reliably elicited, that is, the stimulus is
more likely to be recognized.

It might be argued that there are separate
key features that, when superimposed on the
basic cry pattern, would be more likely to
elicit parental responses. These key features
might include dysphonation or turbulence,
hyperphonation or a shift to high pitch, the
length of the expiratory cry, and the sudden-
ness of onset. In a small experiment, Wolff
(1969), while conducting his naturalistic ob-
servations of infants, played a tape record-
ing of the infant's basic hunger cry and, on
a separate occasion, played a pain cry to mea-
sure the delay before the mother responded.
He claimed that there was a dramatic differ-
ence in speed of response, with response to
the pain cry being almost immediate. As the 
pain cry was likely to possess some of the 
markers of intensity mentioned above, Wolff's 
result may provide some evidence of heteroge- 
neous summation. 


Evidence for Supernormal Stimulation 


Although no one has tried to produce a 
supernormal cry stimulus artificially to mea- 
sure its effectiveness, studies of the cries of 
brain-damaged and other abnormal infants 
may be of relevance to this issue. During the 
past 10 years, Wasz-Hockert and his col- 
leagues have compared the cries of normal 
infants with those of abnormal infants, in- 
cluding those with Down’s syndrome, neo- 
natal asphyxia, brain damage, hyperbilirubine- 
mia, cri du chat syndrome, and mixed or un- 
specified pathology. These studies, summarized 
in two recent reviews (Vuorenkoski, Lind, 
Wasz-Hockert, & Partenen, 1971; Wasz- 
Hockert et al., 1968), have found that the 
cries of abnormal babies are higher or lower 
in pitch than normal cries, have greater vari- 
ability, and have different temporal patterns 
marked by either shorter or longer cry dura- 
tions and cry intervals. The cries of abnormal 
infants are said to be so unpleasant that they 
override differences in maternal style (Ost- 
wald, 1973; Wolff, 1969). Nurses expend 
much energy to keep them quiet, or they 
stay out of earshot. The cribs of these babies 
are said to be often tucked away in the far- 
thest corner of the nursery from the nursing 
station because the cries of these infants are 
so unbearable to listeners (Milowe & Lourie, 
1964). The exaggerated acoustic features and 
temporal patterns of the cries of abnormal 
babies may constitute supernormal stimula- 
tion, but no research has been reported that 
systematically relates these features to the 
effect of these cry sounds on the listener. 


Evidence for a Stereotyped Response 


The question of whether the cry produces 
a stereotyped response can be dealt with in 
two parts: First, does the cry elicit a re-
sponse, and second, is the response a stereo-
typed one?

It is clear from a number of studies con-
ducted in Western cultures that the infant
cry does not always elicit a response from the 
caregiver. Bell and Ainsworth (1972) found 
that primiparous mothers of infants from 0 
to 3 months of age ignored a median of 46%
of crying episodes. The most responsive
mother ignored only 4% of crying episodes
and the least responsive ignored 97% of cry- 
ing episodes. However, lower estimates have 
been provided by Moss and Robson (Note 2) 
and Bernal (1972), who found that an aver- 
age of 17% to 18% of the cries of firstborns 
were ignored. Mothers of second borns in 
Bernal's sample ignored fewer crying epi- 
sodes (8%). The discrepancies between Bell 
and Ainsworth's estimate and the estimates 
of the other investigators could be due to the 
former investigators' having computed a
median rather than a mean, to different definitions
of crying episodes (e.g., Moss and Robson
used fusses rather than cries), or to different 
data collection methods (observational meth- 
ods vs. diaries). 

For those cries that were attended to, the 
duration of the delay or the latency of the 
response was reported in two studies. Bell 
and Ainsworth (1972) reported that mothers 
delayed a median of 3.83 minutes per hour 
with a range of 2 minutes to 9 minutes.
Bernal (1972) reported that roughly two 
thirds of the cries were responded to within 
10 minutes but that response was delayed
for 10 to 30 minutes for one third of
the cries. The smaller range reported by Bell 
and Ainsworth may be a function of their 
having computed the duration of the delay 
per hour rather than per episode. 

The Western pattern of often ignoring the 
cry and delaying response to it contrasts 
sharply with reports mentioned previously 
that mothers in hunter-gatherer cultures 
never ignore infant cries and respond with 
an average latency of 6 sec (Devore & Kon- 
ner, 1974). In fact, the findings of Bernal on 
mothers’ unresponsiveness to crying in a 
Western culture led Richards (1974) to re- 
mark that “the important lesson for the in- 
fant is how little effect his crying has on his 
caretakers” (p. 90). On the other hand, sev- 
eral other pieces of evidence would seem to 
indicate that Richards may have overstated
the case. In a small experiment, Moss and 
Robson (Note 2) wanted to test whether 
mothers of 3½-month-old infants would re-
spond to a button that lit up (on a schedule
prearranged by the experimenters) to indicate
that their infants were crying in another 
room. The investigators had previously ob- 
served these mother-infant pairs in the home 
and had found that mothers responded to 77% 
of their infants’ cries. However, in the but- 
ton experiment, only 13 out of the 54 mothers
responded to the lighted button (as reported 
by Harper, 1971), a response rate of only 
24%. 
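
The arithmetic behind this comparison can be laid out in a few lines. The figures are the ones quoted in the text; the function and variable names are illustrative assumptions.

```python
def response_rate(responders: int, total: int) -> float:
    """Percentage of mothers who responded."""
    return 100 * responders / total


# Button experiment: 13 of 54 mothers responded to the lighted button.
button_rate = response_rate(13, 54)
# Home observation: mothers responded to 77% of their infants' cries.
home_rate = 77.0
# Gap between responding to audible cries and to the visual signal alone,
# in percentage points.
difference = home_rate - button_rate
```

The gap of roughly 50 percentage points sets the visual-signal condition apart from responses to audible crying in the home.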

That the sound of the cry may be neces- 
sary to convey a sense of urgency was hy- 
pothesized by Lenneberg, Rebelsky, and 
Nichols (1965) in their study of infants born 
to deaf parents. They reported that deaf par- 
ents were not compelled to attend to their 
crying infants even if they could see the dis- 
tressed state that their infants were in. It 
was as though some deaf parents could not 
tell whether their infants were in distress by 
looking at them. It appears that the cry
may be a necessary though not a sufficient 
condition for a response in our culture and 
that a cognitive awareness of distress with- 
out the urgency conveyed by the sound itself 
is ineffective in eliciting a response.

Turning now to whether motor responses 
to cries can be described as stereotyped, the 
most frequent intervention in the early months 
has been found to involve close physical con-
tact (picking up and holding or feeding).
Close physical contact accounted for at least 
half of the interventions in the Bell and Ains- 
worth (1972) study with primiparas. Of those 
crying bouts that were attended to, Bernal 
(1972) observed that 69% resulted in feed- 
ing by primiparas and 89% resulted in feeding 
by multiparas. Furthermore, interventions in-
volving physical contact were 80% effective
in soothing the infants in Bell and Ains-
worth's sample. These results are consistent
with those reported by Korner and Thoman
(1970), who found that crying newborns were
most effectively soothed and brought to a
visually alert state by contact that was com-
bined with vestibular stimulation and an up-
right posture. Over time, however, noncontact
interventions such as approaching, vocaliza- 
tion, and other social stimulation became in- 
creasingly effective and were more frequently 
used by mothers (Bell & Ainsworth, 1972; 
Moss & Robson, Note 2). 

Although the studies reported indicate that 
by far the majority of cries are followed by 
nurturant responses involving close physical 
contact, it must not be overlooked that crying 
is also one of the major precipitants of abuse 
for children under 12 months of age. In one 
study of infants battered by their parents, 
excessive crying was given as the reason for 
battering by 80% of the parents of infants 
less than a year old (Weston, 1968). 

One could argue, however, that the cry 
may not always be a sufficient stimulus to 
elicit any response in our culture, let alone a
nurturant response, because of anomalies of 
development. Lorenz (1965) has stressed that 
fixed action patterns are very sensitive to 
"bad rearing" that can cause the disintegra- 
tion of releasing mechanisms. If the parental 
attachment system develops optimally under 
conditions of close parent-infant contact and 
minimal crying, then a pattern involving little 
contact and excessive crying may result in
nonfunctional or maladaptive responses to
the releasing stimulus of crying. In this con-
nection it is worth noting that sensitivity to
crying signals typically occurs within a larger 
cultural pattern of rearing characterized by 
close mother-infant contact, prolonged breast- 
feeding, and wide spacing between children. 
In a survey of 222 cultures, Mead and New-
ton (1967) found that cultures clustered basi- 
cally into two types—those with a developed 
and those with a muted transition between 
delivery of the infant and the establishment 
of total physiological separateness. In contrast 
to the close contact pattern in cultures with 
the developed transition period, cultures ex- 
hibiting the muted transition period, primarily 
the industrial nations, were characterized by 
mother-infant separation in the hospital, early
weaning, infrequent feedings, a great amount
of crying with delayed parental response, and
the presence of cribs, playpens, and other
devices that maintain the baby at a distance
and often out of sight of the mother. Because
close mother-infant contact, sensitivity to
crying, child spacing, and prolonged breast- 
feeding appear together so frequently in many 
types of primitive and traditional cultures, 
Mead and Newton suggested that “there may 
be strong mechanisms in this interrelation of 
patterning” (p. 186). 


Receptor Mechanisms for the Cry 


Lorenz’s releaser formulation has inspired 
researchers to search for the underlying neural 
mechanisms (“the Holy Grail”) involved in 
stimulus recognition (Brown, 1975). Consist- 
ent with a recognition model, researchers of 
biologically significant sounds have looked for 
evidence of specialized receptor apparatuses, 
processing modes, or feature detection capabil- 
ities (Wordon & Galambos, 1972). Generally, 
the search has focused on filtering mechanisms 
or templates within the auditory processing 
system. Controversy exists over whether tem- 
plates consist of one “pontifical” cell or an 
ensemble of neurons (Wordon & Galambos, 
1972) and over whether filters are located 
centrally or peripherally in the sense organs 
(Brown, 1975). 

Although little neurological research of this 
nature has been conducted with primates and 
none with humans, there is a case reported 
in the literature (Mark & Ervin, 1970) of a 
teenager with a brain dysfunction who mur- 
dered her two sisters when they were babies. 
When tape recordings of infant crying were 
played to her, seizurelike activity was recorded 
in part of her limbic system (amygdala), and 
the teenager reported a very angry and float- 
ing sensation that lasted for several minutes 
after the cry was turned off. The limbic sys- 
tem is typically considered the “seat of the 
emotions" (Gellhorn, 1968), but our knowl- 
edge of it is limited. It is not clear whether 
the electrical responses evoked by the cries 
were determined by general state changes or 
by the specific acoustic features of the stimu- 
lus (Wordon & Galambos, 1972). Relevant 
to this issue but discussed more fully in a 
later section is McClean's (1973) suggestion 
that there may be functional areas in the 
limbic system that control emotions that guide 
species-typical behaviors. 


Evaluation of the Model of the Cry
as a Releaser


From the preceding review of the literature, 
it is evident that whether one accepts the re- 
leaser model of the cry’s impact depends on 
how strictly one interprets the releaser con- 
cept. The evidence that most strongly sup- 
ports the view of the cry as a releaser in- 
cludes the finding of contagious crying in 
newborns, the suggestion that deaf parents do 
not exhibit an urgency about responding to 
crying, and the near universality of interven- 
tions that involve close physical contact. On 
the other hand, the cry signal does not appear 
to be a simple and discrete stimulus with a 
single meaning, but rather its meaning de- 
pends upon intensity cues and contextual 
factors. Cross-cultural variability in respon- 
siveness to crying and the frequent citation of 
crying as the major factor precipitating abuse 
suggest that an appeal to additional, though 
perhaps not incompatible, mechanisms might 
further increase our understanding of the 
cry’s impact on adults. 

Because the releaser formulation originated 
with observations of invertebrates and lower 
vertebrates such as insects and frogs, it has 
been suggested that the IRM sensu stricto is 
not a useful concept when applied to higher 
vertebrates and particularly to primates (Wil-
son, 1975). Lorenz (1965), while retaining a
fairly strict definition of the concept, has 
argued for the existence of phylogenetic ves- 
tiges of releasing systems in higher organisms 
(IRMs disintegrated by learning or higher 
development), but his application of the con- 
cept to human behavior in the popular book 
On Aggression (Lorenz, 1966) has been criti-
cized (Piel, 1970).

The approach of a number of comparative 
psychologists has been to suggest the aban-
donment of the releaser concept altogether.
They point out, for example, that the con- 
cept underestimates the role of experience in 
the ontogeny of fixed action patterns (Lehr- 
man, 1970), that these behaviors are often 
not as fixed or stereotyped as once thought 
(Klopfer, 1974), and that the IRM is not a
unitary mechanism that corresponds to a dis-
crete center in the central nervous system
(Hailman, 1970).

Rather than abandoning the releaser con-
cept, others have dealt with the restrictive-
ness of the model by altering its definition to
extend its meaning. Tinbergen (1948), for ex- 
ample, argued that sign stimuli are not always 
characterized by simple key features and, fur- 
thermore, that releasers need not elicit full- 
blown motor patterns but only minor elements 
of behavior or even internal reactions. Also 
in this tradition, Hinde (1974) suggested 
that there is a continuum ranging from fixed 
action patterns that are stable or relatively 
unaffected by experience to behavior patterns 
that are labile; because, in this view, most
behaviors are affected by both heredity and
environment, the question of whether a be- 
havior is innate or acquired is unanswerable. 

The tendency in higher mammals and pri- 
mates is away from elementary sign stimuli 
toward signals that do not convey single fixed 
messages but are graded in intensity and de-
rive their meaning from both intensity cues
and contextual factors (Wilson, 1975). For 
this reason, a model that implicates motiva- 
tional and cognitive influences in responses 
to crying is presented in the next section. It 
should be noted, however, that this model may 
not be incompatible with the broadened social 
releaser concept used by the English etholo-
gists (e.g., Hinde, 1974; Tinbergen, 1948),
who, unlike the ethologists in the classical 
tradition (cf. Eibl-Eibesfeldt, 1975), tend to 
view IRMs as motivational entities. Like the 
releaser model, this cognitive-affective ap- 
proach can be described broadly as ethological 
in that it capitalizes upon Bowlby’s sugges- 
tion (1969) that the cry’s impact is releaser- 
like and that adults’ reactions to it may be 
phylogenetically adapted. 


The Cry as an Activator of Emotion 


In this section, a model of the cry as an 
activator of emotions of either an egoistic or 
an altruistic nature is examined. First, an 
analogy is drawn between the infant cry and 
the graded signaling systems of nonhuman 
primates that function on a motivational as
opposed to a symbolic level. Subsequently, an
empathy model that implicates both motiva- 
tional and cognitive factors in responsiveness 
to the cry is presented. 


The Cry as a Graded Signal 


In the light of the previous discussions of 
the physical characteristics of the cry, infant 
crying can be likened to biologically signifi- 
cant sounds that are graded signals as opposed 
to discrete signals (Wilson, 1975). Discrete 
signals operate in an on-off manner with no 
variation in intensity or duration, whereas 
graded signals are variable and increase in 
intensity with the greater motivation of the 
signaler. In nonhuman primates, discrete sig- 
nals are found among tree-dwelling primates, 
and graded signals are common among ter- 
restrial species (Wordon & Galambos, 1972). 
Discrete signals are adapted to communicat- 
ing territorial messages over large distances 
in noisy environments; graded signals are 
adapted to communication within the troop 
at relatively close ranges. Graded signals can- 
not be easily characterized acoustically and 
are hard to dissect into specific messages. Un- 
like in human language, these graded signals 
reflect the motivational state of the signaler 
rather than state a relationship in symbolic 
terms (Brown, 1975). The meaning of a 
graded signal depends on both acoustic cues 
and nonacoustic cues such as the context in 
which the signal is employed. However, it is
possible that dramatic shifts in the intensity
of graded signals may result in shifts in qual- 
itative meaning (Wilson, 1975). 

This description of graded signals fits well 
with the discussion presented earlier concern-
ing the signal value of the cry. It was pro-
posed that cries differ in intensity and that
intensity cues may result in listeners’ making
qualitative distinctions about the underlying
causes of cries. In addition, the importance
of context in interpreting the meaning of the
cry has been emphasized by some researchers
([...], 1972; Muller et al., 1974; Wolff,
1969). A cry that occurs immediately after a
feeding is less likely to be interpreted as a hunger
cry than is a cry that occurs 2 hours or more
later. Thus, both intensity cues and
contextual factors enter into the interpretation of the cry.

There is comparative evidence that supports
the view of the infant cry (as
well as some adult sounds such as laughter
and moaning) as similar to graded primate
signaling systems. These human utterances, 
which could be described broadly as exple- 
tives, are produced with a comparable sim- 
plicity and steadiness of upper vocal tract 
configuration and with predominant variations 
in the lower vocal tract (Bastian, 1965). Ac- 
cording to Bastian, the lower vocal tract con- 
trols pitch, timing, and intensity and is closely 
tied to the autonomic system of arousal. In 
fact, the anatomical configuration of the in- 
fant vocal tract is said to be more similar to 
that of a nonhuman primate than to that of an 
adult human (Lieberman, Harris, Wolff, & 
Russell, 1972). The infant, like the nonhu-
man primate, does not have a pharyngeal re-
gion that can vary in cross-sectional area.
Lieberman (1973), noting the similarity in
the vocal tracts of infants, adult chimpanzees,
and the prehistoric ancestors of Homo sapiens,
placed great emphasis upon the evolution of
the anatomy of the vocal tract as an important
factor in the evolutionary development of
human language.

That the emission of the infant cry is 
postulated to be under the control of the hypo- 
thalamic and the interrelated limbic system 
(Chauchard, 1963; Torda, 1976) also rein- 
forces its similarity with nonhuman primate 
auditory signaling systems. Robinson (1967) 
has demonstrated that every type of vocaliza- 
tion common to macaques can be elicited by 
stimulating parts of the limbic system. In 
fact, removal of areas of the nonhuman pri- 
mate brain that are homologous to the speech 
centers in human beings does not affect their 
vocalization (Myers, 1968). Both the facial 
and vocal expressions of these monkeys seem 
not to be under voluntary control, but are 
primarily controlled by portions of the brain 
subserving emotional rather than volitional 
functions. Myers further hypothesized that,
based on lesion studies with humans, there is 
a striking duality in the mode of social com- 
munication and its underlying mechanisms in 
humans. The involuntary expression of affect 
(facial expressions and vocalizations) appears 
to be controlled, as in nonhuman primates, by 
the deep structures constituting the limbic 
system, whereas the voluntary uses of these 
expressions are under cortical control.

Thus, in this framework, the cry of the
newborn is characterized as an involuntary 
reflex action to distress that is at first under 
the control of the hypothalamic/limbic sys- 
tem and only later comes under cortical con- 
trol. Bell and Ainsworth's (1972) hypothesis 
that crying is at first reflexive and only later 
becomes instrumental may reflect increasing 
cortical control over the emission of the cry 
in the second half of the first year. Also in 
this formulation, the cry is regarded as a 
graded rather than a discrete signal that 
increases in intensity with the degree of dis- 
comfort felt by the infant. Any attempt to 
interpret the cause of the distress would make 
use of both acoustic cues of intensity (e.g., 
dysphonation, hyperphonation, and prolonga- 
tion of the signal) and contextual factors. 
This suggests that the meaning of the cry 
for the listener could be fruitfully studied 
using a dimensional approach as opposed to 
the typological approach adopted in the past 
to study the signal value of cries. 

If the infant cry is a graded signal, then 
the manner of its reception by adults may 
likewise bear some similarity to the reception 
of graded signals by nonhuman primates. 
Effective reception of graded signals is predi- 
cated on a modification of the emotional 
disposition of the listener (Bastian, 1965). 
Thus, communication with graded signals, 
both their emission and their reception, can 
be said to be more on a motivational or emo- 
tional basis than on a symbolic one (Brown, 
1975). Graded signals do not convey impar- 
tial messages but have their effect by influ- 
encing the motivational state of the listener. 

The emission of the cry as well as its com- 
pelling effect on the listener may, then, be 
under the control of the limbic/hypothalamic 
system and its modulation of autonomic 
arousal. In spite of the prominence of cortical 
control in humans, it is also believed that a 
considerable role is played by the limbic/ 
hypothalamic system in the reception and 
execution of elementary types of affective 
speech and sound making (Chauchard, 1963). 
Similarly, the triune concept of the brain 
proposed by McClean (1973) emphasizes the 
intermeshing of cortical with subcortical 
functions. The oldest part of the brain com-
prising the upper brain stem is our inheri-
tance from reptiles and controls functions
like respiration and perhaps also stereotyped 
behavior patterns such as those released by 
sign stimuli. At the next level, the old mam- 
malian brain includes the hypothalamic/ 
limbic system and controls species-typical 
behavior such as agonistic, affiliative, and 
parental behaviors (Altmann, 1966). Mc- 
Clean (1973) presented evidence that the old 
mammalian brain consists of functional areas 
that guide behavior with respect to the two 
life principles—self-preservation and preser- 
vation of the species. The old mammalian 
brain in humans has a similar structure to 
that found in animals and, he argues, con- 
tinues to function at an animalistic level in 
humans with its contribution to the elabora- 
tion of emotional feelings that guide behav- 
ior. At the highest level, the new mammalian 
brain or neocortex functions in skilled, dis- 
criminatory, and exploratory behaviors. 

McClean (1973) hypothesized that the 
interconnections of the limbic system and the 
neocortex represent an evolutionary advance 
in primates that makes possible empathy in 
terms of both shared affect and a cognitive 
understanding of another’s feelings. The com- 
pelling effect of the cry on the listener may 
be partly due to the fact that it produces an 
isomorphic response of distress in the lis- 
tener that is mediated by the limbic system. 
Before discussing the altruistic basis for re- 
sponses to the cry, I first examine claims that 
the motivation to respond to the cry is
solely egoistic or self-serving. 


The Egoistic Basis for Response to the Cry 


In arguments advanced against the re-
leaser model, researchers in the learning
tradition (Moss & Robson, Note 2) have sug-
gested that parents respond to the cries of
their infants for the same reason that they
respond to any noxious sound, that is, to
reduce aversive stimulation. This view rests
on principles of negative reinforcement as
well as on psychophysical assumptions of the
relationship between the quality of the audi-
tory experience and the physical character-
istics of the sound. Whereas the releaser
model emphasizes the uniqueness of a par-
ticular stimulus and its receptor mechanisms,
the psychophysical approach stresses general
properties of the auditory processing system.
Although no formal psychophysical study
has been conducted with the cry sound, Ost-
wald (1963, 1972, 1973) has presented many
speculations as to what physical character-
istics of the sound would make it a particu-
larly penetrating noise. He compared the cry
to a siren that compresses acoustical energy
into a very sensitive region of the auditory
spectrum. He reported that the fundamental
frequency of most cries is approximately 500
Hz (ranging from 400 to 600 Hz) with heav-
iest reinforcement at 1000 to 2000 Hz, where
the auditory threshold is lowest. In addition,
he claimed that the infant cry is one of the
loudest sounds human beings ever make,
with an average level of 83 to 85 dB at 10
inches (25.4 cm) from the mouth. According
to Ostwald, this sound level is 20 dB louder
than normal adult speech and is equivalent
to the noise of an unmuffled truck. Ostwald's
(1963) conceptualization of the basis for
parental response to the cry is similar to that
advanced by learning theorists:


[...] can appreciate why the parent must interfere
with the baby's cry: this sound is too annoying to
be tolerated beyond a short period of time, par-
ticularly at close range. Thus, the cry cries to be
turned off! The listener who cannot escape usually
[...] the noise by soothing whatever baby needs
[...] it. (p. 46)
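
As a rough aid to reading the 20 dB comparison attributed to Ostwald, the figure can be converted into ratios using the standard decibel definitions. These definitions, the subscripts, and the assumption that cry and speech levels are measured at the same distance are supplied here for illustration and are not stated in the source.

```latex
\[
\Delta L \;=\; 20\log_{10}\frac{p_{\mathrm{cry}}}{p_{\mathrm{speech}}}
        \;=\; 10\log_{10}\frac{I_{\mathrm{cry}}}{I_{\mathrm{speech}}}
        \;=\; 20\ \mathrm{dB}
\]
\[
\frac{p_{\mathrm{cry}}}{p_{\mathrm{speech}}} = 10^{20/20} = 10,
\qquad
\frac{I_{\mathrm{cry}}}{I_{\mathrm{speech}}} = 10^{20/10} = 100
\]
```

Under these assumptions, a 20 dB difference corresponds to a tenfold difference in sound pressure and a hundredfold difference in acoustic intensity, and a cry level of 83 to 85 dB would imply a speech level of roughly 63 to 65 dB at the same distance.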


Psychoacousticians (Kryter, 1970) have
established, however, that research on the annoy-
ance value of sounds has little application to
sounds that convey emotional meanings. The
effects of sounds such as the cry that carry
information about their sources cannot be
quantitatively related to their physical char-
acteristics and are therefore rejected from the
concept of perceived noisiness. However,
basic attributes related to perceived
noisiness, such as impulsiveness and spectrum
[...] level, may set fundamental lim-
its to the tolerability of the noise, but emo-
tional meaning can greatly alter tolerance
within these limits.

Although the psychophysical approach to
the explanation of the perceptual mecha-
nisms underlying the impact of the cry may
have some validity, the greatest weakness in
this approach is that it accounts best for
escape from or avoidance of the crying child 
and less well for approaches to remove the 
source of distress. The motivation is egoistic 
or self-serving in that the parent is motivated 
to reduce his or her own distress rather than 
the baby’s. Hoffman’s (1975) theory of em- 
pathic distress as the basis for altruism pro- 
vides an alternative view of the motivation 
to respond to the cry. 


The Altruistic Basis for Response to the Cry 


Many conceptions of empathy have empha- 
sized either the affective aspect of sharing 
the feelings of others or the cognitive aware- 
ness (recognition) of another’s plight 
(Deutsch & Madle, 1975), whereas Hoffman 
(1975) has presented a formulation that 
synthesizes the cognitive and affective com- 
ponents in a developmental perspective. At 
the basis of all altruism is the response of 
empathic distress or "the involuntary, force- 
ful experiencing of another person’s painful 
emotional state” (p. 613). The experience of 
empathic distress is necessarily discomforting 
and unpleasant. As mentioned earlier, on the 
basis of Simner’s (1971) research, Hoffman 
proposed that empathic distress may exist in 
a rudimentary form at birth or may come 
about through classical conditioning. 

Integrating the research on cognitive de- 
velopment and on helping behavior, Hoffman 
proposed three stages in the development of 
altruistic motivation. Adopting Schacter and 
Singer’s (1962) formulation that the labeling 
of one’s emotion or state of arousal is deter- 
mined by one’s cognitions of the situation, 
Hoffman suggested that one’s cognitive sense 
of the other likewise determines how one 
reacts to the distress response of another. 
Through the development of a cognitive sense 
of the other, the primitive empathic distress 
response develops in three stages into a more 
reciprocal concern for the victim, which Hoff- 
man called sympathetic distress. At first the 
child is unable to differentiate between his 
or her own distress and that of another. 
With the development of the concept of the 
self as distinct from others, the child’s con-
cern for his or her own discomfort is trans-
formed into concern for the other’s distress,
but he or she lacks understanding of the
cause or remedy of another's distress. At 
the next stage, the child's attempt to allevi- 
ate another's distress is less egocentric and is 
guided by corrective feedback from immediate 
situational cues. At the third and highest 
level, the child can respond not only to situ- 
ation-specific cues of distress but also to a 
general representation of the welfare of the 
victim regardless of the victim's momentary 
state. 

Hoffman further presented some evidence 
to support the following relationships between 
altruistic motives and action: (a) Distress 
cues from another trigger the sympathetic 
distress response in the observer; (b) the 
observer's initial tendency is to act; (c) the 
intensity of the affect and the speed of re- 
sponse should increase with the number of 
pain cues; (d) if the observer does not act, 
the observer will continue to experience sym- 
pathetic distress or cognitively restructure 
the situation to justify inaction.

It is instructive to relate Hoffman's formu- 
lation to infant crying. First, that the cry is 
often described as a noxious stimulus accords 
with the experience of empathic distress as 
unpleasant. Second, the description of the 
cry as a compelling stimulus reflects the 
compulsion to act by observers of another's 
distress. Third, that the speed of response is
increased by the salience of the pain cues has 
been borne out by Wolff’s (1969) data on 
latency of maternal response to hunger and 
pain cries. That the intensity of affect should 
increase with the intensity of pain cues would 
also be consistent with the concept of the 
cry as a graded signal.

If, as Hoffman proposed, failure to act 
results in the observer cognitively restruc- 
turing the situation to justify inaction, we 
may have a clue as to how failures to act 
and even vengeful acts in response to the 
cry could come about. At Hoffman's third 
level of development of altruistic motivation, 
the observer's representation of the general 
welfare of the distressed person may override 
the specific situational cues associated with 
distress. Thus, if one's child-rearing philos-
ophy is that one should not accede to the in-
fant's distress signals, one might justify inac-
tion under the rubric of teaching the child
that he or she must not manipulate the par-
ent. Similarly, conceptions of the relative
vulnerability of infants at various ages may
lead parents to be less responsive to the dis-
tress signals of older infants than to those of
younger infants. If, despite this cognitive
restructuring, the parents continue to experi-
ence involuntary distress, they may try to
escape by increasing the distance between
themselves and the crying child or, for ex-
ample, by closing doors to dampen the sound.
It may be that continued exposure to the
sounds with the attendant involuntary ex- 
periencing of a high level of emotional 
arousal in the parent tips the parent’s moti- 
vation from altruistic to egoistic, that is, the 
motivation is no longer to alleviate the in- 
fant's distress but to alleviate the parent's 
distress at having to listen to the sound of 
crying for prolonged periods of time. This
contrasts with the altruistic basis for helping 
behavior in which the motivation to respond 
is aroused by another's distress rather than 
by one's own, in which the major goal is to 
help the other rather than one's self, and in 
which gratification is contingent on reducing 
another's rather than one's own distress 
(Hoffman, 1975). 

The notion that there may be an optimal 
range of distress cues has been hypothesized 
by Hoffman (Note 3) and has been referred 
to as the "critical toxicity" problem by Tomp- 
kins (1963) in his discussion of emotions 
aroused in listeners by infant cries. Distress 
cues from another must be sufficient to acti- 
vate distress in the observer but must not be 
so disturbing as to elicit avoidance of or ag- 
gression toward the victim. Excessive and 
prolonged crying, whether due to constitu- 
tional factors in the infant or parental man- 
agement techniques, may exceed limits of 
tolerability and overly tax parents’ abilities 
to withstand continuing high levels of emo-
tional arousal.

The model of the cry as an activator of 
altruistic or egoistic motives in the listener 
may shed some light on ideologies and [...]-
tions in our culture with regard to the social-
ization of crying in infancy. Tompkins (1963)
argued on the basis of clinical observation
that most people develop articulate philoso-
phies about crying and that, generally speak-
ing, there is a polarization of attitudes such
that one is either for or against the crying
child. These differing attitudes lead to a 
polarization of action as well: Either one ig- 
nores and thereby punishes the crying child 
or one tries to soothe the child by removing 
the source of the distress. In the latter case 
the parent is motivated to action because he 
or she experiences sympathetic distress. In
the former case, however, the cry is seen as 
an attempt on the part of the infant to ma- 
nipulate the parent. The response of the par- 
ent is characterized by irritation, anger, and 
annoyance. By ignoring the cry, these parents
hope to reduce the child's dependency and 
increase his or her self-reliance. 

The belief described by Tompkins that 
babies could be spoiled by responses to their 
cries was prevalent in child-care advice given 
to American parents in the first half of this 
century (Bell & Ainsworth, 1972). To the
contrary, Bell and Ainsworth have pre-
sented evidence that by not responding
promptly to the cries of infants in the first
6 months, parents actually increase the likeli-
hood that their infants will cry more fre-
quently and for longer periods of time in the
second half of the first year. By this time, a
vicious spiral has been set up whereby more 
crying leads to more ignoring and more ignor- 
ing leads to more crying. In the context of 
the empathy model proposed, these findings
together with Tompkins' observations suggest
that one's ideology or cognitive representa-
tion of the crying infant's well-being may
lead to a caregiving pattern that routinely
exposes parents to excessive crying. Frequent
exposure to excessive crying would be ex-
pected to activate parents' egoistic motives
and result in anger toward and avoidance of
the source of the sound, thereby completing
the vicious spiral. Reports that
abusive parents hold extreme views about
[...] and independence training (Steele &
Pollock, 1968) also suggest that their man-
agement techniques may foster the excessive
crying often given as the reason for abuse.

The question can be reversed,
and one can ask instead whether prolonged
crying may foster the punitive ideology as
well as the behavioral unresponsiveness.
Bell ([...], 1971) argued that much of the
research based on the parent-effect model can
be reinterpreted as demonstrating the ef- 
fects of the child’s behavior on the parent. 
Indeed, recent reports (Lamb, Note 4) sug- 
gest that babies who are abused may be es- 
pecially difficult constitutionally and may 
therefore precipitate their own abuse. In this 
regard, it is interesting to consider Bennett’s 
(1971) report that nurses may apply either 
the altruistic or the punitive ideology de- 
scribed by Tompkins depending on the char- 
acteristics of the particular babies in their 
care. The cries of one very irritable and diffi- 
cult newborn were viewed as exploitive by 
nurses, whereas those of an easier baby elic-
ited sympathy and were regarded as legiti-
mate demands. Although both babies were
easily soothed by the nurses’ interventions,
quieting was seen as due to the infant’s being
“spoiled” in the one case and “socially re-
sponsive” in the other. During the first two
weeks after delivery, the frequent cries of the
spoiled baby were often left unattended, but
those of the socially responsive baby were
always attended to promptly.

Although extremes of irritability may be 
the cause of caregiver unresponsiveness in 
individual cases, the wide variations between 
cultures in amounts of crying (Konner, 1972; 
Mead & Newton, 1967) suggest the impor- 
tance of other factors. Prompt responding to 
the cry in cultures with a developed transi- 
tion period in which breast-feeding is the 
norm may be partly mediated by the effect 
of the milk letdown reflex on the mother 
(Mead & Newton, 1967). Greater intimacy 
of contact (Konner, 1972) in these cultures 
also provides the opportunity for caregivers 
to anticipate the needs of babies by inter- 
preting more subtle proximal signals of dis- 
comfort. In addition, the ecological setting of 
many primitive and traditional cultures en- 
sures the availability of many caregivers 
(Marvin, Van Devender, Iwanaga, LeVine, & 
LeVine, 1977). Where several caregivers in 
addition to the mother are concurrently 
available, prompt attention to distress cues is 
more feasible than in the typically Western 
nuclear-family setting with its competing 
demands on parents. With respect to more 
cognitive factors, one might hypothesize that 
in primitive and traditional cultures concep-
tions of the vulnerability of infants are
greater than in industrial nations in which 
infant mortality rates are lower. Cross-cul- 
tural variations in responsiveness may be in- 
fluenced by differing conceptions of the well- 
being of distressed infants and may result in 
greater or lesser attention to specific situa- 
tional cues of distress. That caregivers in a 
primitive culture interpret the slightest cry 
as an emergency signal (Devore & Konner, 
1974) is consistent with this hypothesis. 

The precise interrelationships among em- 
pathic distress, parental behavior, motives, 
ideologies, and infant outcomes require fur- 
ther exploration. In reviewing the available 
empirical evidence, Hoffman's empathy model 
has provided useful hypotheses. Future re- 
search within this theoretical framework 
promises to further our understanding of both 
the compelling effect of the cry and the 
nature of the outcome for the distressed in- 
fant. In addition, it may be useful to sup- 
plement this framework with a consideration 
of the role of nature and nurture in the 
ontogeny of parental behavior. Given that 
from an evolutionary perspective parental 
behavior is not strictly altruistic because it 
contributes to the parent's reproductive suc- 
cess (Alexander, 1974), one might expect 
that there are additional mechanisms under- 
lying the display of parental behavior in gen- 
eral and responsiveness to crying in particu- 
lar. In the next brief section, the role of 
exposure to infants in the ontogeny of pa-
rental behavior is discussed with reference to
possible physiological factors that may pre- 
dispose women to find infantile cues more 
attractive than men find them. 


Factors That Influence the Ontogeny of 
Parental Behavior 


Because care of the young is crucial for 
genome survival, the question explored in 
this section is whether there may be mecha- 
nisms that foster empathic responsiveness 
toward infants in particular in addition to 
the processes outlined by Hoffman (1975) that 
contribute to a general capacity for empathy. 
A variety of mechanisms can be accommo- 
dated within the framework of Hoffman's 
model because the theory does not preclude
the involvement of constitutional as well as
experiential factors in the development of 
empathy. The interactionist approach to 
the study of development in behavioral bi- 
ology also allows for multiple causative fac- 
tors and has been found useful in organizing 
the presentation to follow. In this frame-
work, species-typical behavior is seen as de-
veloping toward a predictable end product
regardless of whether experiential or consti-
tutional factors are involved in its develop-
ment (Lehrman, 1970; Wilson, 1975).

Although evidence has been cited that 
women are generally more empathic than men 
when affective indices are used (Hoffman & 
Levine, 1976), the question addressed here 
is whether altruistic behavior toward the 
young in particular may be facilitated by 
hormonal action in women. Several findings 
in the human literature are consistent with 
the comparative literature in suggesting that 
hormonal events during the prenatal period 
and at parturition may enhance the attrac-
tiveness of infantile stimulation for women.
In humans, there may be further effects of 
hormonal changes at puberty on the attrac- 
tiveness of infantile cues for women. 

The inductive or irreversible influences of
circulating hormones on the sexual differenti- 
ation of mammalian fetuses have been docu- 
mented for the human species as well as for 
animals (Money & Erhardt, 1972). Regard- 
less of the genetic sex of the fetus, exposure 
to male sex hormones (androgens) at a criti- 
cal period results in development according 
to the masculine anatomical pattern. That 
these hormonal influences are not merely ana- 
tomical is suggested by the research of Money 
and Erhardt (1972); in a study of fetally 
androgenized girls, they found that as pre-
adolescents these girls exhibited tomboyish
behaviors. When compared with normal girls
of their age, the fetally androgenized girls
preferred active sports to passive activities,
such as doll play; they preferred boys to girls
as playmates; they preferred to wear func-
tional clothing, for example, slacks rather
than dresses; and they were more career-
than marriage-oriented in their expectations
for their futures. These findings were subse-
quently replicated with a different sample
(Erhardt & Baker, 1974). Money and Er-
hardt speculated that prenatal hormonal ex-
posure to androgens may act to raise the 
threshold of sensitivity to stimuli associated 
with the traditional nurturant caregiving role.

Possible influences of hormonal changes at
puberty on the attractiveness of infantile cues
are suggested by two studies. Fullard and
Reiling (1976), in a developmental investi-
gation of Lorenz's “babyness,” found that
children from about the age of 8 until the age
of 12 preferred pictures of adults to pictures
of infants. However, between the ages of 12
and 14, the preference of females shifted to
infants. Males’ preferences for infants did
not significantly exceed chance level until
adulthood (older than 18 years). Similarly,
Huckstedt (1965) found that females pre-
ferred a drawing of a supernormal baby
(with exaggerated infantile features) to that
of a normal baby at 10 to 13 years of age
but that this preference was not reliable for
males until 18 to 21 years of age.

These findings suggest that prenatal hor-
monal events and hormonal changes at pub-
erty may operate to bias females toward the
eventual adoption of the caregiving role.
However, it is difficult to rule out the effects
of socialization pressures, as Quadnago and
his colleagues (Quadnago, Briscoe, & Quad-
nago, 1977) have pointed out, or the effects
of differential exposure to infants that fe- 
males may gain prior to adulthood as a 
result of societal pressures. 

Hormonal changes associated with preg- 
m and parturition have been held respon- 
а. mn the induction or rapid onset of care- 

Du behavior in some mammals, as for ex- 
(Richard Boats (Klopfer, 1971), in hamsters 
(Мој. 1 1966), and in the laboratory rat 
Brun 74: Rosenblatt, 1970). These hor- 
I tues anges appear to sensitize the mother 
iiy it anting from the young. Simi- 
5 E as been argued that the hormonal 
es ован parturition may рго- 
of care dm period for the establishment 
910). ЊЕ in human mothers as well. Salk 
DON example, found. that prolonged 
of mother ( І to 7 days beginning at birth) 
term inis 5 from their premature and full- 
normal Er resulted in deviations from the 
side (r Pattern of holding infants on the left 
~ egardless of the handedness of moth- 


209 


ers). Klaus and Kennell (1970) also found 
persistent differences in caregiving behavior 
between mothers who were allowed early con- 
tact with their premature babies and mothers 
who were only allowed late contact. In more 
recent studies Klaus has extended his studies 
to mothers of full-term infants allowed ex- 
tended or traditional contact with their ba- 
bies in the first 3 days after delivery. Several 
findings particularly relevant to responsive- 
ness to infant crying were reported. One 
month after hospital discharge, mothers in 
the extended-contact group reported in an 
interview that they picked up their babies 
when they cried more often than did mothers 
in the traditional-contact group (Klaus et 
al., 1972). Extended-contact mothers were 
also observed to soothe their infants more 
often during a physical examination 1 month 
after hospital discharge (Kennell et al., 
1974). The effects of extended contact on 
responsiveness to crying continued to be evi- 
dent during a follow-up 11 months later. The 
mothers in the extended-contact group spent 
more time soothing their infants when they 
cried during a physical examination at 1 year 
of age (Kennell et al., 1974). Klaus and his 
colleagues argued that early separation affects 
the mother's commitment and attachment to 
her infant, her development of a sense of 
caregiving abilities, and her ability to estab- 
lish an efficient caregiving regimen (Barnett, 
Leiderman, Grobstein, & Klaus, 1970). 
Klaus's findings may have important implica- 
tions for epidemiological studies of child 
abuse, as early mother-infant separation 
along with other adverse factors is reportedly 
more common among abused children than 
among unabused siblings (Lynch, 1975). 
Whereas hormonal induction in the female 
rat may be necessary at the outset for the 
rapid initiation of caregiving behavior, 
maintenance of caregiving is less dependent 
on hormones than on stimulation emanating 
from the young (Moltz, 1974; Rosenblatt, 
1970). The hormonal conditions associated 
with lactation are not necessary to maintain 
caregiving behavior, but there is evidence 
that the presence of prolactin in lactating 
rats may play a facilitative role by reducing 
the mother’s physiological and behavioral 
responsiveness to stress (Thoman, Conner, & 
Levine, 1970). The influence of prolactin on 
the behavior of human mothers is not known. 
It is interesting to note, however, that in one 
study (Bernal, 1972), breast-feeding mothers 
responded to crying more quickly than did 
bottle-feeding mothers and were more likely 
to respond with feeding. Prompt responding 
to cries by the breast-feeding mother may be 
mediated by the effect of the cry on the 
letdown reflex, as suggested by Mead and 
Newton (1967), and/or by the effect of the 
cry on changes in the temperature of the 
lactating breast (Vuorenkoski, Wasz-Hockert, 
Koivisto, & Lind, 1969). Alternatively, if 
prolactin makes human as well as rodent 
mothers less prone to stress, the threshold of 
breast-feeding mothers for the activation of 
egoistic motives may be higher than that of 
bottle-feeding mothers. 

Turning now to the nonhormonal bases for 
caregiving behavior, the role of exposure to 
the young has been emphasized in reviews of 
the ontogeny of caregiving behavior in mam- 
mals (Moltz, 1971; Noirot, 1972; Rosen- 
blatt, 1970). For example, enforced exposure 
to the young elicits appropriate caregiving 
such as nest building and retrieving in virgin 
female and male rats. The length of time 
required to induce caregiving behavior by 
exposure to pups (about 6 days), however, 
far exceeds limits that would be adaptive in 
the natural context, as pups would die with- 
out appropriate care for that period of time. 
It has been concluded from these studies that 
caregiving behavior is characteristic of both 
sexes in the species studied. Caregiving is 
not dependent on physiological changes for 
its appearance, but the hormonal changes 
associated with parturition in the female 
reduce the duration of exposure to the young 
that is required to effect a change from at- 
tacking or avoidance to caregiving behavior. 
The effects of the mother’s parity on care- 
giving behavior provide further evidence of 
the effects of exposure to the young. When 
the natural sequence of hormonal changes at 
parturition is disrupted, multiparous rats 
provide adequate care, whereas 50% of 
primiparous rats do not (Moltz, 1971). Thus, 
prior exposure to pup stimuli can compensate 
for disruptions in mechanisms under hor- 
monal control. 
Harper (1971) has reviewed studies that 
suggest that exposure to the young may sensi- 
tize higher as well as lower mammals to care- 
eliciting cues. A dramatic example of this 
phenomenon was reported by Harlow and 
his colleagues (Harlow, Harlow, Dodsworth, 
& Arling, 1966). Although their isolation- 
reared rhesus mothers were extremely abusive 
with their first infants, they performed ade- 
quately as mothers with their second infants 
despite the traumatic nature of their earlier 
experiences. Salk (1970) also noted the com- 
pensatory effects of parity for human moth- 
ers. He found that the effect of mother-in- 
fant separation on which side of the body the 
infant was held by the mother was overridden 
if the mother had previously had a child from 
which she was not separated in the early 
postpartum period. That past experience in 
rearing infants influences the attraction of 
adult rhesus females to neonates was reported 
by Sackett (1970). Using a “self-selection 
circus” that allowed the tested animal to ap- 
proach other animals of varying ages and 
sexes, he found that multiparous females spent 
the most time with neonates, followed by 
primiparous females who, in turn, spent more 
time with neonates than did nulliparous fe- 
males. 

The parity of human mothers has also been 
associated with responsiveness to crying in a 
study by Bernal (1972). Mothers of second 
borns were less likely to ignore crying and 
were more likely to respond promptly than 
were mothers of firstborns. It is possible that 
exposure to infants increases the tolerability 
of the sound of crying for parents and re- 
duces the likelihood of avoidance responses. 
One could also argue that increased respon- 
siveness to second borns resulted from al- 
tered conceptions of the needs of infants in 
the newborn period. However, a cognitive 
interpretation of these results seems less 
likely in the light of Bernal’s finding that 
the behavior of multiparas contradicted their 
intentions as stated in prenatal interviews. 
Although 85% of multiparas intended to re- 
spond to crying only after 10 minutes, 70% 
of them actually responded to crying within 
10 minutes (for primiparas these percentages 
were 50% and 62%, respectively). Evidence 
that contingent responsiveness increases with 
parity is, however, equivocal. Another study 
(Cohen & Beckwith, 1977) found a reduction 
in contingent responsiveness with parity, 
which the investigators attributed to the 
competing demands of the older sibling. A 
study in which the spacing between infants 
and their older siblings is controlled may 
resolve the apparent discrepancy in reported 
relationships between parity and responsive- 
ness to crying. 

In summary, the studies reviewed suggest 
that a major factor in the ontogeny of mam- 
malian parental behavior is exposure to the 
young and its enhancement of the attractive- 
ness of the young for adult male and female 
species members. Females, however, may have 
somewhat of an advantage over males in 
terms of hormonal mechanisms that sensitize 
them to infantile cues. The net effect of the 
interaction of organismic and experiential 
factors may account for the greater partici- 
pation of females in infant care among hu- 
mans as well as among lower mammals. 
Nevertheless, among human populations, ob- 
served sex differences would be expected to 
vary across cultures depending on the oppor- 
tunities provided within each culture for 
males to be exposed to the eliciting effect of 
infantile stimulation. The lesser sensitivity to 
crying in Western than in primitive and tra- 
ditional cultures may partly be a function of 
a rearing pattern that not only reduces the 
intimacy of contact between parents and in- 
fants in general but also traditionally pro- 
hibits extended contact early in the post- 
partum period when the mother’s sensitivity 
to care-eliciting cues may be greatest. In ad- 
dition, increases in the attractiveness of in- 
fants for pubertal girls may indicate a 
heightened susceptibility to observational 
learning in the prereproductive period for 
which opportunities are often lacking in 
those cultures in which infant care usually 
takes place in the privacy of the small nu- 
clear family home. Altruistic behavior toward 
crying infants must then be viewed in the 
specific context of ontogenetic processes that 
sensitize adults to infantile cues and en- 
hance the attractiveness of the young for 
them. 
Summary 


This article began with the common obser- 
vation that the infant cry is a compelling 
sound that elicits actions of either a nurturant 
or a nonnurturant (even homicidal) nature 
from adults. Two models of the mechanisms 
by which the cry has its powerful impact 
were examined. 

The first model examined was that of the 
cry as a releaser of parental behavior. In this 
theoretical framework, the cry is viewed as 
a distress signal that originally evolved, along 
with other attachment behaviors, to promote 
proximity between infants and their caregiv- 
ers. Close proximity between caregiver and 
infant functioned to protect the infant from 
predators in the dangerous environment in 
which the species evolved. To ensure genome 
survival, reciprocal mechanisms evolved in 
caregivers to promote immediate and appro- 
priate responses to the cry signal. In this 
context, it has been hypothesized that the 
cry may act as a releaser—a key stimulus 
that acts figuratively to release a fixed motor 
response from the receiver. The recognition 
of the signal and the production of the motor 
response are said to be under the control of 
a hypothesized neural filtering system re- 
ferred to as an innate releasing mechanism. 
The model of the cry as a releaser was ex- 
amined in the light of the available literature 
on the physical characteristics of the infant 
cry and its effectiveness in eliciting parental 
behavior. It was found that the key features 
of the cry stimulus that have been identified 
relate more to stimulus intensity than to 
stimulus recognition. In addition, the cry was 
not found to be invariably effective in elicit- 
ing caregiving behavior, particularly in West- 
ern cultures. This analysis suggested that the 
data available, though not entirely compati- 
ble with the classical view of IRMs, may be 
compatible with the broadened definition of 
releasers as motivational entities adopted by 
some modern ethologists. 

A model of the cry as an activator of emo- 
tions was then examined. In this formulation, 
the cry is likened to the graded signals em- 
ployed for communication by some non- 
human primates. The cry is viewed as an 
involuntary reflex action to distress that in- 
creases in intensity with the greater motiva- 
tion or discomfort felt by the infant. The 
response to the cry likewise consists of an 
isomorphic response of distress in the ob- 
server. Egoistic or self-serving motives for 
responses to the cry, that is, to reduce the 
parent’s own distress, may account for at- 
tempts to avoid or escape from the crying 
infant. On the other hand, altruistic motives, 
that is, to reduce the baby's distress, may 
underlie parental responses aimed at remov- 
ing the source of the infant's discomfort. 
Within the framework of Hoffman's em- 
pathy model (1975), altruistic behavior 
toward persons in distress is viewed as a 
joint product of the capacity for shared af- 
fect and the development of a reciprocal con- 
cern for others. Although the triggering of 
empathic distress usually results in the per- 
formance of an altruistic act, exposure to 
excessive crying may transform the parent’s 
motivation from altruistic to egoistic, that is, 
to a concern to alleviate the parent’s own 
distress rather than the infant’s. The roles of 
constitutional irritability in the infant as well 
as parental child-rearing philosophies and 
management techniques were discussed in 
relation to the activation of egoistic motives 
and behavior toward crying infants. Egoistic 
motives may occur more frequently in West- 
ern than in primitive and traditional cultures 
because of a child-rearing pattern that pro- 
motes excessive crying and thereby overly 
taxes parents’ abilities to withstand continu- 
ing high levels of emotional arousal. 
Because caregiving behavior is important 
for genome survival, it was argued that the 
explanatory power of the general empathy 
model proposed could be further increased by 
taking into consideration specific ontoge- 
netic processes that foster altruistic behavior 
toward infants in mammals. A review of the 
human and comparative literature suggested 
that the ontogeny of mammalian parental 
behavior may be under the control of two 
separate but interacting mechanisms. Where- 
as hormonal events sensitize female species 
members to infantile cues, exposure to the 
young has comparable effects for males and 
females. Sensitization, whether due to hor- 
monal or to experiential factors, enhances the 
attractiveness of the young for adult species 
members. Altruistic behavior toward crying 
infants must be viewed within the context of 
these ontogenetic processes that have evolved 
to ensure genome survival. 
The person with whom one interacts in- 
fluences what one does. This interdependency 
of behavior has psychological, as well as 
statistical, implications. Unfortunately, re- 
searchers have either ignored the interde- 
pendency or tried to eliminate it, since certain 
statistical procedures do not allow for it. 
To eliminate the dependency, several ap- 
proaches can be taken. For example, studies 
of adults often use confederates to provide 
a standard against which subjects’ behavior 
is measured. Not only is this strategy 
difficult to implement in studies of children but 
it is also not always clear that results obtained 
using confederates generalize to groups, all of 
whose members are naive. Alternatively, the 
dependency of partners’ scores can be eli- 
minated by devising scores that apply only to 
a group or an interacting pair. These scores 
are dyadic in nature, that is, they can only be 
attributed to the dyad and not to the in- 
dividuals within the dyad (e.g., eye-to-eye 
contact or tug-of-war scores). With scores that 
can be given to each individual in the dyad 
separately, one can avoid the issue of depen- 
dency by summing individual scores across 
partners and using only this pair score. In all 
three cases (using confederates, using scores 
dyadic in nature, or using pair-average 
scores), the statistical problems created by 
dependency are avoided. 

Although some questions can be appro- 
priately answered by such scores, there are 
many important kinds of questions that cannot 
be answered without individual measures for 
both members of a dyad (or an interactive 
group). First, the nature of the mutual 
dependency of subject and partner behaviors 
itself varies across groups (Jacklin & Maccoby, 
1978), and this fact is of course lost in 
studies with confederates or with pair or dyad 
scores. Second, effects of the partner as 
stimulus may be lost. For example, children 
have been shown to behave differently in the 
presence of a boy as opposed to a girl. In the 
presence of a girl, both boys and girls are more 
likely to offer toys and behave in other posi- 
tive ways. However, in the presence of a boy, 
both boys and girls are more likely to with- 
draw (see Jacklin & Maccoby, 1978). Use 
of a score for each pair would only allow overall 
comparisons of boy-boy, girl-girl and mixed- 
sex pairs and would mask these sex-of-partner 
effects. 

Thus, a variety of questions can be asked 
about dyadic effects, the answers to which 
depend on observations of each member of a 
dyad (or more generally, on observations of 
each member of an interactive group). The 
examples given here are questions about sex 
differences in children. Isolating the contribu- 
tions of sex of subject, sex of partner, and the 
interaction of these two from one another has 
been found necessary to an understanding of 
children's social behavior (see Jacklin & 
Maccoby, 1978). However, the same problem 
exists for any characteristic of the subject or 
partner that is being investigated. 

Several studies have looked at the nature of 
interpersonal behavior in young children when 
some characteristic of the subjects or partners 
is of interest. Statistical approaches to the 
dependency problem have varied. Muste and 
Sharpe (1947), for example, avoided the issue 
by presenting only descriptive material. A 
number of researchers have used pair scores 
(Eckerman, Whatley, & Kutz, 1975; Nadel- 
man & Schiffler, Note 1; Ross & Hay, Note 2). 
Others seem to have assumed (considering the 
statistical tests that they used) that partners’ 
behaviors are not interrelated (Doyle, 1975; 
Hartup, 1974; Langlois, Gottfried, & Seay, 
1973; Roff & Roff, 1940; Langlois & Downs, 
Note 3). The statistical procedures commonly 
used to evaluate data of this type depend for 
their validity on adherence to the assumption 
that the individual scores are independent. 
However, it is clear that when the behaviors 
of subjects and their partners are put into the 
same analysis, this assumption of independence 
is often violated. As an example, let us con- 
sider an extreme case. Suppose that a child's 
behavior exactly matches that of his or her 
playmate in some respect. Putting the two 
members of a pair into an analysis as though 
they were independent would be equivalent to 
putting the same case in twice, thus doubling 
the permissible degrees of freedom and under- 
estimating the error variance. In the more 
usual case, mutual dependence of behaviors is 
only partial, but to the extent that mutuality 
does exist, the inappropriateness of the usual 
statistical treatments becomes a serious prob- 
lem. If one wished, for example, to compare the 
behavior of a girl when she plays with another 
girl with the behavior of a girl when she plays 
with a boy, there would be a potentially 
serious violation of assumptions if both mem- 
bers of each pair were treated as subjects and 
put into any analysis that disregarded the 
mutual dependence of behaviors. 
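
The extreme case described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the scores are hypothetical and the helper `pooled_t` is introduced here only for illustration. It enters each score twice, as would happen if partners matched exactly and both members of every pair were treated as independent subjects, and compares the resulting Student's t statistic with the one obtained from a single entry per pair.

```python
import math
from statistics import mean


def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical scores: one child per pair in each of two conditions.
group_a = [3.0, 5.0, 4.0, 6.0]
group_b = [1.0, 2.0, 4.0, 3.0]

t_once = pooled_t(group_a, group_b)

# Extreme case: each partner exactly matches the child, so entering both
# members of every pair simply enters each score twice.
t_twice = pooled_t(group_a * 2, group_b * 2)

# With N scores in all, duplication inflates t by sqrt(2(N - 1) / (N - 2)).
n_total = len(group_a) + len(group_b)
inflation = t_twice / t_once
expected_inflation = math.sqrt(2 * (n_total - 1) / (n_total - 2))
```

The means are unchanged by the duplication, but the estimated standard error shrinks and the nominal degrees of freedom rise from N - 2 to 2N - 2, so the test appears more significant without any new information having been added.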

This problem is well-known in other con- 
texts. If, for example, one assigns one member 
of a matched pair to one group and the other 
to a comparison group, comparison of group 
response must be based on the matched-pairs 
t test or the Wilcoxon signed-ranks test rather 
than on the usual Student's t test or the Mann- 
Whitney U test, which require independent 
responses. If one assesses the effect of 
different treatments, with each subject ex- 
posed to each treatment, analysis of treatment 
effect must be based on a repeated measures 
analysis of variance (ANOVA) or a Friedman 
test rather than on an ANOVA of a one-way 
layout or a Kruskal-Wallis H test, both of 
which, again, assume independence of ob- 
servations. However, in studies of social be- 
havior, the possible dependency has not always 
been taken into account. 

In summary, we suggest that interpersonal 
behavior often involves the interdependency 
of subject’s and partner’s behaviors. Generally, 
researchers have avoided the statistical prob- 
lems inherent in this interdependency by using 
confederates, pair scores, or dyad scores. 
However, using confederates or dyad or pair 
scores has two disadvantages: (a) The nature 
of the dependency itself, which may be of 
interest, is often overlooked, and (b) informa- 
tion about partner effects or the interaction 
of subject and partner effects cannot be 
obtained. 

In the next section, we propose a technique 
that allows for the analysis of measures of 
the individual behaviors of both subjects and 
their partners. The technique involves esti- 
mating the level of dependency between sub- 
ject and partner behaviors and taking this 
dependency into account before estimating and 
testing group effects. In general, the analysis 
parallels the logic of a matched-pairs t test. 
The approach can be generalized to the ex- 
amination of individual behavior observed in 
groups larger than two. We consider (and 
reject) two alternative approaches that do 
not take dependency into account and point 
out the types of errors that this neglect 
induces, as well as two alternative approaches 
that may or may not be viable. 


Analysis of Individual Behavioral 
Observations in Dyads 


The design considered here is one in which 
three types of dyads are studied: girl-girl 
pairs (GG), girl-boy pairs (GB for an observa- 
tion of the girl in such a pair and BG for an 
observation of the boy), and boy-boy pairs 
(BB). The mathematical model used basically 
involves analysis of variance, fixed effects, and 
two factors with $\mu$ as the grand mean, $\alpha$ as 
the effect of sex of subject, $\beta$ as the effect of 
sex of partner, and $\gamma$ as the group interaction 
between sex of subject and sex of partner. 
Since, however, the pairwise correlations are 
nonzero and the cell variances may be unequal, 
the usual analytic procedures may be invalid. 

The model states that the paired observa- 
tions in the girl-girl group are indicated by 
$(x, x')$ with 

$$
\begin{aligned}
x &= \mu + \alpha + \beta + \gamma + \epsilon, \\
x' &= \mu + \alpha + \beta + \gamma + \epsilon', \\
E(\epsilon) &= E(\epsilon') = 0, \\
\operatorname{var}(\epsilon) &= \operatorname{var}(\epsilon') = \sigma_{11}^2, \\
\operatorname{correlation}(\epsilon, \epsilon') &= \rho_{11}.
\end{aligned}
$$

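Observations of the girl-boy pairs are indi- 
cated by $(x, y)$ with 

$$
\begin{aligned}
x &= \mu + \alpha - \beta - \gamma + \epsilon, \\
y &= \mu - \alpha + \beta - \gamma + \eta, \\
E(\epsilon) &= E(\eta) = 0, \\
\operatorname{var}(\epsilon) &= \sigma_{12}^2, \quad \operatorname{var}(\eta) = \sigma_{21}^2, \\
\operatorname{correlation}(\epsilon, \eta) &= \rho_{12}.
\end{aligned}
$$

Observations of the boy-boy pairs are indi- 
cated by $(y, y')$ with 

$$
\begin{aligned}
y &= \mu - \alpha - \beta + \gamma + \eta, \\
y' &= \mu - \alpha - \beta + \gamma + \eta', \\
E(\eta) &= E(\eta') = 0, \\
\operatorname{var}(\eta) &= \operatorname{var}(\eta') = \sigma_{22}^2, \\
\operatorname{correlation}(\eta, \eta') &= \rho_{22}.
\end{aligned}
$$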


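Now let $m_1 = (x + x')/2$ in the girl-girl group 
(the order of assignment within same-sex pairs 
will be seen to be irrelevant), let $m_2 = (x + y)/2$ 
and $d_2 = (x - y)/2$ in the girl-boy pairs, and 
let $m_3 = (y + y')/2$ in the boy-boy pairs. Now 
$m_1$, $m_2$, $d_2$, and $m_3$ are random variables with 

$$
\begin{aligned}
E(m_1) &= \mu + \alpha + \beta + \gamma, &
\operatorname{var}(m_1) &= \frac{\sigma_{11}^2}{2}\,(1 + \rho_{11}), \\
E(m_2) &= \mu - \gamma, &
\operatorname{var}(m_2) &= \frac{\sigma_{12}^2 + 2\rho_{12}\sigma_{12}\sigma_{21} + \sigma_{21}^2}{4}, \\
E(d_2) &= \alpha - \beta, &
\operatorname{var}(d_2) &= \frac{\sigma_{12}^2 - 2\rho_{12}\sigma_{12}\sigma_{21} + \sigma_{21}^2}{4}, \\
E(m_3) &= \mu - \alpha - \beta + \gamma, &
\operatorname{var}(m_3) &= \frac{\sigma_{22}^2}{2}\,(1 + \rho_{22}).
\end{aligned}
$$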


The variances of each of these variables, which 
are rather complicated functions of the popula- 
tion variances and correlation coefficients, can 
be estimated directly and simply from the 
sample by the sample variances 

$$
S_1^2,\ S_2^2,\ S_d^2,\ \text{and}\ S_3^2 .
$$


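Using the above equations, the following are 
estimators of the parameters: 

$$
\begin{aligned}
\hat{\mu} &= \frac{M_1 + 2M_2 + M_3}{4}, \\
\hat{\alpha} &= \frac{M_1 + 2\bar{d}_2 - M_3}{4}, \\
\hat{\beta} &= \frac{M_1 - 2\bar{d}_2 - M_3}{4}, \\
\hat{\gamma} &= \frac{M_1 - 2M_2 + M_3}{4},
\end{aligned}
$$

where $M_1$ is the mean computed over the $n_1$ 
averaged responses of the girl-girl pairs, $M_2$ 
is the mean computed over the $n_2$ averaged 
responses and $\bar{d}_2$ is the mean half difference of the 
responses of the girl-boy pairs, and $M_3$ is the 
mean computed over the $n_3$ averaged responses 
of the boy-boy pairs. Since statistics that have 
different subscripts are independent, the 
standard errors of $\hat{\mu}$ and $\hat{\gamma}$ are estimated by 

$$
SE_1 = \frac{1}{4}\left(\frac{S_1^2}{n_1} + \frac{4S_2^2}{n_2} + \frac{S_3^2}{n_3}\right)^{1/2},
$$

and the standard errors of $\hat{\alpha}$ and $\hat{\beta}$ are esti- 
mated by 

$$
SE_2 = \frac{1}{4}\left(\frac{S_1^2}{n_1} + \frac{4S_d^2}{n_2} + \frac{S_3^2}{n_3}\right)^{1/2}.
$$
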
If the sample variances are consistent esti- 
mators of finite population variances, then for 
large sample sizes the test statistics $\hat{\mu}/SE_1$, 
$\hat{\alpha}/SE_2$, $\hat{\beta}/SE_2$, and $\hat{\gamma}/SE_1$ are approximately 
distributed as standard normal deviates 
(Cramér, 1946; Lindeberg-Lévy theorem, 
p. 215; Theorem 20.6, p. 254). To ascertain 
the statistical significance of any of these 
effects for large sample sizes, one would com- 
pare the magnitude of the corresponding test 
statistic with the critical values of the standard 
normal distribution. 
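
The estimation steps above can be sketched in Python. This is an illustrative reading of the formulas rather than the authors' own program: the function names `standard_errors` and `dyad_effects` and the small data set are hypothetical, and the sample variances are computed with the usual n - 1 divisor.

```python
import math
from statistics import mean, variance


def standard_errors(var_m1, n1, var_m2, var_d2, n2, var_m3, n3):
    """Return (SE_1, SE_2): SE_1 applies to mu and gamma, SE_2 to alpha and beta."""
    outer = var_m1 / n1 + var_m3 / n3
    se1 = 0.25 * math.sqrt(outer + 4 * var_m2 / n2)
    se2 = 0.25 * math.sqrt(outer + 4 * var_d2 / n2)
    return se1, se2


def dyad_effects(gg, gb, bb):
    """Estimate mu, alpha, beta, gamma from paired scores.

    gg: (girl, girl) score pairs; gb: (girl, boy) pairs; bb: (boy, boy) pairs.
    """
    m1 = [(a + b) / 2 for a, b in gg]   # pair averages, girl-girl
    m2 = [(g + b) / 2 for g, b in gb]   # pair averages, girl-boy
    d2 = [(g - b) / 2 for g, b in gb]   # half differences, girl minus boy
    m3 = [(a + b) / 2 for a, b in bb]   # pair averages, boy-boy
    M1, M2, D2, M3 = mean(m1), mean(m2), mean(d2), mean(m3)
    estimates = {
        "mu": (M1 + 2 * M2 + M3) / 4,
        "alpha": (M1 + 2 * D2 - M3) / 4,
        "beta": (M1 - 2 * D2 - M3) / 4,
        "gamma": (M1 - 2 * M2 + M3) / 4,
    }
    se1, se2 = standard_errors(
        variance(m1), len(m1), variance(m2), variance(d2), len(m2),
        variance(m3), len(m3),
    )
    ses = {"mu": se1, "gamma": se1, "alpha": se2, "beta": se2}
    # Large-sample test statistics, referred to the standard normal distribution.
    zs = {name: estimates[name] / ses[name] for name in estimates}
    return estimates, ses, zs


# Hypothetical scores, chosen only so the arithmetic is easy to follow.
gg = [(4, 2), (2, 2), (6, 2)]
gb = [(3, 1), (1, 3), (5, 2)]
bb = [(1, 1), (2, 0), (3, 5)]
estimates, ses, zs = dyad_effects(gg, gb, bb)
```

As a plausibility check, feeding `standard_errors` the squares of the summary standard deviations printed at the foot of Table 1 (1.72, 2.16, 1.79, and 1.25), with group sizes of 12, 21, and 12 pairs (an assumption read from the damaged table), gives values of about .281 and .248, in line with the standard errors reported in Table 2.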
To this point, minimal assumptions have 
been made about the distributions of the data. 
In particular, we have avoided assuming a 
bivariate normal distribution or equal vari- 
ances as well as zero correlation coefficients. 
It should be noted, however, that if the vari- 
ances are equal and the correlations are zero, 
then the statistics above are approximately 
distributed as t statistics. If the number of degrees 
of freedom is large, there is little difference in 
referring to critical values of a t statistic in- 
stead of those of the standard normal deviate. 
For small sample sizes, however, one should 
be aware of these options as well as of the 
consequences of inappropriately exercising 
these options. 

If variances are equal and correlations are 
zero, then the degrees of freedom are $n_1 + n_2 + n_3 - 4$. 
The procedure in this case is only 
as valid as the assumptions. If variances are 
unequal or correlations are nonzero, there is 
considerable evidence that the nominal signif- 
icance level of the test may differ substantially 
from the real significance level (Scheffé, 1959, 
chap. 10). If the data are nonnormally dis- 
tributed (but the variance-correlation as- 
sumptions are met), it is generally believed 
that the t test is relatively robust. However, 
even in this instance, there are special cases 
that belie this belief (Lee & Gurland, 1977). 

Alternatively, one can assume that the squared 
standard errors $SE_1^2$ and $SE_2^2$ are linear 
combinations of independent chi-square statistics, 
statistically independent of the sample means 
used to estimate $\mu$, $\alpha$, $\beta$, and $\gamma$. In this 
case, the test statistics are approximately 
distributed as t statistics with degrees of 
freedom that can be 
estimated from the data (Satterthwaite, 1946). 
Again, however, if the data are not approxi- 
mately normally distributed, for small samples 
the sample variances do not necessarily have 
chi-square distributions, nor are the sample 
means and variances necessarily independent. 
Application of this theory may then be risky. 
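
The approximate degrees of freedom mentioned above can be sketched as follows. The function `satterthwaite_df` is a hypothetical helper written for this illustration; it applies the standard Satterthwaite formula to a variance estimate of the form "sum of weight times sample variance over n", which is the form taken by the squared standard errors in this design (weights 1, 4, and 1).

```python
def satterthwaite_df(terms):
    """Approximate degrees of freedom for a weighted sum of sample variances.

    terms: iterable of (weight, sample_variance, n) for independent samples.
    The variance estimate is sum(weight * s2 / n); each sample variance
    carries n - 1 degrees of freedom.
    """
    parts = [(w * s2 / n, n - 1) for w, s2, n in terms]
    total = sum(p for p, _ in parts)
    return total ** 2 / sum(p ** 2 / df for p, df in parts)


# Summary standard deviations as printed at the foot of Table 1; the group
# sizes (12, 21, 12 pairs) are an assumption read from the damaged table.
df_mu_gamma = satterthwaite_df([(1, 1.72 ** 2, 12), (4, 2.16 ** 2, 21), (1, 1.25 ** 2, 12)])
df_alpha_beta = satterthwaite_df([(1, 1.72 ** 2, 12), (4, 1.79 ** 2, 21), (1, 1.25 ** 2, 12)])
```

Under these assumptions the two values round to 34 and 38, which agrees with the degrees of freedom listed in Table 2.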

In addition to these analyses for group 
effects, intraclass correlation coefficients can 
be computed separately between paired in- 
dividuals in the girl-girl and boy-boy dyads, 
and the product-moment correlation coefficient 
can be computed for the girl-boy dyad (the 
intraclass form may be inappropriate if the 
variances of observations on girl and boy in the 
GB dyad are not equal). Tests of significance 
for the two types of correlation coefficients 
differ only in specification of degrees of free- 
dom ($n - 1$ for intraclass and $n - 2$ for pro- 
duct moment). Not only can one test for the 
significance of each correlation coefficient but 
one can also test for the homogeneity of the 
three correlation coefficients, and, if they are 
homogeneous, one can test for the significance 
of their pooled value (Kraemer, 1975). 

A detailed example of the full calculation, 
estimation, and testing procedures is presented 
in Tables 1 and 2. 


Examination of Alternative Approaches 
Inappropriate ANOVA Approach 


What are the consequences of applying the 
usual ANOVA procedure for a balanced design 


Table 1 
Frequency of a Toy Offer in an Interactive 
Experimental Situation* 


Raw data Processed data 

GG GB,BG BB СЕТТЕ г. 
НиО 12: 2,4 6.5. 1.5 59D NS 
$50 00 21 2.5.0 Oe 
22). 0,9 11 20 45 —45 10 
LS 3,2 3,3 30: 25) 0060630 
$0 52 02 1.5 35 15. 10 
Ars 253 240 2.525 SEU 
330-55 оба ^ 2511500 0S 
$9. 01 ва. 15 Бев 
11 $3 11 | 45 ЗОО 
ni $5 10 2:5 5 50 REO NS 
pi. 43 14: 300 ои 
18 20 00 6.5. 1:09 10 050 

1,2 15 —.5 

3,14 $5. —5,5 

41 25 15 

7,2 45. 7-25 

9,0 0 0 

0,0 10110 

9,0 DURÓ 

$1 20 10 

1,2 1:5 275 
M   3.21  2.52  -.19  1.67 
SD  1.72  2.16  1.79  1.25 


Note. G = girl; B = boy. Intragroup correlation 
coefficients were .33 (ns), .44 (p < .05), and -.19 
(ns) for the GG, GB plus BG, and BB groups, 
respectively. There was no significant deviation 
from homogeneity of correlation: $\chi^2$ = 3.10. For 
the pooled correlation coefficient, $\hat{\rho}$ = .25. 
* Data are from Jacklin and Maccoby (1978). 
Table 2 
Estimates and Tests of Effects for Frequency 
of a Toy Offer^a 

Parameter   Estimator   Estimated SE   t          df^b 
μ            2.481       .281          8.829**    34 
α            -.480       .249          1.928*     38 
β            -.290       .249          1.165      38 
γ            -.403       .281           .150      34 

^a Data are from Jacklin and Maccoby (1978). 
^b Using the method described in Satterthwaite 
(1946). 
* p < .10. 
** p < .001; the effect of the mean estimator's 
being significantly different from zero is trivial in 
this case, since only positive scores were used. 
However, if negative and positive scores had been 
used, the estimator significance might be useful. 


($n_1 = n_3 = n$, $n_2 = 2n$) and ignoring the paired 
nature of the data? The parameter estimators 
are identical to those above and remain un- 
biased. The variance of these estimators, how- 
ever, may be either underestimated or over- 
estimated (depending on the true variance- 
covariance structure) by the use of the mean 
square error from the ANOVA. For example, 
if all the variances were equal and all the cor- 
relation coefficients were equal to a single non- 
zero $\rho$, then the mean square error would esti- 
mate $\sigma^2$, and the variance of $\hat{\gamma}$ would be taken 
to be $\sigma^2/8n$. In fact, var($\hat{\gamma}$) = $\sigma^2(1 + \rho)/8n$. 
Thus if $\rho > 0$ (i.e., a positive correlation 
between subjects and partners), the variance 
is underestimated, and $\hat{\gamma}$ (the interaction be- 
tween subjects and partners) may appear to 
be more significant than it is; whereas if $\rho < 0$ 
(i.e., a negative correlation between subjects 
and partners), the variance is overestimated, 
and $\hat{\gamma}$ (the interaction between subjects and 
partners) may appear to be less significant 
than it truly is. Thus, in general, if the cor- 
relation between subject and partner is ignored, 
incorrect statements may be made regarding 
the significance of one's findings. 
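
The stated variance of the interaction estimator can be derived in a few lines. The sketch below assumes, as in the example above, a common variance $\sigma^2$, a common correlation $\rho$, and the balanced design with $n$ girl-girl pairs, $2n$ girl-boy pairs, and $n$ boy-boy pairs; $M_1$, $M_2$, and $M_3$ denote the group means of the pair averages.

```latex
\begin{aligned}
\operatorname{var}(M_1) &= \operatorname{var}(M_3)
   = \frac{1}{n}\cdot\frac{\sigma^2(1+\rho)}{2}
   = \frac{\sigma^2(1+\rho)}{2n}, \\
\operatorname{var}(M_2) &= \frac{1}{2n}\cdot\frac{\sigma^2(1+\rho)}{2}
   = \frac{\sigma^2(1+\rho)}{4n}, \\
\operatorname{var}(\hat{\gamma})
  &= \frac{\operatorname{var}(M_1) + 4\operatorname{var}(M_2) + \operatorname{var}(M_3)}{16}
   = \frac{\sigma^2(1+\rho)}{16}\left(\frac{1}{2n} + \frac{1}{n} + \frac{1}{2n}\right)
   = \frac{\sigma^2(1+\rho)}{8n}.
\end{aligned}
```

Setting $\rho = 0$ recovers the value $\sigma^2/8n$ that the usual ANOVA would use, so the ratio of the true variance to the assumed variance is $1 + \rho$.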


Multiple t-Test Approach 


One might also consider comparing (e.g., 
using / tests) the responses of boys in same-sex 
dyads (averaging paired responses) with those 
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of boys in mixed-sex dyads (BB vs. GB) and 
similarly consider comparing BB versus BG, 
BB versus GG, BG versus GG, and GB versus 
GG). Since subjects' responses in different 
groups are independent, these would be valid 
tests. There are five such tests, with the pro- 
posed multiple / tests nonindependent. In such 
a case, the probability that one or another of 
the tests will exceed the stated significance level 
by chance may be quite high. However, this 
is not the crucial point. The main problem 
lies in the fact that one can extract from these 
test results no clear evidence as to which of 
the factors of interest (sex of subject, sex of 
partner, or interaction) is operative, since 
each procedure tests some combination of 
factors. For example, the test comparing the
boys’ responses in the BB and BG dyads tests
whether $\beta = \gamma$. If the test turns out to be
significant, it is not clear whether $\beta = 0$ and
$\gamma \neq 0$, whether $\beta \neq 0$ and $\gamma = 0$, or whether
both are nonzero but equal to different values.
Whether or not the sex of partner effect equals 
the interaction effect is in general simply not 
of research interest. 
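
The remark that one or another of five tests may exceed the stated significance level by chance can be given rough numbers. The sketch below is illustrative only; the function names are hypothetical. Because the five t tests are nonindependent, the independence formula is shown only as a reference point, whereas the Bonferroni bound holds whatever the dependence among the tests.

```python
def familywise_error_independent(alpha, m):
    """P(at least one false rejection) for m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_bound(alpha, m):
    """Upper bound on P(at least one false rejection), valid under any dependence."""
    return min(1.0, alpha * m)


if __name__ == "__main__":
    alpha, m = 0.05, 5
    print(familywise_error_independent(alpha, m))  # about 0.226
    print(bonferroni_bound(alpha, m))              # about 0.25
```

At a nominal level of .05, five tests therefore carry a chance of at least one spurious rejection that can approach one in four.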


Discarding-Data Approach 


One may also discard one randomly selected 
subject from each dyad to ensure independence 
of the data used in analysis. However, com- 
putational simplicity is in this case achieved 
at considerable loss of power and, in addition, 
with a loss of information about the dyadic 
interaction itself, that is, about the pairwise 
correlations, which may be of interest. 


Multivariate Analysis of Variance 


Another valid approach to the analysis
entails regarding the original sample as one from
a bivariate distribution with means and
covariance structure as specified. If the
distribution is specifically bivariate normal
with identical covariance structure in the three
groups (BB, BG, and GG), one can use a
multivariate ANOVA approach (Anderson, 1958,
pp. 215-221). For small sample sizes, this
procedure is preferable if the assumptions are
satisfied, but is risky otherwise.


Generalization to Triads and Beyond


The approach detailed for dyads can be
generalized to the situation in which subjects
are of two types (girls and boys) but are
combined in groups of size three or more. The
principles used in the case of dyads are readily
extended to larger groups, but the notation
becomes cumbersome. We thus sketch only
briefly how this generalization can be effected.
Table 3 presents a mathematical model for
this situation.

Each group comprises k members, and there
are k + 1 such groups indexed by the number
of girls in the group: j = 0, 1, 2, ..., k. In
the model, $\alpha$ is the sex of subject effect, $\beta_j$ is
the main effect of that subject’s partnership
group (with k - 1 members), comprising
j - 1 girls and k - j boys, and $\gamma_j$ is the
interaction effect between sex of subject and
such a partnership group ($\sum_j \beta_j = \sum_j \gamma_j = 0$,
j = 1, 2, ..., k). Thus one notes that the
response of a boy in a group of j - 1 girls
and k - j + 1 boys is parameterized in terms
of the same parameters ($\beta_j$ and $\gamma_j$) as that of a
girl in a group of j girls and k - j boys.


Table 3
Model for Individual Responses in Groups of Size k

Index  No.    No.    Individual response                                                                      No.       Mean response
j      girls  boys   Girls                                        Boys                                          observed  Girls     Boys
0      0      k      (none)                                       $\mu - \alpha + \beta_1 - \gamma_1 + \epsilon$              $n_0$     (none)    $Y_{i1}$
1      1      k - 1  $\mu + \alpha + \beta_1 + \gamma_1 + \epsilon$             $\mu - \alpha + \beta_2 - \gamma_2 + \epsilon$              $n_1$     $X_{i1}$  $Y_{i2}$
...
k      k      0      $\mu + \alpha + \beta_k + \gamma_k + \epsilon$             (none)                                        $n_k$     $X_{ik}$  (none)


We assume that the expected response of
all girls in any group is the same, as is that of
all boys in a group. Thus each k-dimensional
response vector, for purposes of estimating the
parameters, can be reduced to a bivariate
vector $(X_{ij}, Y_{i(j+1)})$ in which $X_{ij}$ is the average
response of girls in the ith group of type j
(i = 1, 2, ..., $n_j$) and $Y_{i(j+1)}$ is the average
response of boys in such a group. Thus no
matter what the size of the group, the problem
can be reduced to that of considering a
bivariate response.

Within a group there are only three inter- 
subject correlation coefficients of interest: 
that of the error components between a pair of 
girls, that between a pair of boys, and that 
between a boy and a girl. When these
correlation coefficients are nonzero, as one would
expect, $X_{ij}$ and $Y_{i(j+1)}$ are correlated responses.

If the data are reorganized in terms of a 
two-way layout (Sex of Subject × Type of
Partnership Group), the process of obtaining 
unbiased estimators of the parameters is 
clear (see Table 4). 


$$\hat{\beta}_j = Z_j - M, \quad j = 1, 2, \ldots, k,$$

and

$$\hat{\gamma}_j = \bar{X}_{\cdot j} - Z_j - \bar{X}_{\cdot\cdot} + M, \quad j = 1, 2, \ldots, k.$$


Table 4
Means Computed From Groups of Size k

                  No. of girls in partnership group
Sex of subject    0                   1                   2                   ...   k - 1               Mean
Girl              $\bar{X}_{\cdot 1}$   $\bar{X}_{\cdot 2}$   $\bar{X}_{\cdot 3}$   ...   $\bar{X}_{\cdot k}$   $\bar{X}_{\cdot\cdot}$
Boy               $\bar{Y}_{\cdot 1}$   $\bar{Y}_{\cdot 2}$   $\bar{Y}_{\cdot 3}$   ...   $\bar{Y}_{\cdot k}$   $\bar{Y}_{\cdot\cdot}$
Mean              $Z_1$               $Z_2$               $Z_3$               ...   $Z_k$               $M$

Note. Each mean in the first row is correlated with
the mean in the second row located one column to
the right (e.g., $\bar{X}_{\cdot 1}$ with $\bar{Y}_{\cdot 2}$);
$\bar{X}_{\cdot j} = \sum_i X_{ij}/n_j$, $\bar{Y}_{\cdot j} = \sum_i Y_{ij}/n_{j-1}$,
$\bar{X}_{\cdot\cdot} = \sum_j \bar{X}_{\cdot j}/k$, $\bar{Y}_{\cdot\cdot} = \sum_j \bar{Y}_{\cdot j}/k$,
$M = (\bar{X}_{\cdot\cdot} + \bar{Y}_{\cdot\cdot})/2$, and $Z_j = (\bar{X}_{\cdot j} + \bar{Y}_{\cdot j})/2$.


The problem lies in the estimation of standard
errors for these estimators, since they comprise
both dependent and independent sample
means. The procedure is illustrated by the
calculation for, say, $\hat{\beta}_2$.

First $\hat{\beta}_2$ is re-expressed in terms of sample
means as

$$\hat{\beta}_2 = (\bar{X}_{\cdot 2} + \bar{Y}_{\cdot 2})/2 - \sum_{j=1}^{k} (\bar{X}_{\cdot j} + \bar{Y}_{\cdot j})/2k.$$

Correlated means are isolated, that is,

$$\hat{\beta}_2 = \frac{1}{2k}\Big\{\big[(k - 1)\bar{X}_{\cdot 2} - \bar{Y}_{\cdot 3}\big] + \big[(k - 1)\bar{Y}_{\cdot 2} - \bar{X}_{\cdot 1}\big] - \sum_{j=3}^{k-1}\big[\bar{X}_{\cdot j} + \bar{Y}_{\cdot (j+1)}\big] - \big[\bar{Y}_{\cdot 1}\big] - \big[\bar{X}_{\cdot k}\big]\Big\}.$$

In this case, each bracketed term is independent
of any other bracketed term, since the
terms arise from different groups. Thus the
variance of $\hat{\beta}_2$ is the sum of the variances of
the bracketed terms divided by $4k^2$. The
variance of any one of these terms, for example,
that of

$$\big[(k - 1)\bar{X}_{\cdot 2} - \bar{Y}_{\cdot 3}\big],$$

is obtained as follows.

For each of the $n_2$ groups observed comprising
two girls and k - 2 boys we compute

$$D_i = (k - 1)X_{i2} - Y_{i3}, \quad i = 1, 2, \ldots, n_2.$$

The variance estimate of $\sigma_D^2$ is then the sample
variance of these observations, $s_D^2$. The term
of interest,

$$\big[(k - 1)\bar{X}_{\cdot 2} - \bar{Y}_{\cdot 3}\big],$$

has population variance $\sigma_D^2/n_2$, estimated by
$s_D^2/n_2$.

In this way the variance estimates of each
of the bracketed terms are compiled and
combined to yield the variance estimate of $\hat{\beta}_2$.
Once again, if the sample variances are
consistent estimators of finite population
variances, the large sample distribution of the
ratio of the statistic to its standard error
under the null hypothesis is approximately
standard normal. As above, under certain
circumstances the small sample distribution of
these statistics may be either exactly or
approximately a t distribution.
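
The variance calculation for one bracketed term can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the inputs are the per-group average responses of girls and of boys for groups of a single type, and the numbers in the usage example are made up.

```python
import math
import statistics


def bracketed_term_estimate(girl_means, boy_means, k):
    """Estimate (k - 1) * Xbar - Ybar and its standard error.

    girl_means[i] and boy_means[i] are the average responses of the girls
    and of the boys in the i-th observed group of one type, so the two
    lists are paired and may be correlated.
    """
    if len(girl_means) != len(boy_means):
        raise ValueError("girl_means and boy_means must be paired by group")
    if len(girl_means) < 2:
        raise ValueError("at least two groups are needed for a sample variance")
    # One difference score per group; the pairing absorbs the correlation.
    d = [(k - 1) * x - y for x, y in zip(girl_means, boy_means)]
    estimate = statistics.mean(d)
    standard_error = math.sqrt(statistics.variance(d) / len(d))
    return estimate, standard_error


def combine_term_variances(term_variances, k):
    """Variance of the estimator: sum of independent term variances over 4k^2."""
    return sum(term_variances) / (4.0 * k * k)


if __name__ == "__main__":
    # Made-up averages for three groups of size k = 3.
    est, se = bracketed_term_estimate([1.0, 2.0, 3.0], [1.0, 1.0, 4.0], k=3)
    print(est, se)
```

Working with one difference score per group is what allows an ordinary sample variance to be used even though the girls' and boys' means within a group are correlated.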


Summary 


The study of social behavior without con- 
federates is complicated by the dependency 
of subjects’ and partners’ behaviors. We have 
suggested a statistical approach that takes 
into account this dependency. Although we
have illustrated the approach with data from 
dyads, the method can be generalized to 
groups of any size. 

We have discussed the problems encountered 
with alternative statistical analyses. The most 
common approach in the literature is to use 
an ANOVA without taking into account the 
dependency of the data. Even if all the other 
ANOVA assumptions are met (equal variances 
and balanced design), the test for the inter- 
action term, for example, will be biased de- 
pending on the amount and direction of de- 
pendency (correlation) that exist. Positive 
correlations lead to an overestimate of the 
significance of the ANOVA interaction term; 
negative correlations lead to an underestimate 
of the significance of the ANOVA interaction
term. A second approach we discussed was the
multiple t-test approach. The problem with
this approach is that it does not give clear 
evidence of which factors of interest account 
for the results. A third approach is to drop 
half the subjects from the analysis to eliminate 
the dependency. One problem with this ap- 
proach is a loss of power of the test used, but 
perhaps a more serious problem is that the 
actual correlations between subject and partner 
may be lost. 

Finally, we note that all of these approaches 
represent attempts to reduce what is in general 
a complex multivariate analysis problem to a 
series of relatively simple univariate analysis 
problems. In certain circumstances (multi- 
variate normality and certain covariance 
structures) the multivariate approach is not 
so complex as to discourage its use and in these 
cases is preferable. 

Interdependence of subjects' and partners' 
behaviors reflects the real social situation. In 
some cases, if we are to understand the dy- 
namics of social behavior, the use of con- 
federates may not be the best choice, even when 
it is possible. The analytic procedures, how- 
ever, should be appropriate to the nature of 
the data. 
Obsessive-Compulsive Personality: A Review


Jerrold M. Pollak 
Boston College 


A review of the term obsessive-compulsive personality (or anal character) is 
presented. Statistical studies of obsessive-compulsive personality conducted over 
approximately the last two decades are then reviewed, emphasizing the extent 
to which they support theory, clinical observation, and description. Evidence is 
still needed on precise etiological determinants, and persuasive evidence in favor 
of classical psychoanalytic theories about etiology is lacking. Empirically based 
findings to date, however, are congruent with clinical observation, description, 
and prediction regarding the salient behavioral characteristics and character 
styles of obsessive-compulsive individuals. 


The obsessive-compulsive personality or 
anal character, as it is sometimes termed, 
came under close scrutiny by Freud and his 
colleagues (Abraham, 1921/1953; Freud, 
1908/1963; Jones, 1918/1961). Carr (1974) 
credited Esquirol (cited in Carr, 1974), who 
worked in the early part of the 19th century, 
with the first publication relating to compul- 
sions and similar phenomena. According to 
the early Freudians, the anal character, as 
they referred to it, arises out of conflicts 
between parents and child over bowel train- 
ing in the 2nd to 3rd year of life. The intro- 
duction of bowel training was thought to 
bring with it an inevitable conflict between 
the child’s desire to freely manipulate elimi- 
nation (expulsiveness) and retention (reten- 
tiveness) and the need of the child’s primary 
caretakers to regulate their child’s anal activi- 
ties and expressions in line with prevailing 
cultural and societal standards of cleanliness 
and impulse control. If the primary caretakers 
prove to be too punitive, impatient, and
intolerant of their charge’s willfulness and
autonomy, or if the training comes either too
early or too late or is experienced as inordi- 
nately frustrating or gratifying, the inevitable 
struggle over bowel control intensifies.
According to classical analytic theory, this may
very well establish the groundwork for anal 
fixations and hence the development of a 
predominantly anal or obsessive-compulsive 
character structure in the child. 

For Freud and for all later analytic and 
nonanalytic clinicians, obsessional personality 
or anal character is to be carefully distin- 
guished from the psychiatric syndrome of 
what has been called obsessive-compulsive or 
obsessional neurosis. In an obsessional neu- 
rosis the individual suffers from persistent 
intrusion of undesired thoughts (obsessions) 
or urges and actions (compulsions) that he or 
she finds extremely difficult, if not impossible, 
to control and ultimately stop. Following 
Salzman (1968), the obsession may be viewed 
as a persistent ritualized thought pattern, 
whereas the compulsion is a persistent ritual- 
ized behavior pattern. Either or both may be 
salient in the clinical picture, and anxiety and 
distress usually are concomitants of the dis- 
order. 

Individuals with a predominantly obses- 
sive-compulsive personality are considered 
asymptomatic, that is, what defines the indi- 
vidual consists of particular constellations of 
traits, defenses, and life-style and not the 
presence of psychiatric symptomatology. 

While obsessive-compulsive neurosis is a 
relatively rare phenomenon (Templer, 1972),
obsessive-compulsive personality or anal
character structure is not. In fact, one could argue
that in Western culture obsessive-compulsive 
personality is one of the, if not the predomi- 
nant, social character structures, embodying 
as it does so much of the general world view 
of the Protestant Work Ethic and capitalist 
social and economic organization (Honig- 
mann, 1967). 

Obsessive-compulsive personality has also 
been observed to prevail in some non-Western 
but relatively advanced industrial societies, 
such as Japan (Gorer, 1943), and is very 
much in evidence within various professional 
groups and specialty occupations and among 
individuals in bureaucratic and managerial 
positions. It could be said as well that many
of the traits characteristic of obsessional 
personalities, for example, perseverance, in- 
dustriousness, thriftiness, ambition, self-con- 
trol, and so on, are highly regarded and re- 
warded within capitalistic, technological soci- 
eties, serve to promote in their possessors 
feelings of self-worth and acceptability, and 
generally provide them with a foundation for 
emotional stability and relative resistance to 
stress (Paykel & Prusoff, 1973).

Freud (1908/1963) delineated one partic- 
ular constellation of traits, namely, obstinacy, 
parsimony, and orderliness, that constitute the 
core of what he termed the anal retentive or 
anal character type and that arise from sub- 
limations of and reaction formations against 
infantile anal erotic impulses that press for 
expression. In Freud, orderliness refers to 
both exceptional bodily cleanliness and a high 
degree of reliability and conscientiousness in 
the performance of all actions, however incon- 
sequential. Parsimony involves frugality and, 
in the extreme, stinginess and avarice, where- 
as obstinacy involves strong tendencies to be 
negativistic, defiant, and even hostile in
relation to authority figures. According to Freud,
orderliness develops from the internalization 
of parental demands for bowel control, 
whereas parsimony develops from the continu- 
ation of the infantile tendency to retain feces, 
both because of the erotic pleasure that ac- 
companies retention and because of the fear 
of losing the overvalued product. Freud 
viewed obstinacy as developing from
resistance to parental demands. In the traditional
psychoanalytic model, considerable
aggression, that is, anal sadistic impulses and
attitudes, is generated in the child as a result of
his or her struggle to insure autonomy. When 
this struggle leads to fixations or develop- 
mental arrests at the anal stage, the infantile 
unresolved aggression is expressed in later life 
in any number of ways, for example, in pas- 
sive-aggressive withholding behavior (or other 
indirect means of defying the wishes or dic- 
tates of others) or in the adoption of ex- 
tremely conventional, oversocialized or reac- 
tionary attitudes via reaction formation and 
identification with the aggressor. The adop- 
tion of reactionary attitudes is oftentimes 
linked with an aggressive, hypercritical, and 
controlling attitude toward others. Regard- 
less of the exact form taken by infantile ag- 
gression in specific individuals, ambivalent 
attitudes toward the expression of hostile 
feelings and impulses particularly, but also 
toward all manifestations of the affect and 
impulse life, are thought to be paramount in 
the personality makeup of the adult obses- 
sive. 

The traits, defenses (undoing, reaction for- 
mation, intellectualization, and isolation), and 
life-styles viewed as characteristic of anal char- 
acters or obsessive-compulsive personalities 
have been further elaborated by later ana- 
lysts with more refined theoretical formula- 
tions, case history data, and analyses of re- 
sponses to projective tests like the Rorschach 
test (Fenichel, 1945; Rado, 1959; Schafer, 
1954; Shapiro, 1965). Despite some ambig- 
uities and inconsistencies in the voluminous 
clinical literature, there appears to be consid- 
erable consistency in the personality descrip- 
tions that emerge. This is true even when one 
compares psychoanalytic descriptions of the 
anal character with descriptions of the 
obsessive-compulsive character or obsessive
personality that emerge from less heavily 
psychoanalytically influenced theoreticians 
and practitioners. The latter accept the ana- 
lytic description of the character, but do not 
necessarily agree with its etiologic
assumptions concerning psychogenesis in the anal
stage of psychosexual development. Ingram
(1961a), for instance, compared descriptions
of the obsessive personality found in leading
psychiatric texts with descriptions of the anal
character given in the classical psychoanalytic
papers of Freud, Abraham, and Jones and
found highly similar descriptions with nu- 
merous features in common, for example, the 
characteristics of orderliness, persistence, and 
rigidity. For descriptive purposes Ingram felt 
there was no point in distinguishing between 
the two terms. The terms obsessional, obses- 
sive, or obsessive-compulsive personality ap- 
pear to be the preferred terms insofar as no 
etiologic assumptions are implied by them. 

In his review of descriptive features, In- 
gram emphasized the following composite de- 
scription of obsessive individuals: 

People who manifest obstinacy are perse- 
vering, thorough, reliable, overconscientious, 
and have considerable qualities of endurance 
and drive. They are dogged and persistent. All 
of this may lead them to be called obstinate, 
stubborn, or defiant. Added to this is a qual- 
ity of self-willed independence that leads to 
a desire to do things their own way and to a 
dislike of interference. They are rigid and in- 
flexible, particularly in terms of ideals, values, 
and standards of conduct for themselves as 
well as others. They are not emotionally de- 
monstrative and may be cold, aloof, and dis- 
tant. They are often critical, controlling, and 
in the extreme, power loving. 

Individuals who demonstrate inconclusive- 
ness are indecisive, vacillate in behavior and 
thought, cannot leave well enough alone, and 
make unsatisfying attempts to reach order 
and perfection. They are always busy and 
never finished and feel harassed by responsi- 
bilities and obligations. They repeat and check 
their work repeatedly and needlessly, and they 
continuously weigh the pros and cons of
decisions. They fear error, are afraid of making
mistakes or omissions, have strict moral
scruples, and fear violating the social code. In
general, they are uncertain of themselves, are
insecure, are prone to worry and doubt, and
can be hesitant; but they often try to conceal
these traits.

People who display orderliness are
overorderly, live by routine, and become easily
upset when their routine is disturbed by
unforeseen events or circumstances. They are
meticulous, perfectionistic, and sticklers for
precision. They are fond of indexing,
tabulating, organizing, and planning and crave
accuracy, symmetry, and rationality.

Individuals who exhibit parsimony are
frequently avaricious, particularly with regard
to possessions, money, and time, all of which
are not to be wasted.

The review of the literature presented in 
the following pages focuses on statistically 
based research studies of obsessive-compulsive 
personality conducted over the past two 
decades or so that are judged to be relevant 
to the descriptive, psychogenetic, and psycho- 
dynamic considerations that have emerged 
primarily from theory and clinical observa- 
tion. 


Empirical Studies Relevant to the Etiology of 
Obsessive-Compulsive Personality 


A number of investigators have sought to 
substantiate relationships between toilet 
training practices and the development of anal 
character traits as originally proposed by 
Freud (e.g., Beloff, 1957; Bernstein, 1955;
Durrett, 1959; Finney, 1963; Hetherington 
& Brackbill, 1963; Holway, 1949; Huschka, 
1942; Kline, 1969; D. R. Miller & Swanson, 
1966; Sears, Rau, & Alpert, 1965; Sewell, 
Mussen, & Harris, 1955; Straus, 1957; Whit- 
ing & Child, 1953). The majority of these 
studies have focused largely on the age toilet 
training was initiated, the age it was brought 
to completion, and the degree to which it may 
have been inordinately lax or severe. The de- 
signs typically involved the collection of retro- 
spective accounts, usually by mothers, of the 
toilet training period in an attempt to relate 
these parental recollections of when and how 
training proceeded to the degree of the off- 
spring’s anal orientation, as measured by 
teacher and parent ratings, response to anality 
questionnaires, and performance on various 
behavioral tests (S. Fisher & Greenberg, 
1977). 

A review of these studies offers, at best, 
meager support for the hypothesized relation- 
ship between toilet training practices and 
the development of anal or obsessive-compulsive
character structure. Orlansky (1949)
concluded that knowledge of sphincter train- 
ing was insufficient to substantiate or dis- 
prove the Freudian position. Sewell et al. 
(1955) attempted to relate toilet training 
practices to personality assessments made at 
5 years of age, but found no evidence for the
Freudian view. Beloff (1957) found that the 
personality traits of orderliness, parsimony, 
and obstinacy occurred together, but could 
not find significant relationships between how 
coercively toilet training was carried out, as 
assessed through interviews of mothers of 
college students, and student and peer ratings 
of anal characteristics. In a review of child- 
rearing practices, O'Connor and Franks 
(1960) reported no conclusive evidence for 
the Freudian interpretation. Studies subse- 
quent to that review (e.g., Hetherington & 
Brackbill, 1963; Finney, 1963; Sears et al., 
1965) also indicated that reported parental ac- 
counts of toilet training practices were not 
related to ratings of children's anal character- 
istics. Moreover, in a series of studies, Got- 
theil and Stone (1968, 1974) and Stone and 
Gottheil (1975) could find, at best, only a 
“slight preferential association" between anal 
personality patterns and patterns of bowel 
habits in samples of normal, neurotic, and 
psychosomatically involved outpatients, that 
is, bowel habit questionnaire items did not 
load highly on the anal trait factor that was 
identified. Kline (1968), however, did report 
significant positive relationships between a 
measure of anal eroticism derived from the 
Blacky Pictures (Blum, 1949) and several 
measures of obsessional traits and symptoms 
(viz., measures from Sandler & Hazari, 1960, 
and Beloff, 1957, and his own AI3 Scale of 
Anality). 

Although it appears that there is little, if 
any, empirical evidence for the classical 
psychoanalytic position on the etiology of the 
obsessive-compulsive or anal character type, 
there are suggestions in clinical observation 
and in some of the same studies that failed to 
support the Freudian point of view of rela- 
tionships between anal character traits in the 
child and the existence of anal characteristics 
in the parents (Beloff, 1957; Finney, 1963; 
Hetherington & Brackbill, 1963). Beloff ad- 
ministered a questionnaire concerned with 
anal traits to a sample of male and female 
college students and their mothers and found 
positive relationships between parental anal 
orientation and the anal orientation of the 
children.

Hetherington and Brackbill administered a
questionnaire measuring Freud's anal triad,
namely, obstinacy, orderliness, and parsimony,
to a sample of fathers and mothers. Their
male and female children’s anality was
assessed by performance on a series of tasks
measuring anal behaviors such as parsimony,
obstinacy, and perseverance. Significant
positive correlations were found between degree
of anality in mother and daughter but not in
mother and son. No significant relationships
were found between a father’s anality and that
of his children of either sex. Speculating that
strict toilet training is simply one expression
of a more general pattern of parental
rigidity, Finney found, as predicted, that
clinicians’ ratings of general rigidity in a sample
of mothers bore a significant positive
relationship to the degree of the child's anal
orientation.

The results of these studies, which are
indicative of comparable degrees of anal
orientation in children and their parents, suggest a
number of possibilities regarding etiology. It
may be that obsessive-compulsive personality
or anal character structure emerges from
repeated contact, association, and clashes
throughout childhood between the child and
figures such as parents, teachers, and
relatives directly involved in caretaking
responsibilities, who are themselves fairly rigid,
controlling, and generally obsessional in their
style of relating to the children in their 
charge. As Carr (1974) has pointed out, even 
if a relationship between rigid toilet training 
practices and obsessional traits could be 
shown, this association could easily be in- 
terpreted as a function of childhood training 
in general rather than of specific repressive 
toilet training. This point of view is not
inconsistent with the idea that the effect of a
rigid, obsessional parental orientation could
very well be maximal before or during the
toilet training phase, when unresolved anal
conflicts in one or both parents are stirred up
anew, leading to increased anxiety and more
pronounced recourse to obsessional behavior
as a defense against the impact of the
stressful circumstances (S. Fisher & Greenberg,
1977). This could occur even if the primary
caretakers did not begin toilet training
particularly early or late. It may be, then, that
toilet training practices are not causal in any
strict sense, but are a correlate of a larger
and more influential child-rearing pattern. In 
this view, obsessive-compulsive style is seen 
as largely socially learned behavior that re- 
sults from the imitation and modeling of 
significant others over a number of years 
throughout the childhood period. 

One cannot discount, as well, the possible 
role played by some as yet vaguely defined
constitutional factors in the etiology of
obsessive-compulsive personality. Freud
(1908/1963) himself made reference to constitutional
influences that might result in an especially 
intense inborn erotic sensitivity in the anal 
zone that by itself or in interaction with 
experiences during the toilet training phase 
results in the development of a predominantly 
anal adult personality orientation. 

In a review of research findings on obses- 
sive-compulsive neurosis, Templer (1972) 
concluded that, although there is a high inci- 
dence of various types of psychopathology 
among relatives of obsessive-compulsive neu- 
rotics, the possible etiological role played by 
genetic or other constitutional factors is
unclear. This point of view is shared by Carr
(1974). In the case of the development of 
obsessive-compulsive personality, the possible 
role played by constitutional influences still 
remains in the realm of speculation because, 
to date, there has been little empirical re- 
search in this area. A study by Hays (1972), 
however, of the family pedigrees of 17 psychi- 
atric patients, mostly female, that carried a 
diagnosis of psychotic depression and had 
premorbid obsessive-compulsive personalities 
did find evidence to support an interaction 
effect of genetic predisposition, sex of the 
child, and child-rearing style in the genesis 
of obsessive-compulsive personality. 

Some theorists and clinicians stress the need 
for obsessive-compulsive personality style as 
a character defense or armor for the ego
against the ambiguities, uncertainties, and
anxieties inherent in human existence (e.g.,
Becker, 1974; M. H. Miller & Chotlos, 1960;
Salzman, 1968; Strauss, 1966). There is yet,
however, no statistical evidence to support the
speculations of these clinicians, who work
largely within an existential-phenomenological
framework that rarely generates statistical
research.


Measurement of Obsessive Compulsiveness


Over the past two decades, several ques- 
tionnaires and scales have been devised that 
purport to measure obsessional traits and 
characteristics (Allen & Tune, 1975; Beloff, 
1957; Blum, 1949; Caine & Hawkins, 1963; 
Comrey, 1965; Cooper, 1970; Gottheil, 
1965b; Grygier, 1956; Kline, 1969; Lazare, 
Klerman, & Armor, 1966, 1970; Sandler & 
Hazari, 1960). Kline (1969) argued that 
there has really not been an abundance of 
empirical research on obsessional personality, 
primarily because there is no fully accepted 
measure of obsessional traits or obsessional 
symptoms. Most, if not all, of the existing 
scales are not standardized, nor is there suf- 
ficient evidence for their reliability and va- 
lidity to justify a rational choice of one over 
the other. According to Kline the major per- 
sonality inventories, like the Sixteen Person- 
ality Factor Questionnaire (Cattell & Eber, 
1957), the Eysenck Personality Inventory, the 
Maudsley Personality Inventory, and the 
Minnesota Multiphasic Personality Inventory 
(MMPI), do not contain a measure of obses- 
sional traits and characteristics. The Psy- 
chasthenia scale of the MMPI, designated as 
Scale 7, is sometimes referred to as a measure 
of obsessive-compulsiveness; however, there 
is good reason to suspect that the Psychas- 
thenia scale is more a general measure of 
classically neurotic concerns, preoccupations, 
and characteristics, namely, anxiety, with- 
drawal, immobilization, agitation, and so on, 
rather than a specific measure of obsessive- 
compulsive behavioral tendencies (e.g., Dahl- 
strom, Welsh, & Dahlstrom, 1972; Drake & 
Oetting, 1959). One of the more promising 
measures to date is the Lazare-Klerman Trait 
Scales (Lazare et al., 1966, 1970), a factor 
analytically derived instrument that purports 
to measure obsessional, hysterical, and oral 
dependent personality. This self-report inven- 
tory contains 140 true-false items that are 
scored into 20 trait scores, each based on 
seven items. These 20 traits are reported in 
four separate factor analytic studies to com- 
bine to three orthogonal factors that closely 
mirror obsessive, hysterical, and oral char- 
acter traits, as described in the clinical litera- 
ture (Lazare et al., 1966, 1970; Paykel & 
Prusoff, 1973; Van Den Berg & Helstone,
1975). 


Empirical Validation of 
Obsessive-Compulsive Personality 


Clinical descriptions are highly consistent 
in terms of the defenses, traits, and behavioral 
styles thought to be defining characteristics 
of the obsessive-compulsive personality. The 
findings of statistical studies have in the 
majority of cases generally supported the de- 
scriptions of the anal trait clusters found in 
the clinical studies. The bulk of these studies 
have been of the correlational and factor 
analytic type (Barnes, 1952; Beloff, 1957; 
Brooks, 1969; Comrey, 1965; Cooper & Kel- 
leher, 1973; Finney, 1961, 1963; Gottheil, 
1965b; Gottheil & Stone, 1968; Hetherington 
& Brackbill, 1963; Kline, 1968; Lazare et 
al., 1966, 1970; Mandel, 1958; Paykel & 
Prusoff, 1973; Rapaport, 1955; Sandler & 
Hazari, 1960; Schlesinger, 1963; Sears, 1943; 
Stagner, Lawson, & Moffitt, 1955; Stagner & 
Moffitt, 1956; Stone & Gottheil, 1975; Van 
Den Berg & Helstone, 1975). 

Finney (1961, 1963) and Beloff (1957) 
found the traits of obstinacy, parsimony, and
orderliness to be correlated in the
children they studied.

Barnes (1952), Stagner et al. (1955), and 
Stagner and Moffitt (1956) sought to show 
through factor analytic methods that traits 
related to the psychosexual stages delineated 
in Freudian theory were empirically grouped 
in adult subjects. The results of these early 
factor analyses must be viewed as equivocal 
and difficult to accurately interpret because 
of inconsistent definitions of psychosexual 
stages and traits and the dubious adequacy of 
the measures used (Gottheil & Stone, 1968). 
In the study by Barnes, a meticulous rather 
than an anal factor per se was identified using 
the responses of 266 male college students 
to lists of items composed for the study. 
However, grouped together on this factor 
were traits of orderliness, reliability, law 
abidance, and cleanliness, in addition to me- 
ticulousness. Rigidity, sadism, and defiant 
resentment loaded on another factor termed
externalized aggression.

Other factor analytic investigations appear
to be more consistent with theory and clinical
description. Schlesinger (1963) performed a
factor analysis of 154 items taken from vari- 
ous anality questionnaires and extracted 12 
factors, namely, responsibility in dealing with 
others, regularity and meticulousness, reten- 
tiveness as a style of life, obstinacy, rigidity, 
frugality, concern about dirt and contamina- 
tion, orderliness, self-righteous hostility and 
competitiveness, anxiety over possible loss of 
control, sensitivity to smells, and retentive- 
ness in relation to possessions. Cooper and 
Kelleher (1973) carried out a principal-com- 
ponents analysis of the Leyton Obsessional 
Inventory (Cooper, 1970) using approx- 
imately 300 normal subjects divided by sex 
and nationality (Irish and English). Three 
distinct components were derived from four 
separate analyses and termed (a) concern 
with being clean and tidy, (b) feeling of in- 
completeness, (c) checking and repetition. 
The second component appeared to relate to 
a need for closure, as reflected in the follow- 
ing item: “Even when you have done some- 
thing carefully, do you often feel that it is 
somehow not quite right or complete?” Other
less distinct components were found, one of 
which suggested the label methodical. 

In studies by Lazare et al. (1966, 1970), 
alluded to above, factor analysis was em- 
ployed to explore three personality patterns 
derived from psychoanalytic theory: oral, ob- 
sessive, and hysterical. The three personality 
patterns were defined by approximately 20 
traits obtained from a review of the clinical 
literature. Each of these traits was then reli- 
ably measured from groups of true and false 
items rated by samples of female psychiatric 
inpatients and outpatients. The traits so measured
formed three clusters that corresponded
quite closely to psychoanalytic descriptions
of oral, obsessive, and hysterical personality
patterns. In the earlier study (Lazare et al.,
1966), all of the defining traits of the obsessive
personality factor were correctly predicted
from theory, for example, orderliness,
parsimony, rejection of others, emotional constriction,
obstinacy, severe superego, rigidity,
and perseverance. One predicted obsessive
trait, self-doubt, however, only had a factor
loading of .12. In their later study, Lazare
et al. (1970) replicated their earlier findings
with the exception of the trait of obstinacy. 
Virtually identical defining traits have been
reported by Paykel and Prusoff (1973), who
used the responses of male and female recovered
depressives to the Lazare-Klerman
Trait Scales, and by Van Den Berg and Helstone
(1975), who used the Lazare-Klerman
Trait Scales in Holland with samples of psychiatric
and normal females.

Gottheil (1965a) sought to investigate the
extent to which mental health professionals
agree in their use of the terms anal and oral
character. A group of psychiatrists and clinical
psychologists (N = 20) completed questionnaires
composed of items derived from
clinical descriptions of oral and anal character
types and then were asked to predict
how a typical oral character would answer
the oral questionnaire and how an anal character
would typically answer a questionnaire
on anal character traits. The degree of consistency
demonstrated in the responses of the
professional subjects was found to be highly
significant statistically, suggesting that mental
health experts possess similar conceptions
of these character types. Significant agreement
was also found among the subjects on a
majority of the items that constituted the two
questionnaires (p < .03). Whereas in both
instances agreement was quite high, there was
less agreement on the conceptualization of
the oral character than of the anal character,
suggesting that the concept of the latter has
been more clearly and unambiguously described.

In another study, Gottheil (1965b) administered
his newly constructed oral and anal
questionnaires to 200 army enlisted men.
Item analyses indicated that the various
characteristics attributed by expert judges to
the anal and oral character types are empirically
associated in the responses of normal
adult male subjects.

In one factor analytic study using the same
sample of army recruits, Gottheil and Stone
(1968) derived an oral and an anal trait
factor from responses both to the items constituting
the oral and anal trait questionnaires
and to items concerned with mouth
and bowel habits. Oral and anal subfactors
were also derived that were quite consistent
with psychoanalytic descriptions. The five
anal subfactors identified were termed (a) 
rigidity; (b) obsessive rumination; (c) per- 
fectionism (which includes orderliness, per- 
sistence, and a tendency to be critical); (d) 
parsimony, possessiveness, and checking and 
rechecking; and (e) a practical point of view. 
However, in the overall questionnaire factor 
analysis, the oral and anal trait factors were 
weak. Together they accounted for only 5.3% 
of the variance in the total set of items. 
Moreover, only 23% of the variance in the 
anal trait scale was accounted for by the five 
anal trait subfactors that were extracted. 
Thus, despite the kinds of items selected, 
neither anal nor oral character structure 
emerged as the strongest organizing factor. 
All of these results were for the most part 
confirmed in a later study (Stone & Gottheil, 
1975) that factor analyzed the responses of 
samples of neurotic and psychosomatically 
involved outpatients to the same sets of items 
employed in the previous study. 

Studies by Gottheil (1965b), Gottheil and 
Stone (1974), and Kline (1967) provide evi- 
dence that obsessional character traits are 
normally distributed in samples of normals 
and of neurotic and psychosomatic patients. 
Therefore, adult subjects that constitute both 
normal and clinical groups cannot be char- 
acterized as either having or not having ob- 
sessive tendencies, but rather must be char- 
acterized as having more or less of them. 


Obsessional Symptoms Versus 
Obsessional Personality 


Several studies have addressed the issue of 
whether obsessional symptoms can be reliably 
differentiated from obsessional personality 
traits and characteristics. As discussed in the 
Introduction, the distinction between per- 
sonality and symptoms was emphasized by 
Freud and his contemporaries, as well as by 
virtually all later clinicians. 

Using a sample of 100 neurotic patients, 
approximately equally divided between men 
and women, Sandler and Hazari (1960) fac- 
tor analyzed responses to 40 items related 
to obsessive-compulsive character traits and 
symptoms and found two relatively independent,
orthogonal personality constellations
or dimensions quite similar to the clinical 
distinctions between obsessional character 
traits and obsessional neurotic symptomatol- 
ogy. The items that loaded on the character 
trait dimension present a picture of an ex- 
ceedingly systematic and methodical person 
who likes a well-ordered life, is consistent and 
punctual, and is meticulous in his or her use 
of words. The individual dislikes half-done 
tasks and finds interruptions in plans and 
goal-directed activity irksome. He or she pays 
much attention to detail and has a strong 
aversion to dirt. These characteristics are well 
integrated in the personality, that is, they are 
ego syntonic and are frequently viewed as a 
source of pride and esteem by their possessors.

The other dimension Sandler and Hazari
identified corresponded well to descriptions 
of obsessional neurotic symptomatology, for 
example, when life is severely disrupted by 
the intrusion of unwanted thoughts and compulsive
acts and by worry, doubt, and procrastination.
Unlike the obsessional traits,
these unwanted thoughts and impulses were 
experienced as alien and disturbing. In psy- 
choanalytic terms, the traits represent a suc- 
cessful or adaptive ego defense, whereas the 
symptoms are evidence of a breakdown in de- 
fense mechanisms (Kline, 1967). Sandler and 
Hazari concluded that although the two di- 
mensions were orthogonal, this does not 
necessarily mean that in many instances sub- 
jects will not show a mixture of both dimen- 
sions; nor does it necessarily imply that both 
do not share common dynamics and etiology, 
for example, conflicts over anal urges. Sub- 
sequent studies by Foulds, Caine, Adams, 
and Owen (1965), Kline (1967), and Meares 
(1971) found evidence to support Sandler 
and Hazari’s original distinction between ego- 
syntonic traits and ego-dystonic (or alien) 
symptomatology. Slade (1974) similarly con- 
cluded that factor analytic investigations in 
which both obsessional-trait and obsessional- 
symptom items have been included strongly 
suggest the existence of separate trait and
symptom factors. Whether a single trait and 
a single factor emerge or a number of both 
factors emerge may be primarily dependent 
on the range of behavior studied. 

The results of two studies (Ingram, 1961b;
C. M. Rosenberg, 1967) suggest that there is 
some relationship between obsessional neu- 
rosis and obsessional personality. Rosenberg 
investigated the personality characteristics of 
47 obsessional neurotics by means of psychi- 
atric ratings and performance on selected per- 
sonality inventories. Of this sample, 25 were 
judged to have an obsessional premorbid per- 
sonality, which was quite congruent with 
clinical descriptions, that is, orderly, rigid, 
obstinate, dependable, pedantic, and so forth. 
Psychometric studies like those of Sandler 
and Hazari (1960), however, do suggest the 
independence of obsessional illness and obses- 
sional character. Moreover, clinical observa- 
tion suggests that at least some obsessional 
neurotics never could be said to have had a 
premorbid obsessional character makeup 
(e.g., Rack, 1977). In addition, as Paykel 
and Prusoff (1973) pointed out, obsessional 
patients with a corresponding premorbid obsessional
makeup represent a small and not
necessarily typical segment of individuals
with obsessive-compulsive personalities.
Clearly there is no necessary one-to-one rela- 
tionship between obsessional personality and 
obsessional neurosis, despite the occasional 
finding that more obsessive-compulsive neu- 
rotics than would be expected by chance show 
evidence of a premorbid obsessional per- 
sonality. 

Several studies have been specifically con- 
cerned with the relationship between obses- 
sional traits and the personality dimension 
of neuroticism. The findings of factor analytic 
studies such as those of Sandler and Hazari 
suggest that obsessional traits may relate 
inversely to neuroticism, whereas obsessional 
symptoms may relate positively to measures 
of neuroticism, maladjustment, and emotional 
instability. With one exception (Orme, 1965),
the few studies that have addressed this issue 
report these relationships to exist (Kline, 
1968; Meares, 1971; Paykel & Prusoff, 
1973). Kline (1967) factor analyzed the 
Sandler-Hazari Scale (1960), the Beloff Anal
Test (1957), and the MMPI, using the responses
of a normal sample of 93 college
students. Three relevant factors emerged:
general emotional instability, a factor of obsessional
character traits, and a factor of social
introversion. Neither the Beloff nor the
Sandler-Hazari measure (both measuring ob- 
sessional character traits) loaded highly on 
the emotional instability factor, while the 
Sandler-Hazari measure of obsessional symp- 
toms loaded highly on both the social intro- 
version and emotional instability factors. In 
a subsequent study, Kline (1968) factor 
analyzed the MMPI, the Beloff Anal Test, 
the Sandler-Hazari Scale, and his newly devised
anal scale (Ai3), a measure of obsessional
traits, and discovered that his measure
also did not load on the emotional instability 
factor that runs through the clinical scales of 
the MMPI. Meares, using a sample of 32 
patients with spasmodic torticollis, found a 
moderate positive correlation (r = .6, p <
.001) between a measure of neuroticism (the
Eysenck Personality Inventory) and the 
obsessional symptoms section of the Sandler- 
Hazari Scale. Obsessional traits also mea- 
sured by the Sandler-Hazari Scale were nega- 
tively related to the Eysenck Personality 
Inventory Neuroticism scale (r = -.31). Orme
(1965) reported that obsessional character 
traits correlated moderately with a measure 
of emotional instability, Cattell’s (Cattell & 
Eber, 1957) 13-item ‘O’ Factor Scale, in 
samples of normals and obsessional neurotics, 
but, as Paykel and Prusoff pointed out, 
Orme’s measure of traits was derived from 
Sandler and Hazari’s second dimension of 
ego-alien phenomena and relates more to ob- 
sessional symptoms than to obsessional traits. 
Paykel and Prusoff reported additional evi- 
dence that neuroticism, as measured by the 
Maudsley Personality Inventory, relates in- 
versely to obsessional traits, as measured by 
the Lazare-Klerman Trait Scales (r = -.23,
p < .05), in a sample of 131 recovered depressed
male and female psychiatric patients.
In a recent study, however, Pollak (1978),
using the Lazare-Klerman Trait Scales as a
measure of obsessive personality, found in a
sample of graduate students (N = 114) that
obsessive personality correlated negatively
with several self-actualization variables of
the Personal Orientation Inventory (Shostrom,
1966; r ranged from -.16 to -.40). If
one views this inventory as a measure of
optimal emotional functioning or positive
mental health, then the inverse relationships
found at least suggest that obsessive personality
may reflect a somewhat immature, if
not neurotic, character structure. 

In summary, with some notable exceptions, 
studies indicate significant positive relation- 
ships between measures of obsessional neu- 
rosis and measures of emotional instability 
and significant negative relationships between 
obsessional trait measures and emotional in- 
stability (Slade, 1974). 
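
One way to gauge the size of the correlations cited in this section is to square them, which gives the proportion of variance that two measures share. The sketch below applies that standard conversion to three of the coefficients reported above. The helper name shared_variance and the dictionary labels are illustrative choices, not terms used by the original authors.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures correlated at r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Coefficients as cited in the text; labels are shorthand for this example.
reported = {
    "Meares (1971): symptoms vs. neuroticism": 0.60,
    "Meares (1971): traits vs. neuroticism": -0.31,
    "Paykel & Prusoff (1973): traits vs. neuroticism": -0.23,
}

for label, r in reported.items():
    print(f"{label}: r = {r:+.2f}, shared variance = {shared_variance(r):.1%}")
```

Read this way, the inverse relationships between trait measures and neuroticism, although statistically significant, each involve less than 10% shared variance, whereas the symptom measure shares roughly a third of its variance with neuroticism.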


Obsessive-Compulsive Personality and 
Introversion-Extraversion 


According to the personality model of Ey- 
senck (1947, 1959, 1960), two orthogonal 
dimensions are used to account for the psy- 
choneuroses, namely, neuroticism and extra- 
version-introversion. Hysterical character 
disorders are viewed, for example, as disturb- 
ances of the neurotic extravert, whereas ob- 
sessive-compulsive disorders are classified as 
disorders of the neurotic introvert. Thus, ac- 
cording to this paradigm, obsessive-compul- 
sive and hysterical patients are conceptual- 
ized as occupying opposite ends of a neurotic, 
introversion-extraversion continuum. 

C. M. Rosenberg (1967) found, in a sample 
of obsessional neurotics, significantly lower 
than average scores both on the Extraversion 
scale of the Maudsley Personality Inventory 
and on the second-order extraversion factor of 
the Sixteen Personality Factor Questionnaire 
(Cattell & Eber, 1957). Several other studies 
(e.g., Barret, Caldbeck-Meenan, & White,
1966; Caine & Hope, 1964; Forbes, 1969; 
Foulds et al., 1965) have found sizable corre- 
lations in the .70 to .80 range between per- 
formance on the Hysteroid-Obsessoid Ques- 
tionnaire (Caine & Hawkins, 1963) and per- 
formance on the Extraversion scale of the 
Maudsley Personality Inventory and the 
second-order factor Extraversion scale of 
Cattell’s Sixteen Personality Factor Question- 
naire in psychiatric and nonpsychiatric sam- 
ples, with subjects classified as obsessoid 
consistently scoring significantly lower on the 
extraversion measures than subjects classified 
as hysteroid. The results of these studies 
suggest that obsessive and hysterical personalities
can be conceptualized as opposite extremes
on an introversion-extraversion continuum,
with obsessional personality highly
related to introversion and hysterical person- 
ality to extraversion. 

Paykel and Prusoff (1973), however, could 
not find any relationship between obsessional 
traits, as measured by the Lazare-Klerman 
Trait Scales, and the introversion-extraver- 
sion dimension, as measured by the Extraver- 
sion scale of the Maudsley Personality Inven- 
tory, in a sample of recovered depressed 
patients (N = 131). Moreover, in some
earlier factor analytic research cited above, 
Kline (1967) found that obsessional char- 
acter traits measured by both the Sandler- 
Hazari Scale of obsessional traits and symp- 
toms and the Beloff Anal Test did not load 
highly on a social-introversion factor, whereas 
the Sandler-Hazari measure of obsessional 
symptoms, with a loading of .512, did. 

It is interesting that when measured by the 
Hysteroid-Obsessoid Questionnaire, obsessive-compulsive
personality appears positively related
to introversion, but that when measured
with other instruments like the Lazare-Kler- 
man Trait Scales, it seems independent of the 
introversion-extraversion personality dimen- 
sion. 


Obsessive-Compulsive Personality and 
Response to Experimental Tasks 


A number of studies have attempted to 
discover whether obsessive-compulsive per- 
sonalities respond in experimental situations 
in ways congruent with predictions made 
from theory and clinical observation. For 
example, is there empirical evidence for the 
notion that obsessive individuals crave order 
and structure and strive to be methodical and 
efficient in their actions? 

B. G. Rosenberg (1953) compared the 
performance of psychotherapy patients with 
pronounced obsessive-compulsive tendencies 
with a normal control group on a visual mem- 
ory task that involved choosing from a mul- 
tiple-choice format the ambiguous design 
previously seen in tachistoscopic presentation. 
The alternative choices varied in terms of 
degree of symmetry, and as expected, he 
found that obsessive-compulsive subjects 
tended to favor the more symmetrical choices.
This was interpreted as reflecting a particular 
need to impose order, uniformity, and con- 
gruity on visual perception. It is an interest- 
ing finding that has never been directly 
cross-validated. Studies of a somewhat re- 
lated nature, however, that used measures 
such as the Breskin nonverbal test of rigidity 
to explore correlates of perceptual rigidity 
have been reported (e.g., Breskin, 1968; Bres- 
kin, Gorman, & Hochman, 1970; Primavera, 
Simon, & Hochman, 1974). 

In a more recent study that used a sample 
of 40 college men, Rosenwald (1972) found 
significant relationships between a measure 
of anxieties reflective of anal concerns and 
time spent bringing order to a disorderly situation,
that is, straightening up a pile of
scattered magazines. No relationship, however,
was found between time spent reorganizing
the magazines and another questionnaire
measure that specifically concerned anxiety
about dealing with dirt. Nor could any relationship
be found between time spent on
the reorganization task and efficiency in 
identifying geometric forms with their hands 
while the objects were immersed in a dirty, 
malodorous, feces-like medium. Poor or in- 
efficient performance on this latter task was 
interpreted as indicative of weak defenses 
against anal impulses. 

There are a few studies to date that have 
investigated differences in degree of anal ori- 
entation as a function of occupational choice. 
Would, for example, more obsessional indi- 
viduals be found in occupational pursuits that 
emphasize being orderly and methodical? 
Using a Q-sort technique, Weinstein (1953) 
found, as predicted, that engineering students 
were more anal retentive than law or social
work students. Using projective measures like
the Rorschach test and the Bender-Gestalt
test, Segal (1961) found partial support for
his hypothesis that accounting students would
demonstrate a greater degree of anal orientation
than would creative writing students. He
found, as expected, that the accounting group
was less tolerant of ambiguity, more restrained
in expression of hostile affect, and
generally more emotionally controlled. No
differences were found between the groups,
though, in their use of compulsive defenses
and conformity to social expectations. Consistent
with her predictions, Schlesinger
(1963) found that accounting and engineer- 
ing students exhibited a significantly greater 
anal orientation than did educational psy- 
chology students. White (1963), in a study of 
experienced clerical bank personnel, found an 
intense dislike of dirt and disorder not found 
to the same degree in a control group sample. 

Although these studies of vocational activ- 
ity in the main support psychoanalytic the- 
orizing and clinical description insofar as 
they suggest positive relationships between 
obsessional characteristics and involvement in 
the more compulsive kinds of vocational pur- 
suits, the findings would be strengthened if 
comparable results were to be found in stud- 
ies that include only individuals actually 
practicing in different professions. With the 
exception of the study by White, findings 
have been based solely on responses of stu- 
dents and not practitioners. 

A few studies have explored relationships 
between anal characteristics and verbal recall 
ability, a dependent variable that appears to 
bear some relation to both the presumed 
orderliness and retentiveness of obsessive-compulsive
individuals. Adelson and Redmond
(1958) hypothesized that anal retentive in- 
dividuals should possess methodical and effi- 
cient ways to process and retain information. 
Using the performance of a sample of 61 col- 
lege women on the Blacky Pictures (Blum, 
1949), they found, as predicted, that anal 
retentive subjects showed significantly greater 
ability for immediate and delayed recall of 
prose passages than did anal expulsive sub- 
jects or subjects classified as neutral. 

Pedersen and Marlowe (1950), in an earlier 
study, were unable to duplicate these findings 
using a male sample, and Marcus (1963),
using the performance of female college students
on the Blacky Pictures, found anal
retentives to be superior to anal expulsives
only with delayed as opposed to immediate
recall of words. D. F. Fisher and Keen (1972)
used the Blacky Pictures with a group of
army men and found no significant relationships
between the anality measures of retentiveness
and expulsiveness and measures
of recall of verbal material. As S. Fisher and
Greenberg (1977) pointed out, results are
mixed, with findings more congruent with
theory in studies that used female subjects. 

Several studies have investigated the pre- 
sumed tendency of obsessive-compulsive per- 
sonalities to be obstinate and negativistic. In 
a series of studies, it was found that, unlike 
“oral” subjects, “anal” subjects were ex- 
tremely difficult to verbally condition using 
positive reinforcement (Cooperman & Child, 
1971; Noblin, Timmons, & Kael, 1966; Tim- 
mons & Noblin, 1963). In a similar kind of 
investigation, Tribich and Messer (1974) 
found that oral subjects reported an amount 
of light movement similar to that reported by 
an authority figure when viewing the auto- 
kinetic phenomenon in their presence, where- 
as anal subjects tended to respond in opposi- 
tion to the authority. In all four of the 
experimental studies that used the Blacky Pic- 
tures as a measure of orality and anality and 
that used college students as subjects, results 
were consistent with theory and the clinical 
impression that anal characters are often 
negativistic and resistant to control from 
authority figures. 

The results of other studies suggest positive 
relationships between measures of anal char- 
acter traits and such predicted oppositional 
behavioral features as nonacquiescent re- 
sponse set (Couch & Keniston, 1960), intense 
dislike for a task engaged in under conditions 
of forced compliance (Bishop, 1967), and 
resistance to attitude change (Rosenwald, 
1972). 

In obsessional personalities the trait of 
obstinacy is often linked with rigid, defiant, 
and hostile attitudes. In fact, the presumed 
rigidity of the obsessive-compulsive is thought 
to manifest itself in most, if not all, aspects 
of the individual’s behavior, for example, in 
thought processes, perceptual style, verbal 
expression, motor activity, and so on (e.g., 
Reich, 1949; Shapiro, 1965). Authoritarian- 
ism has traditionally been linked in person- 
ality research literature to variables such as 
rigidity, low tolerance for ambiguity, and 
aggressive and hostile attitudes (e.g., Adorno, 
Frenkel-Brunswick, Levinson, & Sanford, 
1950). Three studies that related obsessive-compulsive
characteristics to rigid authoritarian
beliefs and attitudes found positive
relationships between the two, as might be
expected (Centers, 1969; Farber, 1955; Rogers
& Wright, 1975). Farber found a correlation
of .37 (p < .01) between the traits of
orderliness, frugality, and obstinacy and what 
he termed political attitudes of aggressive 
conventionality (which emphasize strong 
antipathy to communism) in a fairly large 
sample of male and female college students 
(N = 130). Although Rabinowitz (1957)
could not replicate Farber's results using a 
similar sample, some years later Centers 
(1969) found a relationship of comparable 
magnitude between a measure of anality and 
hard-line conservative attitudes on such issues 
as welfare and law and order in a cross-sec- 
tional sample of over 500 adults. Along simi- 
lar lines, Rogers and Wright (1975) reported 
a sizable correlation of .62 (p < .01) between
authoritarianism as measured by the Cali- 
fornia F Scale and obsessive-compulsiveness 
as measured by the Psychasthenia scale of 
the MMPI in a mixed sample of 38 undergraduates.

The character trait of obstinacy in ob- 
sessive-compulsive personalities has also been 
associated with conflictual and ambivalent 
attitudes toward the recognition and expres- 
sion of hostile feelings. There are only a few 
studies that have experimentally explored 
this issue. In the study by Segal (1961), 
alluded to earlier, the presumably more anal 
student accountant group demonstrated more 
restraint in the expression of hostile affect on 
projective measures than did a group of cre- 
ative writing students. Gordon (1966, 1967) 
related the nature and types of psychological 
interpretations made by clinical psychologists 
and clinical psychology trainees to the person- 
ality dimension of anality as measured by the 
Grygier Anality Scales (Grygier, 1956). Con- 
sistent with the point of view of Fenichel 
(1945) and Schafer (1954) that anal char- 
acter types often lack confidence, are inde- 
cisive, sometimes demonstrate strong reaction 
formations against hostility, and tend toward 
generalization in their thinking, she found 
that high-anal clinicians had less confidence in 
their interpretations, made fewer specific 
predictions, and identified less psychopathology
in the case and test material presented
to them than did clinician groups designated
as low anal or neutral on the basis
of performance on the Grygier Anality Scales.
In the aforementioned study by Rosenwald 
(1972), which employed both questionnaire 
and behavioral measures of anality, some evi- 
dence was found to support predicted rela- 
tionships between anal character orientation 
and anxiety-ridden, conflictual, and ambiva- 
lent attitudes toward hostile and aggressive 
feelings and actions, as measured by question- 
naires and responses to specific action-oriented 
tasks. The results overall were mixed, how- 
ever, in relating the measures of anality to the 
measures of hostility and aggression. In sup- 
port of theory, though, Rosenwald found that 
subjects high on two questionnaire measures 
of anal anxiety demonstrated more anxiety 
and were slower in performing a doll-destruc- 
tion task than were subjects low on the 
anality measures. 

With regard specifically to the anal fea- 
ture of indecisiveness that was explored in 
Gordon’s research, Rosenwald, Mendelsohn, 
Fontana, and Portz (1966) compared the 
performance of male college students on a
task that involved identification of geometric 
forms with the hands under two conditions. 
In one condition, hands were placed in a 
feces-like medium; in the other, the geo- 
metric forms were handled in water. Ineffi- 
cient or blocked performance under the 
more unpleasant conditions was construed as 
indicative of anally linked anxieties and was 
found to be positively related to indecisive- 
ness, as defined by the amount of time sub- 
jects thought they needed to make the neces- 
sary perceptual estimates. 

Empirical support for the presumed anal 
character trait of parsimony was sought in a
few studies (Lerner, 1961; Noblin, 1962; G.
M. Rapaport, 1955; Rosenwald, 1972). According
to classical analytic formulations,
money is equated in the unconscious with the 
excretory product, and activities that involve 
hoarding, collecting, and preserving objects,
especially those symbolic of the excretory 
product and anal function, become paramount 
in the behavioral style of the anal character, 
as a sublimation of the infantile wish to hold
and retain feces. 

In the study by Noblin, 60 hospitalized 
psychiatric patients placed into anal or oral
groupings on the basis of responses to the
Blacky Pictures and on the basis of psychoanalytically
formulated diagnoses were differentially
rewarded by food or pennies in a
verbal conditioning paradigm aimed at in- 
creasing the use of personal pronouns in con- 
structing sentences. Anal subjects were found 
to be best motivated by the money rein- 
forcer, whereas oral subjects were differen- 
tially responsive to the food reinforcer, as 
theory would predict. Using a sample of 
teenage boys (N = 30), Lerner (1961) reported
that serious stamp collectors were
either significantly more sensitive or selec- 
tively insensitive to anally tinged words than 
to neutral words presented on an audiotape 
in comparison to a matched group of boys 
with no collecting interests. Rosenwald (1972) 
found some evidence that degree of anal ori- 
entation was related to the wagering of less 
money in a gambling situation, but the rela- 
tionship reported was only found with one 
of two questionnaire measures of anality and 
was not shown when the hand-immersion be- 
havioral efficiency task, alluded to above, was 
employed as an index of anality. In an earlier 
study, Rapaport (1955) failed to find sig- 
nificant relationships between anality and 
degree of concern with money on the The- 
matic Apperception Test. 

The character trait of parsimony should 
naturally carry over and be reflected in how 
the obsessive-compulsive individual ap- 
proaches and manages time. Some early ana- 
lytic formulations (e.g., Jones, 1918/1961) 
postulated that time also is an unconscious
equivalent of the fecal product. Therefore, by
virtue of their predominant anal fixations,
obsessional personalities are considered to be
particularly sensitive about wasting time and
having to spend time against their will; they
insist on being masters of their own time.

Three studies to date have investigated 
relationships between anal character traits 
and attitudes toward time (Campos, 1966; 
Gorman & Katz, 1971; Pettit, 1969). Campos
found in a sample of 100 male undergradu-
ates a positive relationship between anal
retentive traits and the tendency to overesti-
mate intervals of time. This finding is con-
sistent with the view that time is overesti-
mated precisely because it is something of
special value to be retained if at all possible.
Also reported was a positive relationship be- 
tween degree of anality and the tendency to 
use time in a niggardly, thrifty, and cautious 
manner. In the study by Pettit, sizable posi-
tive correlations (.5 to .65; p < .001) were
found between the Grygier Anality Scales 
(Grygier, 1956) and a composite anality 
scale devised for the study and a time ques- 
tionnaire designed to measure the importance 
of time to the individual in ordering and con- 
trolling experience. Subjects used were 91 
undergraduates. No significant sex differences 
were found. Gorman and Katz (1971) sought 
to replicate and extend Pettit’s findings, using 
another sample of undergraduates (N = 110). 
They administered Pettit's time scale and 
composite anality scale, and in addition sub- 
jects completed four time scales that measure 
various time attitudes (Calabresi & Cohen, 
1968). The time-anxiety scale measures un- 
comfortable feelings and thoughts about the 
future and a frustrated longing for the past. 
The time-submissiveness measure reflects 
dutiful and conforming attitudes toward time. 
Analysis of the data confirmed Pettit’s find- 
ings of a strong relationship between anality 
and time attitudes, but only for specific as- 
pects of time attitudes. Significant relation- 
ships were found between all time measures 
and the anality measure, except for a non- 
significant and quite low correlation (r = .14) 
between anality and time possessiveness. A 
correlation of greater magnitude was ex- 
pected, given the presumed strong retentive 
orientation of obsessive individuals. The 
overall pattern of the results did suggest that 
the constellations of time attitudes and anal 
character traits might be more fruitfully 
conceptualized under the rubric of rigidity or 
obsessive-compulsive character style, a cate-
gory that does not reflect psychopathologi-
cal behavior but simply reflects a distinct
personality style.


Directions for Further Research 


More study of the perceptual correlates of 
obsessive-compulsive personality is urged. 
Although there have been some intriguing 
findings (e.g., B. G. Rosenberg, 1953), very
little research has been conducted in this
area to date. What, for example, would the
relationship be between obsessive-compulsive
personality and the field dependence-inde-
pendence construct of Witkin and his associ-
ates (Witkin et al., 1954)?

More research on the relationship of ob- 
sessive-compulsive personality to measures 
of aesthetic sensitivity and indices of creative 
thinking in verbal and nonverbal mediums is 
also recommended. 

The particular kinds of meanings ascribed 
by obsessive-compulsive personalities to criti- 
cal life situations and tasks, for example, 
vocational choice, marriage, death and dying, 
and so on, would also be an interesting area 
to explore. Is there, for example, a difference 
in the way obsessional personalities concep- 
tualize or personify death and dying, as op- 
posed to the perceptions of individuals consid- 
erably less obsessional in their orientation to 
life or of individuals with fundamentally dif- 
ferent personality styles, for instance, those 
designated as oral, hysterical, impulsive, and 
so forth? If differences are found, are they, in 
fact, consistent with clinical observation and 
predictions from theory? A study of meaning 
would begin to focus more on how obsessional 
personalities experience the world and might 
produce empirical data that would help to 
better evaluate the existential, phenomeno- 
logical points of view on obsessional person- 
ality style (e.g., Becker, 1974; M. H. Miller 
& Chotlos, 1960; Strauss, 1966). 


Conclusions and Summary 


One of the difficulties in attempting to 
evaluate the findings of the empirically based 
research is that a diverse number of specific 
measures and types of measurement ap- 
proaches have been employed. Many of the 
indices of anality and obsessive-compulsive 
characteristics that have been used possess 
questionable psychometric adequacy. For ex- 
ample, for a time, particularly in the 1950s 
and 1960s, the Blacky Pictures (Blum, 1949) 
was often used as a measure of anality; but 
the Blacky Pictures is a projective test that 
carries with it all of the problems of stan- 
dardization, reliability, and validity that have 
come to be associated with the use of pro-
jective tests in personality assessment and
research (e.g., Anastasi, 1968).

In addition, there have been few attempts 
to cross-validate some of the more intriguing 
findings, and some of the attempts to replicate 
have resulted in inconsistent findings. More-
over, only a few researchers have devoted
themselves to repeated and in-depth treat- 
ment of a particular area of study in an 
attempt to refine measurement techniques and 
more closely study relationships suggested by
theory or prior experimentation. On the basis 
of this review of the empirical literature on 
obsessive-compulsive personality, the follow- 
ing conclusions appear to be warranted. 

1. Obsessive-compulsive personality, as a
cluster of traits, appears to possess consider-
able empirical validity and to fairly closely
adhere to clinical descriptions and predic-
tions. This is true despite the fact that an
array of measurement approaches and specific 
measurement instruments have been employed 
in an attempt to correlate measures of anality 
with various behavioral indices. 

2. Obsessive-compulsive personality can be 
statistically differentiated from obsessive- 
compulsive symptomatology through factor 
analysis.

3. In most instances obsessive-compulsive
personality does not appear to be positively 
related to measures of neuroticism, whereas 
obsessional symptoms do; however, findings 
are somewhat inconsistent on this issue. 

4. Obsessive-compulsive personality may be 
independent of an introversion-extraversion 
classification scheme, but more study with
diverse measures of obsessive-compulsive per-
sonality is needed to help clarify this issue.

5. Obsessive-compulsive traits appear to be
normally distributed.

6. There appears to be little evidence in
favor of classical psychoanalytic theories
about the psychogenesis of obsessive-com-
pulsive personality. Overall, there is yet no
strong empirical support for any etiologic ex-
planation. There are, though, suggestions in
clinical observation and in some statistically
based research that obsessive-compulsive in-
dividuals often are the progeny of obsessive-
compulsive parents and that obsessional or
anal character structure develops, at least in
part, from early learning, but not neces-
sarily exclusively out of the toilet training period
per se. Rigid, compulsive parenting, however, 
may very well be maximal before and during 
the time that sphincter control is still in the 
process of development.


Ridge Regression: Bonanza or Beguilement?


University of Alberta, Edmonton, Canada


Ridge regression is an intriguing new toy for statistical estimation theory. But it 
is just that: a toy which may someday evolve into a useful if limited tool but is
still too fragile to do real work. Specifically, ridge regression can indeed improve 
upon the accuracy of traditional estimates of regression parameters if background
circumstances are right. But if they are not right—and how to diagnose this 
remains obscure—ridge regression incurs a loss of estimational accuracy. 


Estimation of regression coefficients by 
ridge regression (Price, 1977) as a way to 
reduce the very large sampling uncertainty 
that invests classically estimated regression 
coefficients when the estimation sample’s 
predictor distribution approaches multicol- 
linearity is an exciting new methodological 
prospect. But exciting new prospects have a 
regrettable proclivity for promise without 
fulfillment—which is to say, more gently, that 
exploration usually comes up empty despite 
our need to keep looking if we are ever to find 
any new goodies at all—and ridge regression 
is no great exception to this rule. As will be 
acknowledged, the concept behind ridge 
regression has some merit; but its application 
to practical problems of predictor multicol- 
linearity is both conceptually misleading and 
of uncertain practical value. 


Preliminaries 


It is misleading to characterize ridge regres- 
sion as a partial solution to predictor multi- 
collinearity for the simple reason that the 
degree to which the accuracy of ridge regres- 
sion may or may not improve upon that of the 
classical procedure (called ordinary least 
squares or OLS by Price) for estimating pre- 
dictor coefficients has nothing to do with the 
severity of predictor intercorrelations and is
in principle equally helpful (or unhelpful)
when the predictors are fully independent of
one another. This is because (a) our predictor
variables $X = (x_1, \ldots, x_m)$ can always be ro-
tated into an orthogonal basis $F = (f_1, \ldots, f_m)$
for X space by a linear transformation
$F = XW$, in which $W$ is an $m \times m$ matrix of
rotation coefficients; (b) the column vectors
$b_X$ and $b_F$ of the criterion variable's true
regression coefficients on X and F, respectively,
stand in relation $b_X = Wb_F$; (c) almost any
[...] regression extension in particular can be
construed as a method that first estimates $b_F$
as $\hat{b}_F$ and then converts the latter into its
estimate $\hat{b}_X$ of $b_X$ by transformation
$\hat{b}_X = W\hat{b}_F$; and (d) for OLS and ridge regression
when $f_1, \ldots, f_m$ are orthogonal, the expected
squared error $\mathrm{Exp}[(\hat{b}_{Xi} - b_{Xi})^2]$ of the $i$th
term of $\hat{b}_X$ is a weighted sum of the expected
squared errors of $\hat{b}_{F1}, \ldots, \hat{b}_{Fm}$, the weight of
$\mathrm{Exp}[(\hat{b}_{Fj} - b_{Fj})^2]$ being the square of the
$ij$th element of $W$.¹ Point d explains why
predictor near-multicollinearity can ravage
the accuracy of $\hat{b}_X$ though I defer details until
later. It also shows that the expected squared
error of $\hat{b}_{Xi}$ can be less for ridge regression than
for OLS only when this is also true for some
of the $\hat{b}_{Fj}$ for orthogonal predictors $f_1, \ldots, f_m$.


1 I have slid quickly by a number of technical points
here. For one, we assume that none of the predictors
is exactly a linear function of the others in the sample,
albeit this degeneracy can be approached as closely as
we please. The sampling model envisioned here and
later is the one with fixed predictor covariances, that is,
we consider the sampling distribution of $\hat{b}$ under
repeated sampling with sample size and the predictor
covariances held constant at the values found in the
actual sample. And when ridge regression's $k$
parameter is taken to be a sample-dependent random
variable rather than a constant, it is possible for Point
d to be true of ridge regression only approximately,
depending on the details of $k$'s selection.


The Nature of Ridge Regression 


Estimating the simultaneous regression of 
criterion variable y on several orthogonal 
predictors is equivalent, that is, yields the
same numerical results, for both OLS and
ridge regression to estimating y's regression
on each predictor separately. Let us, therefore, 
analyze the nature of ridge regression for a 
single predictor x. 

According to the standard sampling-theoretic
regression model, $y = a + bx + e$, in which
$a$ and $b$ are scalar constants ($a$ being of no
further interest here) and $e$ is a residual
variable, statistically independent of $x$, whose
mean and variance under repeated sampling
are respectively zero and an unknown quantity
$\sigma_e^2$. Let $\sigma_x^2$, $c_{yx}$, and $c_{ex}$ be, respectively, the
variance of $x$, the covariance of $y$ with $x$, and
the covariance of $e$ with $x$ in the observed
size-$N$ sample from which we hope to estimate
$b$. (We assume $\sigma_x^2 > 0$; otherwise $b$ is not
well-defined.) Then $c_{yx} = b\sigma_x^2 + c_{ex}$; while
given a fixed $N$ and $\sigma_x^2$ (see Footnote 1) and
random sampling on $e$, it is easy to prove
that

$$\mathrm{Exp}(c_{ex}) = 0, \quad \mathrm{Exp}(c_{yx}) = b\sigma_x^2, \quad \text{and} \quad \mathrm{Exp}(c_{ex}^2) = N^{-1}\sigma_x^2\sigma_e^2.$$


For any estimate $\dot{b}$ of $b$, let $S(\dot{b})$ be the expected
squared error of $\dot{b}$, that is,

$$S(\dot{b}) =_{\mathrm{def}} \mathrm{Exp}[(\dot{b} - b)^2],$$

while henceforth $\hat{b}$ and $\tilde{b}$ are specifically the
OLS and ridge-regression estimates, respec-
tively. Classically, OLS estimate $\hat{b}$ is defined
to minimize $S(\hat{b})$ under the constraint that $\hat{b}$
be unbiased, leading to the computational
formula $\hat{b} = c_{yx}/\sigma_x^2$. It follows that

$$\hat{b} = (b\sigma_x^2 + c_{ex})/\sigma_x^2 = b + c_{ex}/\sigma_x^2,$$

whence

$$\mathrm{Exp}(\hat{b}) = b$$

and

$$S(\hat{b}) =_{\mathrm{def}} \mathrm{Exp}[(\hat{b} - b)^2] = \mathrm{Exp}(c_{ex}^2)/\sigma_x^4 = \sigma_e^2/N\sigma_x^2,$$


which says inter alia that $\hat{b}$ is unbiased and
that its expected squared error is independent
of the parameter being estimated. $S(\hat{b})$ is
not, however, independent of the predictor
variable's variance, nor can it be for any
estimate of $b$, insomuch as $b$ itself is dependent
on $\sigma_x^2$. For if $x$ is linearly rescaled to shift its
standard deviation from $\sigma_x$ to $\sigma_x^* = s\sigma_x$, $b$
is correspondingly rescaled to $b^* = s^{-1}b$,
leaving $b^2\sigma_x^2$ (the amount of $y$ variance
linearly accounted for by $x$) invariant. But
as $b$ is multiplicatively adjusted by scaling
factors, so of necessity is any estimate $\dot{b}$ of $b$
together with its expected squared error. In
particular, $S(\hat{b})$ can be made arbitrarily large
by making $\sigma_x$ sufficiently small. The ratio of
$S(\dot{b})$ to $b^2$, however, is generally independent
of the predictor scale. For OLS estimation in
particular,

$$S(\hat{b})/b^2 = \sigma_e^2/Nb^2\sigma_x^2 = (1 - \rho_{yx}^2)/N\rho_{yx}^2,$$

where $\rho_{yx}$ is the correlation between $y$ and $x$
in the population sampled³ and is approxi-
mated (with a positive bias that vanishes with
increasing $N$) by the sample correlation.
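
The OLS results just stated can be checked numerically. What follows is a minimal sketch and not part of the original analysis: it holds one predictor sample fixed, resamples the residuals, and compares the simulated squared error of the OLS slope with $\sigma_e^2/N\sigma_x^2$; it then rescales the predictor to illustrate that the ratio of squared error to $b^2$ is unaffected. The function names, the seed, the normal residuals, and the parameter values (N = 20, b = 0.5, unit residual variance, scale factor 0.1) are all illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """OLS slope c_yx / sigma_x^2, using divisor-N sample moments."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    c_yx = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    return c_yx / var_x


def simulate_ols_error(x, b, sigma_e, reps, rng):
    """Mean squared error of the OLS slope with x held fixed and e resampled."""
    total = 0.0
    for _ in range(reps):
        y = [b * xi + rng.gauss(0.0, sigma_e) for xi in x]
        total += (ols_slope(x, y) - b) ** 2
    return total / reps


rng = random.Random(0)          # fixed seed, an arbitrary choice
n, b, sigma_e = 20, 0.5, 1.0    # hypothetical parameter values
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
mx = sum(x) / n
var_x = sum((xi - mx) ** 2 for xi in x) / n

theory = sigma_e ** 2 / (n * var_x)             # sigma_e^2 / (N sigma_x^2)
observed = simulate_ols_error(x, b, sigma_e, 20000, rng)

# Rescale x by s; the true coefficient becomes b / s.
s = 0.1
x_small = [s * xi for xi in x]
observed_small = simulate_ols_error(x_small, b / s, sigma_e, 20000, rng)

print("S(b_hat): theory", theory, "simulated", observed)
print("S/b^2 original scale:", observed / b ** 2)
print("S/b^2 rescaled:      ", observed_small / (b / s) ** 2)
```

Under these assumptions the simulated squared error should sit close to the theoretical value, while the rescaled run has a much larger squared error but roughly the same ratio to the squared coefficient.
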
Ridge regression seemingly aspires to im-
prove upon OLS estimation by correcting for
the very large $S(\hat{b})$ that accompanies very
small $\sigma_x$. Specifically, the ridge-regression
estimate of $b$ is $\tilde{b} = c_{yx}/(\sigma_x^2 + k)$, in which $k$
is a small positive quantity whose numerical
value is the task of ridge-regression theory to
provide. (More generally, for multiple pre-
dictors, ridge regression alters some or all roots
of the predictor covariance matrix by small
increments that in the method's simplest
version are the same for all roots.) However,
we have just noted that $\sigma_x^2$ can be set at any
stipulated positive value by choice of
predictor scale; hence if $S(\tilde{b})$ can be less than
$S(\hat{b})$ for some $k$ when $\sigma_x^2$ is very small, the
same degree of improvement must be possible
when $\sigma_x^2$ is arbitrarily large.


2 More precisely, this is true of orthogonal predictors'
OLS-estimated regression coefficients. The estimated
uncertainty associated with the estimate of each $b_{Xi}$
is a mildly increasing function of the number of addi-
tional predictors, insomuch as estimating coefficients
also for the latter incurs a degrees-of-freedom loss
from the total sample size.

3 More precisely, $\rho_{yx}$ is the correlation between $y$ and
$x$ in a very large population that randomly samples $e$
while having the same predictor variance as the ob-
served sample (see Footnote 1). Also, when $x$ is only
one of $m$ orthogonal predictors, $\rho_{yx}$ is not the zero-order
correlation between $y$ and $x$, but is their partial corre-
lation after the other predictors have been partialed
out.

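The essential character of ridge regression,
and its relation to OLS estimation, is evident
in the relation

$$\tilde{b} = c_{yx}/(\sigma_x^2 + k) = (c_{yx}/\sigma_x^2)[\sigma_x^2/(\sigma_x^2 + k)] = \hat{b}h, \qquad (1)$$

where $h =_{\mathrm{def}} (1 + k/\sigma_x^2)^{-1}$ and $h < 1$ if
$k > 0$. Thus $\tilde{b}$ is simply an attenuation of $\hat{b}$
by a to-be-selected shrinkage factor $h$ (cf.
Mayer & Wilke, 1973). Since

$$\tilde{b} - b = \hat{b}h - b = (\hat{b} - b)h + b(h - 1),$$

or

$$(\tilde{b} - b)^2 = (\hat{b} - b)^2h^2 + 2(\hat{b} - b)bh(h - 1) + b^2(h - 1)^2,$$

the expected squared error of $\tilde{b}$ conditional
on a fixed (i.e., sample-independent) value
of $h$ is

$$S[\tilde{b}(h)] = S(\hat{b})h^2 + b^2(h - 1)^2 = S(\hat{b})[(1 - h_0)^{-1}(h - h_0)^2 + h_0], \qquad (2)$$

where

$$h_0 =_{\mathrm{def}} [1 + S(\hat{b})/b^2]^{-1} = [1 + (1 - \rho_{yx}^2)/N\rho_{yx}^2]^{-1} = N\rho_{yx}^2/[(N - 1)\rho_{yx}^2 + 1]. \qquad (3)$$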


The choice of $h$ that minimizes $S[\tilde{b}(h)]$ is
thus $h = h_0$, given which the inaccuracy of $\tilde{b}$
compared with that of OLS estimate $\hat{b}$ is
$S[\tilde{b}(h_0)]/S(\hat{b}) = h_0$. Even if $h$ is chosen non-
optimally, moreover, $S[\tilde{b}(h)]$ is still less than
$S(\hat{b})$ if $h$ differs from $h_0$ by less than $1 - h_0$.
Hence if there is a practical way to choose $h$
in this interval with suitably high probability,
ridge regression will indeed be more accurate
than OLS estimation even if the improvement
cannot amount to much unless $h_0$ is appreciably
less than unity.
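
The fixed-shrinkage results can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it computes the optimal factor of Equation 3 and the inefficiency ratio implied by Equation 2, and cross-checks the latter against the unfactored form $h^2 + (h - 1)^2 b^2/S(\hat{b})$, using $b^2/S(\hat{b}) = N\rho_{yx}^2/(1 - \rho_{yx}^2)$. The function names and the example values N = 20 and $\rho_{yx}^2$ = .05 are assumptions introduced here.

```python
def h0_from_rho2(n, rho2):
    """Equation 3: the fixed shrinkage factor that minimizes squared error."""
    return n * rho2 / ((n - 1) * rho2 + 1.0)


def inefficiency_fixed_h(h, h0):
    """Equation 2 divided by S(b_hat): ratio of ridge to OLS squared error."""
    return (h - h0) ** 2 / (1.0 - h0) + h0


def inefficiency_direct(h, n, rho2):
    """Same ratio from the unfactored form h^2 + (h - 1)^2 * b^2 / S(b_hat)."""
    return h ** 2 + (h - 1.0) ** 2 * n * rho2 / (1.0 - rho2)


def improves_on_ols(h, h0):
    """True when h lies within 1 - h0 of h0, the interval named in the text."""
    return abs(h - h0) < 1.0 - h0


n, rho2 = 20, 0.05               # hypothetical example values
h0 = h0_from_rho2(n, rho2)

for h in (0.0, 0.3, h0, 0.8, 1.0):
    print(h, inefficiency_fixed_h(h, h0), improves_on_ols(h, h0))
```

At $h = 1$ the ridge estimate coincides with OLS and the ratio is unity; at $h = h_0$ the ratio equals $h_0$ itself; and at $h = 0$, which amounts to estimating $b$ as zero, the ratio is $b^2/S(\hat{b})$, which exceeds unity for these example values.
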

Before considering whether this prospect
can be realized, however, observe that nothing
in Equations 1, 2, and 3, save replacement of
$S(\hat{b})/b^2$ by its analysis in terms of $\rho_{yx}$, is
specific to estimation of regression coefficients.
The term $\hat{b}$ can be any unbiased estimator
from which adjusted estimator $\tilde{b} = \hat{b}h$ is
obtained by an attenuation factor $h$. If this
correction procedure can be made to work for
estimation of regression coefficients, then we
must anticipate that it may also be workable
for many other classic unbiased estimators,
starting with the sample mean as an unbiased
estimate of the population mean. Either the
logic of ridge regression calls for wholesale
reappraisal of all established statistical esti-
mation procedures or there is something
impractically fragile about its applicability.

There is good reason to suspect the latter.
For any procedure that endeavors to select
a value of $h$ as close to $h_0$ as possible is in
effect a procedure for estimating $h_0$; and since
$h_0$ is so importantly a function of $b$, this is
tantamount to deriving $\hat{h}_0$ from an estimate
of $b$, the very parameter that is at issue in the
first place. Although ridge regression's logic
is thus circular, the circle is not necessarily
vicious. Conceivably an iterative procedure,
in which $h_0$ is estimated from a prior estimate
of $b$ (more precisely of $S(\hat{b})/b^2$) and the latter
is then revised in light of $\hat{h}_0$, may bring off
convergence to a more accurate estimate of $b$
than the one with which the iteration begins.
But success at this will clearly be a delicate
business, critically dependent on both the
sampling details of the particular parameters
at issue and an astute choice of input to the
iteration.

Equation 2 does not properly characterize
any version of ridge regression that selects $h$
in light of sample data, insomuch as $h$ now
has a sampling distribution that is not in-
dependent of $\hat{b}$. Instead, by some routine
algebra applied to the analysis of $(\tilde{b} - b)^2$
noted prior to Equation 2, we find that

$$S(\tilde{b}) = S(\hat{b})\{(1 - h_0)^{-1}\,\mathrm{Exp}[(h - h_0)^2] + h_0\} + C, \qquad (4)$$

where

$$C =_{\mathrm{def}} \mathrm{cov}[(\hat{b} - b)^2, h^2] + 2b\,\mathrm{cov}[\hat{b}, h(h - 1)],$$

while $h_0$ is still defined by Equation 3 and cov
designates sampling covariance. The quan-
tity $\mathrm{Exp}[(h - h_0)^2]$ is just the expected squared
error $S(\hat{h}_0)$ of $h$ construed as an estimate $\hat{h}_0$
of $h_0$, so Equation 4 can be rewritten as

$$S(\tilde{b})/S(\hat{b}) = h_0 + (1 - h_0)^{-1}S(\hat{h}_0) + C/S(\hat{b}). \qquad (5)$$


Terms $S(\hat{h}_0)$ and $C/S(\hat{b})$ in Equation 5 can be
analyzed further, but there is little present
point in doing so except to claim without proof
that all terms in Equation 5 are nonnegative
unless $\rho_{yx}^2$ is less than $N^{-1}$, in which case it is
possible for $C/S(\hat{b})$ to assume a small negative
value. The present import of Equation 5 is
simply that ridge regression can degrade
estimational accuracy as well as enhance it.
For regardless of how tiny $h_0$ may be, $S(\tilde{b})$
will be larger than $S(\hat{b})$ unless $S(\hat{h}_0)$ is suffi-
ciently small. Since the sampling distribution
of $h$ ($= \hat{h}_0$) under any specific method H for
generating $k$ from the sample data will be
dependent on $N$ and $\rho_{yx}^2$, one must anticipate
that even if $S(\tilde{b})$ is less than $S(\hat{b})$ under some
values of these parameters, the superiority
order will be reversed in other regions of
$(N, \rho_{yx}^2)$. But if ridge-regression variant H
does not dominate OLS over the entirety of
parameter space (and it appears exceedingly
unlikely that any choice of H can), then it is
irresponsible to advocate and foolish to use H
in preference to OLS unless we know what the
regions of $(N, \rho_{yx}^2)$ space are in which $S(\tilde{b})/S(\hat{b})$
is, respectively, appreciably less and
appreciably greater than unity, as well as how
large the difference from unity in each region
tends to be. Only then will we be positioned
to make rational judgments about which
method, OLS or ridge-regression variant H,
is most plausibly the more efficient for the
particular application at hand. (In principle,
choice of an estimator should be a complex
process that involves prior credibilities and
decision-theoretic utilities as well as sampling
probabilities. But determining the relevant
sampling distributions conditional on the
relevant parameters is not merely an essential
step in the decision process; it is the one step
that we can actually execute in practice
without flagrant appeal to vague and probably
idiosyncratic intuitions.)

Insomuch as any choice of shrinkage
parameter $h$ ($= \hat{h}_0$) may be taken via Equation
3 to define an estimate $\hat{\rho}_{yx}$ of correlation
parameter $\rho_{yx}$ (i.e., one replaces $h_0$ by $h$,
$\rho_{yx}$ by $\hat{\rho}_{yx}$, and solves for the latter), whereas
conversely any estimate of $\rho_{yx}$ can be con-
verted by Equation 3 into an estimate of $h_0$,
each variant H of single-predictor ridge re-
gression is grounded at least implicitly on some
technique for estimating $\rho_{yx}^2$ without benefit
of a prior ridge-regression estimate of $b$.
(Even when ridge regression seeks to refine
its solution by iteratively alternating between
estimation of $\rho_{yx}$ and of $b$, there must always
be a starting $\hat{\rho}_{yx}$.) It is not clear that there
are many cogent ways to do this; in fact, there
is probably only one that can be considered
operationally preferable at this time, namely,
the classic bias-corrected OLS estimate
(Wherry formula)

$$\hat{\rho}_{yx}^2 =_{\mathrm{def}} \begin{cases} 1 - (N - 1)(N - 2)^{-1}(1 - r_{yx}^2) & \text{if greater than 0,} \\ 0 & \text{otherwise,} \end{cases}$$

in which $r_{yx}$ is the observed sample correlation.⁴
[Note that when $r_{yx} \neq 0$, $\hat{\rho}_{yx}^2$ is always less
than $r_{yx}^2$ and becomes zero when $r_{yx}^2
\le (N - 1)^{-1}$.] Let $b^*$ be the $b$ estimate
computed by the variant H* of ridge regression
that derives $h$ via Equation 3 from $\rho_{yx}^2$ estimate
$\hat{\rho}_{yx}^2$. I now submit as a working hypothesis
that under almost any parameter point
$(N, \rho_{yx}^2)$, if $S(b^*) > S(\hat{b})$, then also $S(\tilde{b})
> S(\hat{b})$ for any variant of ridge regression
other than H*. That is, I suggest that the
region of parameter space in which some
variant of ridge regression is superior to OLS
is roughly included in the region of superiority
for ridge-regression variant H*. (Arguments
can be given to support this conjecture, but
since they are inconclusive I forgo them here.)
If so, study of $b^*$'s sampling behavior will
reveal the conditions under which ridge re-
gression can improve upon OLS.
Although it does not seem possible to
derive an analytically exact value for $S(b^*)/S(\hat{b})$
given $(N, \rho_{yx}^2)$, this quantity can be
determined closely enough by Monte Carlo
simulation. Table 1 reports such results for a
rather broad spectrum of $(N, \rho_{yx}^2)$ values,


4 When $x$ is only one of $m$ orthogonal predictors, $r_{yx}$
is the sample's partial correlation between $x$ and $y$ (see
Footnote 3), and $N - 2$ in this formula for $\hat{\rho}_{yx}^2$ be-
comes $N - m - 1$.


Table 1
Monte Carlo Approximations to Ridge-Regression Inefficiency Ratio $S[b^*]/S[\hat{b}]$ as a Function
of Sample Parameters $N$ and $\rho^2$

                                              N
$N\rho^2$      10              20                  50              100             500
[illegible]    .35 (.67)       .35 (.69)           .36 (.68)       .34 (.69)       [one entry lost; column alignment uncertain]
[illegible]    [illegible]     .54 ([illegible])   .52 (.60)       .55 (.59)       .52 (.60)
.6             .74 (.54)       .72 (.55)           .70 (.56)       .70 (.56)       .70 (.50)
.9             .88 (.46)       .85 (.46)           .83 (.50)       .87 (.52)       .86 (.51)
1.2            1.03 (.42)      .98 (.42)           .96 (.42)       .97 (.45)       [illegible]
1.5            1.16 (.33)      1.08 (.37)          1.04 (.38)      1.09 (.40)      1.07 (.40)
1.8            1.21 (.28)      1.23 (.33)          1.21 (.35)      1.11 (.34)      1.15 (.39)
2.1            1.33 (.26)      1.18 (.26)          1.19 (.30)      1.22 (.32)      1.21 (.39)
2.4            1.38 (.19)      1.26 (.24)          1.31 (.28)      1.19 (.28)      1.28 (.26)
2.7            1.44 (.18)      1.43 (.20)          1.37 (.25)      1.39 (.25)      1.36 (.20)
3.0            1.42 (.13)      1.40 (.21)          1.36 (.18)      1.41 (.22)      [illegible] (.26)
4.0            1.47 (.07)      1.49 (.12)          1.44 (.15)      1.43 (.16)      1.33 (.19)
5.0            1.37 (.02)      1.55 (.06)          1.49 (.10)      1.54 (.10)      1.50 (.10)
6.0            1.28 (.00)      1.36 (.02)          1.42 (.05)      1.48 (.06)      1.82 ([illegible])
7.0            1.12 (-)        1.38 (.01)          1.42 (.03)      1.41 (.04)      1.48 (.09)
8.0            1.08 (-)        1.24 (.00)          1.40 (.02)      1.37 (.02)      [illegible] (.00)
10.0                           1.19 (-)            1.34 (.01)      1.37 (.01)      1.37 (.02)
12.0                           1.10 (-)            1.24 (.00)      1.27 (.00)      1.36 (.01)
16.0                           1.03 (-)            1.14 (-)        1.18 (-)        1.20 (.00)
20.0                                               1.11 (-)        1.43 (-)        1.47 ([illegible])
30.0                                               1.04 (-)        1.08 (-)        1.40 (-)
40.0                                               1.02 (-)        1.04 (-)        1.06 (-)

Note. Approximate probabilities that the sample-computed estimate $\hat{\rho}^2$ of $\rho^2$ is zero are in parentheses. Each
tabled entry is the mean of $(b_i^* - b_i)^2$ divided by the mean of $(\hat{b}_i - b_i)^2$, in a set of 1,000 samples of
size $N$ and stipulated population parameter $\rho^2$. Each estimate was obtained by the following algorithm: (a) Generate
$N$-termed vectors $x_i$ and $e_i$ by elementwise independent production from a random generator whose dis-
tribution is unit normal. (b) Compute the variance $V_i$ of $x_i$; determine $b_i = (b_i^2)^{1/2}$ by solving
$\rho^2 = b_i^2V_i/(b_i^2V_i + 1)$, and derive $y_i = b_ix_i + e_i$. (c) Compute ordinary least squares estimate $\hat{b}_i$ and
ridge-regression estimate $b_i^*$ of $b_i$ from sample vectors $x_i$ and $y_i$ by the appropriate formulas. Note that the
sampling model for this simulation is random criterion residuals given a fixed predictor distribution.
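
The algorithm in the table note is simple enough to rerun. The following is a sketch of one reading of steps (a) through (c), not the author's program: it draws unit-normal $x$ and $e$, solves for $b$ from the stipulated $\rho^2$, computes the OLS estimate and the H* estimate (Wherry-corrected $\hat{\rho}^2$ converted to a shrinkage factor by Equation 3), and returns the ratio of mean squared errors together with the proportion of samples in which $\hat{\rho}^2$ is zero. The divisor-N variance, the function names, the seeds, and the number of replications are assumptions made here, so results should only be expected to resemble the tabled values, not reproduce them digit for digit.

```python
import random


def wherry_rho2(r2, n):
    """Bias-corrected squared correlation for one predictor, floored at zero."""
    return max(0.0, 1.0 - (n - 1) / (n - 2) * (1.0 - r2))


def h0_from_rho2(n, rho2):
    """Equation 3: shrinkage factor implied by a squared correlation."""
    return n * rho2 / ((n - 1) * rho2 + 1.0)


def one_sample(n, rho2, rng):
    """Steps (a)-(c): returns (true b, OLS estimate, H* estimate, rho2_hat == 0)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mx = sum(x) / n
    v = sum((xi - mx) ** 2 for xi in x) / n       # assumption: divisor-N variance
    b = (rho2 / (v * (1.0 - rho2))) ** 0.5        # solves rho2 = b^2 v / (b^2 v + 1)
    y = [b * xi + ei for xi, ei in zip(x, e)]
    my = sum(y) / n
    c_yx = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    b_ols = c_yx / v
    r2 = c_yx ** 2 / (v * var_y)
    rho2_hat = wherry_rho2(r2, n)
    b_star = h0_from_rho2(n, rho2_hat) * b_ols    # shrink OLS by the estimated h0
    return b, b_ols, b_star, rho2_hat == 0.0


def table_entry(n, n_rho2, reps=1000, seed=0):
    """Inefficiency ratio and proportion of zero rho2_hat for one table cell."""
    rng = random.Random(seed)
    rho2 = n_rho2 / n
    err_star = err_ols = 0.0
    zeros = 0
    for _ in range(reps):
        b, b_ols, b_star, is_zero = one_sample(n, rho2, rng)
        err_star += (b_star - b) ** 2
        err_ols += (b_ols - b) ** 2
        zeros += is_zero
    return err_star / err_ols, zeros / reps


ratio_low, p_zero_low = table_entry(20, 0.3, reps=4000, seed=1)
ratio_high, p_zero_high = table_entry(20, 5.0, reps=4000, seed=2)
print("N = 20, N*rho2 = 0.3:", ratio_low, p_zero_low)
print("N = 20, N*rho2 = 5.0:", ratio_high, p_zero_high)
```

The two cells chosen lie on either side of the $\rho^2 = N^{-1}$ boundary, so the first ratio should fall below unity and the second above it if the sketch matches the procedure described in the note.
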


together with the probability that $\hat{\rho}_{yx}^2 = 0$
at each parameter point. (Note that the
table's rows are indexed by the product of $N$
and $\rho_{yx}^2$ rather than by the latter alone.)
Estimator $b^*$ is more efficient than $\hat{b}$ so long
as $\rho_{yx}^2 < N^{-1}$ and is substantially so when
$\rho_{yx}^2$ is closer to zero than to $N^{-1}$. But $S(b^*)$
begins to exceed $S(\hat{b})$ when $\rho_{yx}^2$ becomes only
fractionally greater than $N^{-1}$, reaching ineffi-
ciencies half again as large as those of OLS.
Moreover, although $b^*$ returns essentially to
parity with $\hat{b}$ as $\rho_{yx}^2$ becomes sufficiently large,
the range of $\rho_{yx}^2$ values over which $b^*$ is
appreciably inferior to $\hat{b}$ for any given $N$ is
much larger than its range of superiority.

Table 1 makes plain that $S(b^*)$ is less than
$S(\hat{b})$ for just those combinations of $N$ and
$\rho_{yx}^2$ under which zero $\hat{\rho}_{yx}^2$ has rather high
probability. Indeed, it is precisely this rounding
of the bias-corrected $r_{yx}^2$ up to zero that makes
$b^*$ superior to $\hat{b}$ in the $\rho_{yx}^2 < N^{-1}$ region. It is
not so clear, however, that successful ridge
regression depends on the zero bound on $\hat{\rho}_{yx}^2$.
Instead, what seems to be the fundamental
nature of ridge regression, and more generally
of any "shrinkage" adjustment of any sample-
based estimator, can be insightfully idealized
as follows: Given some function $\hat{\theta}$ of sample
data construed to estimate a scalar population
parameter $\theta$, select for each sample size $N$
some fixed bounded interval $I_N$ of $\theta$ values,
define $\tilde{\theta}$ to be the sample function whose
value respectively equals the midpoint $\theta_I$ of $I_N$
or the value of $\hat{\theta}$ according to whether the
latter is or is not in $I_N$, and consider the
$\theta$-estimation efficiency of $\tilde{\theta}$ versus $\hat{\theta}$ as a
function of $\theta$ with fixed $N$. It is not hard to
see that so long as $I_N$ is neither too wide nor
too narrow and the sampling distributions of
$\hat{\theta}$ have reasonably orthodox properties (no-
tably, tailing off in both directions from a
center in the vicinity of $\theta$), $S(\tilde{\theta})/S(\hat{\theta})$ has a
minimum value less than unity at a value
of $\theta$ near $\theta_I$; from which, as $\theta$ increases (or
decreases), $S(\tilde{\theta})/S(\hat{\theta})$ first increases to a
maximum greater than unity and then sub-
sides to an asymptotic value of unity. The
details of this relative-inefficiency function
are rather sensitive to the width of $I_N$; and
one can seek a width that yields a useful $\theta$
region within which $S(\tilde{\theta})/S(\hat{\theta})$ is appreciably
less than unity while the degree to which $S(\tilde{\theta})$
exceeds $S(\hat{\theta})$ elsewhere is not large enough
to be a significant loss. How closely such an
ideal choice of $I_N$ can be attained in various
particular cases I have no idea.
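
The all-or-none idealization just described lends itself to a small simulation. The sketch below is an illustration under stated assumptions rather than anything taken from the article: the received estimator is drawn directly as a normal variate centered on the true parameter with unit standard error, the shrinkage interval is taken to be (-1, 1) with midpoint zero, and the relative inefficiency is estimated at a few parameter values. The function names, the interval, the standard error, the seed, and the replication count are all hypothetical choices.

```python
import random


def all_or_none(theta_hat, lo, hi):
    """Return the interval midpoint if theta_hat falls inside [lo, hi], else theta_hat."""
    return 0.5 * (lo + hi) if lo <= theta_hat <= hi else theta_hat


def relative_inefficiency(theta, se, lo, hi, reps, rng):
    """Ratio of squared error of the adjusted estimator to that of the received one."""
    num = den = 0.0
    for _ in range(reps):
        theta_hat = rng.gauss(theta, se)   # assumption: normal sampling distribution
        num += (all_or_none(theta_hat, lo, hi) - theta) ** 2
        den += (theta_hat - theta) ** 2
    return num / den


rng = random.Random(3)                     # arbitrary seed
lo, hi, se, reps = -1.0, 1.0, 1.0, 20000   # hypothetical interval and standard error

ratio_center = relative_inefficiency(0.0, se, lo, hi, reps, rng)
ratio_edge = relative_inefficiency(1.5, se, lo, hi, reps, rng)
ratio_far = relative_inefficiency(6.0, se, lo, hi, reps, rng)

print("theta at midpoint:        ", ratio_center)
print("theta just outside:       ", ratio_edge)
print("theta far from interval:  ", ratio_far)
```

With these choices the ratio should be below unity when the parameter sits at the interval midpoint, above unity when it lies just outside the interval, and essentially unity far away, which is the qualitative pattern the text describes.
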

The simple model I have just described can
obviously be generalized in various ways. For
example, more than one shrinkage interval
can be adopted simultaneously; the width and
placement of $I_N$ can be allowed to depend in
part on sample properties additional to $N$; and
shrinkage toward $\theta_I$ can be a graded function
of the difference between $\hat{\theta}$ and $\theta_I$ rather than
all or none according to whether $\hat{\theta}$ is in $I_N$.
Even so, the all-or-none model makes clear
the rationale of this approach. In particular,
it removes the prima facie implausibility that
the logic of ridge regression might apply with
equal success to improving for example the
OLS estimate of the population mean. For
estimation of almost any statistic $\theta$, what the
generic shrinkage method can achieve by
modifying a received $\theta$-estimator $\hat{\theta}$ is a pocket
of increased estimational accuracy (compared
with the accuracy of $\hat{\theta}$) in the vicinity of one
or more stipulated $\theta$ values $\theta_I$ at the price of
decreased efficiency when $\theta$'s actual value lies
elsewhere, the greatest losses occurring just
outside of the region(s) of gain. Evidently,
the occasions most appropriate for exploiting
this technique are those in which our prior
credibility distribution for $\theta$ is concentrated
in the vicinity of $\theta_I$.


Multicollinearity Revisited 


ds my analysis above is sound, it greatly 
Clarifies ridge regression's potential value for 
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the predictor multicollinearity problem. Re- 
turning to where the section entitled Pre- 
liminaries leaves off, let fi, ..., fm be the (data 
space) “principal factors” of the X configura- 
tion, that is, f; is the ith principal component 
of the predictor distribution rescaled to have 
unit variance. Then X = КФЛ) or F 
= X(T'Di-), in which rotation matrix DT 
is the product of an orthonormal matrix T 
with the diagonal matrix D, of roots of the X 
configuration’s covariance matrix. It can be 
shown that A» € e;?(1 — Ко), in which R; 
is the multiple correlation of any x; in X with 
the other predictors and A, is the smallest 
predictor root; hence if any of the x; are highly 
predictable from the rest, \m will be very 
large. Moreover, 


$$S(b_{x_i}) = \sum_{j=1}^{m} h_{ij}^2 \lambda_j^{-1} S(b_{f_j}), \tag{6}$$

where $b_{x_i}$ ($b_{f_i}$) is either the OLS or ridge-regression
estimate of the ith coefficient in
y’s regression on X (F). Since

$$\sum_{i=1}^{m} h_{ij}^2 = \sum_{j=1}^{m} h_{ij}^2 = 1,$$

Equation 6 says that $S(b_{x_i})$ is a weighted
average of the $S(b_{f_j})$ after these have been
inflated (or shrunk) by the associated $\lambda_j^{-1}$; nor
is there any tendency for the weights $h_{ij}^2$ to
countervail that inflation, insomuch as the
average $h_{ij}^2$ across i = 1, ..., m is $m^{-1}$, the
same for all j. The large uncertainty in the
OLS estimate of $b_{x_i}$ under predictor near-multicollinearity
lies not in any special
difficulty in estimating $b_{f_j}$ when $f_j$'s associated
root is small (for $S(b_{f_j})$ is the same for all j,
namely, $N^{-1}\sigma^2$) but in the rotation from F
to X, and is fully as troublesome for ridge
regression as it is for OLS estimation.
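
A two-predictor case gives a quick numerical check on this argument. The sketch below assumes Equation 6 has the form $S(b_{x_i}) = \sum_j h_{ij}^2 \lambda_j^{-1} S(b_{f_j})$ with $S(b_{f_j})$ equal for all j, and it assumes two standardized predictors with population correlation `r`, for which the roots are 1 + r and 1 - r and every squared weight is 1/2. The function names are illustrative, and the comparison value 1/(1 - r^2) is the familiar variance-inflation factor for OLS coefficients rather than anything taken from this article.

```python
def eq6_inflation(r):
    """Sum of h_ij^2 / lambda_j for two standardized predictors correlated r."""
    roots = (1.0 + r, 1.0 - r)   # roots of the 2 x 2 correlation matrix
    h_squared = (0.5, 0.5)       # squared weights for either predictor; sum to 1
    return sum(h / lam for h, lam in zip(h_squared, roots))


def direct_inflation(r):
    """Diagonal element of the inverse correlation matrix, 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)


def smallest_root_bound_holds(r):
    """Check lambda_m <= sigma_i^2 (1 - R_i^2) with sigma_i^2 = 1 and R_i = |r|."""
    return (1.0 - abs(r)) <= (1.0 - r * r)


for r in (0.0, 0.5, 0.9, 0.99):
    print(r, eq6_inflation(r), direct_inflation(r), smallest_root_bound_holds(r))
```

In this assumed case the two routes agree for every `r`, and the inflation is driven almost entirely by the term belonging to the smaller root as `r` approaches 1.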

Even so, there is a practical admonition for
ridge regression in Equation 6. Let $\gamma_{f_j}$ be the
inefficiency ratio comparing $S(b_{f_j})$ for the ridge-regression
estimate with $S(b_{f_j})$ for the OLS estimate, and
for simplicity assume (as will generally obtain
closely enough when the predictors are nearly
multicollinear) that for a given i, $h_{ij}^2\lambda_j^{-1}$ is
much larger for those j in a subset $J_i$ of indices
$j = 1, \ldots, m$ than it is for the rest. Then
the corresponding ratio for $S(b_{x_i})$ is approximately equal to the
average value of $\gamma_{f_j}$ over just the j in $J_i$.
(Although this is a considerable simplification
of the exact relationship, it fairly characterizes
its essential nature, especially when the $\gamma_{f_j}$
have a small coefficient of variation across
$J_i$.) Given predictor near-multicollinearity,
moreover, $J_i$ will almost always comprise a
subset of those j for which $\lambda_j$ is very small. It
follows that if ridge regression can substantially
improve the estimation of regression
coefficients for some of the predictor distribution's
principal factors and can focus its
prowess on those with the smallest associated
roots, it may yet manage to provide succor
specifically for the multicollinearity problem.

This hope is not entirely forlorn. Given 
fixed predictor scales, fixed regression coefficients,
and fixed residual criterion variance,
it is easily seen that the squared population
correlation between y and any predictor x is an
increasing function of $\sigma_x^2$ and, conversely,
approaches zero as $\sigma_x^2$ approaches zero. Moreover,
although the principal factors in Equation
6 have unit variances by scale stipulation, 
there is reason to suspect that very small 
predictor roots often symptomatize predictor 
configurations that, in an obscurely intuitive 
sense, allocate only diagnostically negligible 
variance to the dimensions of predictor space 
corresponding to these roots. It is impossible 
to be clear on this point without wallowing in a 
complicated, assumption-laden account of 
causal structure, scaling conventions, and the 
origins of predictor scores; but we can expect
that some tendency exists (just how strong
a one remains an important open question) for
very small $\lambda_j$ to be associated with especially
small $\rho_{yf_j}^2$. And we have already seen that
ridge regression's best chance to improve
upon OLS is when $\rho_{yf_j}$ is very small. Unhappily,
even when $\rho_{yf_j}^2$ is very small, if it is
not small enough, the ridge estimate of $b_{f_j}$ will be less accurate
than the OLS estimate, and this loss will be amplified in
$b_{x_i}$ by $\lambda_j^{-1}$.
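
The dependence of the criterion correlation on predictor variance, noted at the start of this paragraph, can be seen in the simplest case. The sketch below assumes a single predictor x with regression coefficient `b` and residual variance `resid_var`, so that the squared population correlation is b^2 var(x) / (b^2 var(x) + residual variance). The function name and the numerical values are illustrative assumptions.

```python
def squared_correlation(b, var_x, resid_var):
    """Squared population correlation between y = b*x + e and x.

    Assumes one predictor and a residual e uncorrelated with x.
    """
    explained = b * b * var_x
    return explained / (explained + resid_var)


# Coefficient and residual variance held fixed; predictor variance shrinks.
for var_x in (4.0, 1.0, 0.25, 0.01, 0.0):
    print(var_x, squared_correlation(b=0.5, var_x=var_x, resid_var=1.0))
```

With the coefficient and residual variance held fixed, the squared correlation falls steadily as the predictor variance shrinks and reaches zero when that variance is zero.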

If ridge regression is to be applied to highly 
intercorrelated predictors, one clear recommendation
emerges. One should not routinely
attenuate the OLS-estimated regression coefficients
specifically for the predictors’ principal
factors (or principal components) with very
small associated roots. Instead ridge regression
should be applied only to those principal
factors $f_j$ whose (partial) squared correlation
with the criterion can plausibly be inferred
from the sample correlation and any other
relevant information to be small enough for
the ridge estimate's $S(b_{f_j})$ to be smaller than
the OLS estimate's. If the $f_j$ that so qualify also
correspond to very low $\lambda_j$, as seems not
unlikely but still far from certain, then that
same degree of increased efficiency (or loss
thereof if we have misjudged) can be passed
along to ridge-regression estimates of the $b_{x_i}$.


Conclusions 


The import of my present argument is that 
we can expect to benefit from ridge regression— 
and otherwise to lose—only given special 
parametric circumstances whose presence in 
real-life applications cannot yet be reliably as- 
certained. Yet a number of Monte Carlo simu- 
lations in the ridge-regression literature (see 
especially Dempster, Schatzoff, & Wermuth, 
1977, as well as earlier studies referenced by 
Price, 1977) have shown one or more variants 
of ridge regression to be impressively superior 
to OLS under the parameters tested. However, 
the design of these simulations (assigning
population regression weights to multiple
predictors also given rather high degrees of
near-multicollinearity) does not permit easy
comparison to Table 1; and there is no inconsistency
between my present Monte Carlo
conclusions and the published multivariate
simulations if the manner in which these
have been constructed produces parametric
partial correlations between the criterion and
the predictors’ near-zero-root principal factors
so small as to lie within the particular method's
success region for the sample sizes tested. But
if this is so (and it must be so if my preceding
argument is not badly flawed), then the
practical significance of these simulations
remains severely problematic. They may
exhibit a reliable tendency in all multivariate
data arrays, real and artificial alike, for
small-root principal factors to have vanishingly
small correlations with an outside criterion.
(If so, ridge regression can be operationally
recommended just as soon as we determine how
small a root is small enough.) But alternatively
they may well illustrate merely that artificial
data arrays often approach ideal simplicities
much more closely than do real data. Unless the
ways in which multivariate distributions arise
empirically are suitably reflected in simulation
studies, one must be exceedingly cautious in
generalizing from one to the other.

I am prepared to argue that the only 
natural circumstances in which ridge regression 
can expect a reliable association between 
minuscule predictor roots and vanishing 
criterion correlations are roughly those in 
which the predictor variables are themselves 
causal sources of the criterion. In all other 
cases, the appropriate way to deal with 
intercorrelated predictors is through infer- 
ential factor analysis. But that is a story for 
another occasion. For now, the salient sum- 
mary is that although ridge regression and 
other shrinkage techniques hold promise for a 
considerable diversity of specialized purposes, 
their employment also incurs a distinct risk of 
appreciable loss; and we still lack operationally 
effective criteria, or even a rudimentary theory 
thereof, for judging when the risk is worth 
taking. In particular, before extant variants 
of ridge regression are urged upon unwary 
statistical consumers, it is imperative that
we acquire some knowledge about whatever 
relation may obtain in empirical data arrays 
between predictor roots and the criterion’s 
correlations with the corresponding dimensions 
of predictor space. 

For applied statistical estimation, ridge 
regression’s day may come. But it has not 
come yet. 
Intellectual Functioning in Duchenne Muscular Dystrophy:
A Review 


Nicholas J. Karagan 
Department of Pediatrics 
University of Iowa 


Duchenne muscular dystrophy has traditionally been thought to be a primary 
disease of muscle, but recently it has been suggested that it may be secondary 
to a neuronal defect or to a generalized disorder of protein synthesis and mem- 
brane. However, to date there is no proof to unequivocally support any of these 
theories. A higher incidence of mental retardation and decreased intellectual 
functioning has been reported in the medical literature for Duchenne muscular 
dystrophy patients than for normals or other control groups. Recently there has 
been strong evidence to suggest that verbal ability, as reflected by the Wechsler 
Intelligence Scale for Children Verbal scale IQ, may be more commonly and 
significantly impaired in Duchenne muscular dystrophy patients than is non- 
verbal ability, as reflected by the Performance scale IQ. This article presents a 
comprehensive review of intellectual functioning in Duchenne muscular dystrophy.
The intent is to provide a basis for future attempts to relate the intellectual
deficit in Duchenne muscular dystrophy to neuropsychological and neurobiological
parameters of the disease.


The muscular dystrophies are one class of 
inherited myopathies that are characterized 
by progressive muscular weakness and degen- 
eration of muscle tissue without apparent 
cause in either the peripheral or central ner- 
vous system. In this regard they are dis- 
tinguished from another class of myopathies, 
the spinal muscular atrophies, that involve 
progressive weakness and degeneration of 
muscle secondary to degeneration of anterior 
horn cells (Moosa, 1974). Duchenne muscu- 
lar dystrophy has traditionally been consid- 
ered a primary disease of muscle, but re- 
cently it has been suggested that it may be 
secondary to a neuronal defect (McComas, 
Sica, & Currie, 1970, 1971; Moosa, 1974) or 
to a generalized disorder of protein synthesis 
and membrane (Ionasescu, 1975). However,
to date there is no proof to unequivocally 
support any of these theories. 

Of the various types of muscular dystro- 
phies, Duchenne or pseudohypertrophic mus- 
cular dystrophy is by far the most common,
as well as the most rapidly progressive. Its
incidence is 279 per 1,000,000 male births
(Hanson & Zellweger, 1968). Death usually
occurs from inanition or cardiopulmonary
disease between 15 and 20 years of age.
Usually, only males are affected by the
disease, which is X-linked recessive. Thus,
female carriers have the theoretical probability
that 50% of their male offspring will be
affected and that 50% of their female offspring
will be carriers. [...] spontaneous
mutation (Hanson & Zellweger,
1968). Duchenne muscular dystrophy can
occur in rare instances in females, such as in
Turner's syndrome (sex chromosomal complement
XO) or in carriers in which the Lyon
effect (Lyon, 1961, 1966; Murphy & Thompson,
1969) is thought to permit its
expression. Diagnosis of the disease is based
Stage   Performance

Ambulatory
I       Can climb stairs without aid of railing
II      Can climb stairs with aid of railing, with mild exertion
III     Can climb stairs with aid of railing but slowly, with labor, and cumbrously
IV      Cannot climb stairs independently; can assume standing position from standard chair without another person's assistance
V       Cannot assume standing position from standard chair independently

Wheelchair existence
VI      Predominantly in wheelchair but can walk 50 meters on level, usually with braces or other assistance
VII     Independent transfer activity (from wheelchair to bed, toilet, etc.)
VIII    Can maintain erect sitting posture independently
IX      Dependent in transfer activities: (a) cannot maintain erect sitting posture independently; (b) cannot raise arms 20 cm off arm rests

Bed existence
X       Cannot use wheelchair


Figure 1. Functional classification of Duchenne muscular dystrophy. (From "Psychometric Studies
in Muscular Dystrophy Type IIIa (Duchenne)” by H. Zellweger and J. W. Hanson, Developmental
Medicine and Child Neurology, 1967, 9, 576-581. Copyright 1967 by Spastics International Medical
Publications. Reprinted by permission.)


on a positive family history, clinical symp- 
toms, elevated serum creatine phosphokinase, 
electromyography, and ribosomal protein syn- 
thesis (determined from a muscle biopsy). 
Figure 1 summarizes the 10 stages of the 
disease, which are based on the ambulatory 
and transfer capabilities of the patient (Zell- 
weger & Hanson, 1967). As can be seen, 
upper limb involvement progresses slowest, 
and finger dexterity is intact until the very 
latest stages, indicating that manual tasks 
can be accomplished without interference of 
the disease through Stage 5 and with minimal 
to moderate involvement through Stage 8. 
A higher incidence of mental retardation 
and decreased intellectual functioning has 
been reported in the medical literature for 
Duchenne muscular dystrophy patients than 
for normals or other control groups. The 
reason for this finding is thought to be some 
type of abnormal central nervous system functioning.
Recently, there has been strong evi-
dence to suggest that verbal ability, as re- 
flected by the Wechsler Intelligence Scale 
for Children (WISC) Verbal IQ, may be more 
commonly and significantly impaired than is 
nonverbal ability, as reflected by the WISC 
Performance IQ. 

The purpose of the present article is two- 
fold: first, to introduce a recent, comprehen- 
sive review of intellectual functioning in 
Duchenne muscular dystrophy into the psy- 
chological literature, and second, to highlight 
some of the methodological issues involved in 
the assessment of intelligence in this popula- 
tion. It is important to provide a basis for 
future attempts to relate intellectual deficit in 
Duchenne muscular dystrophy to neuropsy- 
chological and neurobiological parameters of 
the disease. I hope this approach will also 
prove useful in the study of brain-behavior 
relationships in other physical diseases that 
have behavioral correlates. Reviewed in de-
tail are studies since 1960 of intellectual 
functioning in Duchenne muscular dystrophy. 


Studies Prior to 1960 


A higher prevalence of mental retardation 
than is found in the general population was 
documented by numerous reports prior to 
1960 (Becker, 1953; Del Carlo Giannini & 
Marcheschi, 1959; Duchenne, 1868; Erb, 
1891; Gowers, 1879; Zellweger, 1946) that 
studied only patients with Duchenne muscu- 
lar dystrophy. Several earlier studies, how- 
ever, did not report a higher incidence of 
mental retardation. They either reported on a
mixed group of dystrophies (Bell, 1943; Mor- 
row & Cohen, 1954; Schoelly & Fraser, 1955) 
or found the degree of intellectual impairment 
to be relatively mild (Walton & Nattrass, 
1954). These earlier studies frequently in- 
volved clinical rather than psychometric 
evaluation of the retardation. 


Studies Since 1960 


Decline in General Intelligence 


Table 1 contains a summary of studies 
since 1960 of intellectual functioning in Du- 
chenne muscular dystrophy. Only one
study that reported on a group of Duchenne
patients using formal psychometric assessment
did not suggest a higher than normal
incidence of mental retardation or a significantly
lower than average overall IQ.
Sherwin and McCully (1961) studied 15 chil- 
dren with Duchenne muscular dystrophy who 
ranged in age from 10 to 14 years. All were 
confined to a wheelchair. The verbal portion 
of the WISC was administered to all children. 
Mean Verbal IQ was 103 and ranged from 
90 to 120. 

All other studies have reported a lower 
than normal general or Full Scale mean IQ
and/or a higher than normal incidence of 
mental retardation (Allen & Rodgin, 1960; 
Black, 1973; Cohen, Molnar, & Taft, 1968; 
Dubowitz, 1965; Florek & Karolak, 1977; 
Gamstorp & Smith, 1964; Karagan & Zellweger,
1978; Kozicka, Prot, & Wasilewski,
1971; Marsh & Munsat, 1974; Michal, 1972;
Nakao, Kito, Muro, Tomonaga, & Mozai,
1968; Prosser, Murphy, & Thompson, 1969;
Rosman, 1970; Worden & Vignos, 1962; Zell- 
weger & Hanson, 1967; Zellweger & Nieder- 
meyer, 1965). The mean IQs ranged from 68 
to 91, with most about 1 SD below the gen- 
eral population average. Cohen et al. (1968), 
Florek and Karolak (1977), and Prosser et
al. (1969) studied in detail the distribution 
of IQs in their samples. They concluded that 
the entire IQ distribution of the dystrophic 
children was shifted downward by 1 SD com- 
pared with the normal IQ distribution. Conse- 
quently, they reasoned that even those pa- 
tients with an average or above-average IQ 
were reduced in intellectual ability compared 
with what would be expected if they did not 
have the disease. 
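
To see what a downward shift of 1 SD implies for the incidence of retardation, the sketch below compares two normal distributions. It assumes IQs are normally distributed with an SD of 15, a general-population mean of 100, a shifted mean of 85, and a cutoff of 70 for retardation. The normality and the cutoff are illustrative assumptions, not figures taken from the studies reviewed, and the function name is hypothetical.

```python
import math


def proportion_below(cutoff, mean, sd=15.0):
    """Proportion of a normal(mean, sd) distribution falling below cutoff."""
    z = (cutoff - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


general = proportion_below(70, mean=100)   # assumed general population
shifted = proportion_below(70, mean=85)    # same shape, shifted down 1 SD

print(f"general population below 70: {general:.3f}")
print(f"shifted distribution below 70: {shifted:.3f}")
print(f"ratio: {shifted / general:.1f}")
```

Under these assumptions roughly 16% of the shifted distribution falls below 70, against about 2% of the general population, so a uniform 1 SD shift by itself would produce a markedly higher incidence of scores in the retarded range.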


Control Groups 


The study of the intellectual functioning of 
relatives and affected and unaffected siblings 
has yielded some consistent and interesting 
results (Cohen et al., 1968; Kozicka et al., 
1971; Prosser et al., 1969; Worden & Vignos, 
1962). These studies uniformly reported 
higher mean IQs for the unaffected siblings 
and relatives (e.g., parents, aunts, and un- 
cles) than for the affected siblings. Further, 
there was concordance in level of IQ between 
Duchenne patients and their affected siblings. 
The IQs of Duchenne patients also tended to 
follow the intellectual genetic history of the 
family, that is, severe retardation was found
in patients from dull families, whereas nor- 
mal intelligence was present in patients from 
bright families. 

Groups of Duchenne patients have also 
been compared with a variety of control 
groups in the assessment of their level of 
intellectual functioning. Groups with chronic
or physical incapacities such as diabetes mellitus
(Worden & Vignos, 1962), spinal muscular
atrophy (Florek & Karolak, 1977;
Kozicka et al., 1971; Worden & Vignos,
1962), and postpoliomyelitis (Michal, 1972)
all manifested average mean IQs, whereas
Duchenne patients manifested below-average
mean IQs.

These findings on relatives and other control
groups are parsimonious, given the general
finding that the distribution of IQs of
the Duchenne patients is shifted downward
about 1 SD from the normal distribution.


Early Impairment of Verbal Intelligence 


The conclusion gleaned thus far, that there 
is a generally decreased overall or Full Scale
IQ in patients with Duchenne muscular dys-
trophy, is exceedingly impressive. Several in-
vestigators (Black, 1973; Prosser et al.,
1969; Sherwin & McCully, 1961; Worden &
Vignos, 1962; Zellweger & Hanson, 1967; 
Zellweger & Niedermeyer, 1965) have sought 
to associate a differential pattern of impair- 
ment with a lower Verbal than Performance 
IQ in patients with muscular dystrophy, but 
were either unsuccessful or did not explain the 
finding of a depressed Verbal IQ. 

Marsh and Munsat (1974) were the first to 
document an early impairment of verbal in- 
telligence in Duchenne muscular dystrophy. 
They studied 34 boys with Duchenne muscu- 
lar dystrophy who ranged in age from 5 to 15 
years. Of the subjects, 16 were mildly physically
impaired, and 18 were moderately or
severely disabled by the disease and required
placement in a wheelchair. All were tested
with the WISC. The mildly disabled group
had a significantly lower Verbal scale IQ (M
= 85) than Performance scale IQ (M = 98).
However, for the moderately or severely disabled
groups there was no significant difference
between their Verbal scale IQ (M =
88) and their Performance scale IQ (M =
90). All children in the mild group were still 
ambulatory and, with one exception, were 
less than 10 years of age. Marsh and Munsat 
concluded that verbal intelligence was de- 
pressed and nonprogressive in these cases of 
Duchenne muscular dystrophy. On the other 
hand, Performance IQ tended to decrease 
with time as the physical disability inter- 
fered with the Performance scale items. 

The authors noted that prior to their study 
there was no available published report of 
verbal and performance intelligence testing of 
dystrophic children using only the WISC. All 
previous studies had used two or more dif-
ferent intelligence tests to cover the age 
range in their samples. Further, in a detailed 
review, Marsh and Munsat pointed out that 
the various tests used were not truly equiva-
lent (Barclay & Carolan, 1966; Hannon & 
Kicklighter, 1970) and that real differences 
may have been obscured by the use of several 
different measures. 

Karagan and Zellweger (1978) attempted 
to replicate the findings of Marsh and Munsat 
and to study the pattern of Verbal and Per- 
formance IQ in this population in more detail.
They studied intellectual functioning in
a group of 53 boys with Duchenne muscular
dystrophy. All the children were under 10
years of age and were ambulatory, that is,
they were at Stage 5 (Zellweger & Hanson,
1967) or less of their disease. All were administered
the Verbal and Performance scales
of the WISC; mean Verbal IQ (81) was significantly
lower (t = 4.48, p < .001) than
Performance IQ (88). Further, the range of 
Verbal IQs (58-108) was more constricted 
and the distribution more skewed than was 
the range (51-118) and distribution of Per- 
formance IQs. The mean Verbal-Performance 
discrepancy score was —7, significantly lower 
than the score for the standardization sample 
of the WISC, which was 0 (Seashore, 1951). 
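
The reported statistics allow a rough back-calculation of how variable the individual discrepancies were. The sketch below assumes that the t value comes from a paired-samples t test on all 53 children and that the mean Verbal-Performance difference is exactly the difference of the rounded means (81 - 88 = -7). Neither assumption is confirmed by the report, and the function name is illustrative.

```python
import math


def implied_sd_of_differences(mean_diff, t_value, n):
    """SD of paired differences implied by t = mean_diff / (sd / sqrt(n))."""
    return abs(mean_diff) * math.sqrt(n) / abs(t_value)


sd_diff = implied_sd_of_differences(mean_diff=81 - 88, t_value=4.48, n=53)
print(round(sd_diff, 1))
```

On these assumptions the individual Verbal-Performance discrepancies would have a standard deviation of about 11 IQ points, so a mean discrepancy of 7 points still leaves room for many children with little or no discrepancy.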


Two Patterns of Intelligence Test 
Performance 


In an effort to analyze this pattern in more 
detail, Karagan and Zellweger (1978) dichot- 
omized this sample first at the mean (—7) of 
the Verbal-Performance discrepancy distribu- 
tion. Two distinct patterns of intelligence test
performance emerged. Both groups demon- 
strated depressed mean Verbal IQs of 81. 
However, the first group, with a Verbal-Per- 
formance discrepancy score that was less than 
or equal to —7, had a mean Performance IQ 
of 97. The other group, with a Verbal-Per- 
formance discrepancy score that was greater 
than −7, also had a significantly lower than
normal Performance IQ (78).


Relationship of IQ to Psychosocial Factors 


Emotional Factors 


Studies prior to 1960 (e.g., Morrow &
Cohen, 1954) generally attributed any identi- 
fied intellectual deficit in Duchenne muscular
dystrophy to emotional factors associated 
with chronic and fatal illness. Although Allen 
and Rodgin (1960) and Florek and Karolak 
(1977) identified in the patients in their stud- 
ies emotional components that could have 
had a depressing effect on their level of intel- 
lectual functioning, these components could 
not account for the overall depression of IQ, 
particularly the lower IQs. 

Karagan and Zellweger (1978), Kozicka 
et al. (1971), Marsh and Munsat (1974), 
and Prosser et al. (1969) did not find the 
generally lower IQs of dystrophic patients to 
be the result of secondary or environmental 
effects of the disease. Kozicka et al. found an 
average IQ (92) in their control group of 
patients with spinal muscular atrophy who 
were about as equally chronically ill and 
physically disabled as the Duchenne patients, 
who had a mean IQ of 76. Karagan and Zell- 
weger, Marsh and Munsat, and Prosser et al. 
concluded that psychosocial factors have lit- 
tle influence on the depressed IQs of Duchenne 
patients for several reasons. First, the de- 
pressed intellectual functioning is present at 
an early age. Second, most young children 
with Duchenne muscular dystrophy are no 
more than minimally restricted in mobility 
and thus are able to adequately explore their 
environment. And finally, most of the younger 
children in these studies either were enrolled 
in a regular school program or were placed in 
Special programs solely on the basis of their 
lower mental functioning, and they did not 
manifest educational handicaps directly at- 
tributable to their physical disability.


Socioeconomic Status 


No relationship has been demonstrated be- 
tween socioeconomic status and IQ, except 
that between extreme classes the IQ dif- 
ferences are not unlike those for the general 
population (Cohen et al., 1968; Florek &
Karolak, 1977; Prosser et al., 1969; Worden 
& Vignos, 1962; Zellweger & Niedermeyer, 
1965). Prosser et al. commented that at both 
the low and middle socioeconomic levels, the 
dystrophic patients have lower IQs than do 
their normal sibs. 
Relationship of IQ to Physical Factors


Serum Enzymes


Worden and Vignos (1962) found no rela- 
tionship between IQ and creatinuria or serum 
aldolase. Prosser et al. (1969) looked at 
the relation between creatine phosphokinase 
(CPK) and IQ in relatives of Duchenne pa- 
tients. In each family there was no difference 
in mean IQ between carriers and normals 
(both male and female). There was also a 
low nonsignificant correlation between the 
highest recorded CPK level of each carrier 
and her IQ. No known study has reported on 
the relationship between CPK level and IQ in 
the Duchenne patients themselves. 


Sporadic Versus Familial Cases 


Although Dubowitz (1965) found that a 
positive family history for the disease was 
much less frequent in patients of normal 
intelligence than in retarded patients, these 
findings were not supported by Prosser et al. 
or Zellweger and Niedermeyer (1965). 


Severity of Disability 


In several studies, IQ was unrelated to se- 
verity of disability (Allen & Rodgin, 1960; 
Black, 1973; Cohen et al., 1968; Kozicka et
al., 1971; Nakao et al., 1968; Prosser et al.,
1969; Worden & Vignos, 1962; Zellweger & 
Hanson, 1967), despite two reports suggesting 
a deterioration in intellectual functioning with 
disease progression (Dubowitz, 1965; Florek 
& Karolak, 1977). However, as noted previ- 
ously, Marsh and Munsat (1974) reported a
lower WISC Performance IQ in their patients 
who were physically more significantly im- 
paired and concluded that the lower Perform- 
ance IQ was a function of the interference of 
the physical disability with Performance scale 
items. This, of course, would also tend to produce
a lower Full Scale IQ.

The findings of two reports suggest an
interesting phenomenon with respect to Verbal
IQ over time. Prosser et al. (1969) found a
significant correlation of .38 between age and
Verbal IQ in 39 of their patients. Karagan
and Zellweger (1976; Karagan & Zellweger,
Note 1) reported on a preliminary test-retest
study of the intellectual functioning of 22 
children with Duchenne muscular dystrophy. 
The children were tested with the WISC first 
when they were under age 10 and at less than 
Stage 6 of their disease (ambulatory) and 
again when they were not ambulatory, an 
average of 48 months later. Again, consistent 
with their findings on 53 young dystrophic 
patients (Karagan & Zellweger, 1978), two 
groups emerged. On initial testing, the mean 
Verbal-Performance discrepancy score for the 
22 patients was —11. The group of 11 patients 
with a Verbal-Performance discrepancy score 
that was less than or equal to —11 had an ini- 
tial Verbal IQ of 79, but on retesting they 
had a mean Verbal IQ of 87 (p < .10). The 
mean Performance IQ of 99 on initial testing 
dropped to 88 (p < .05). The other group of 
11 patients, which had on initial testing a 
Verbal-Performance discrepancy score that 
was greater than —11, had identical scores on 
test and retest—mean Verbal IQs of 79 and 
mean Performance IQs of 78. Neither group 
was significantly different in terms of age or 
stage of the disease at the time of the first or 
the second testing. 


Abnormal Brain Functioning 


Although most authors have concluded that 
the intellectual deficit is related to abnormal 
brain functioning (Allen & Rodgin, 1960; 
Cohen et al., 1968; Dubowitz, 1965; Kara- 
gan & Zellweger, 1978; Marsh & Munsat, 
1974; Michal, 1972; Prosser et al., 1969; 
Rosman, 1970; Schorer, 1964; Worden & 
Vignos, 1962; Zellweger & Hanson, 1967), 
several studies have reported a high percent- 
age of abnormal electroencephalogram (EEG) 
findings in patients with Duchenne muscular 
dystrophy (Florek & Karolak, 1977; Kozicka 
et al., 1971; Nakao et al., 1968; Zellweger & 
Niedermeyer, 1965). Further, these studies 
also reported a high correlation between the 
severity of the intellectual deficit and the 
severity of the EEG findings. This finding
provides some support for the notion that the
intellectual deficit is based on some physical 
parameter of the disease that affects central 
nervous system functioning as opposed to the 
notion that the intellectual deficit is solely
secondary to psychosocial factors. 


Progression of the Disease 


Three studies provide some interesting but 
tentative and speculative data regarding the 
relation between IQ and progression of the 
disease. Rosman (1970) reported on 10 pa-
tients with Duchenne muscular dystrophy and 
found that both the clinical myopathy and 
the severity of histopathological changes 
paralleled intellectual functioning: The pa- 
tients with the most severe involvement of 
muscle were those patients with the most 
severe degree of intellectual impairment. This 
relationship was independent of age of onset 
or of the duration or severity of the myopa- 
thy. 

Nakao et al. (1968) found in 77 Duchenne 
patients that the rate of inferior intelligence 
was high (45%) when the disease was of 6 to 
11 years duration and was low (20%) when 
the disease was of more than 11 years dura- 
tion. They concluded that cases of inferior 
intelligence pursue an unfavorable course. 

Karagan and Zellweger (1976; Karagan & 
Zellweger, Note 1) in their preliminary test- 
retest study of 22 Duchenne patients, found 
that the 11 patients with a Verbal-Perform- 
ance discrepancy score of less than or equal 
to −11 progressed from Stages 2 and 3 of the
disease to Stages 8 and 9 in an average of 66 
months. On the other hand, the 11 patients 
with a Verbal-Performance discrepancy of 
greater than −11 (i.e., both Verbal IQ and
Performance IQ were depressed) progressed
more rapidly, in an average of 38 months
(p < .10).


Conclusion 


Patients with Duchenne muscular dystro- 
phy, traditionally thought to be a primary 
disease of muscle, manifest a higher incidence 
of mental retardation and decreased intellec- 
tual functioning than the general population. 
There is strong evidence that this decrease is 
the result of abnormal central nervous system 
functioning. Although it has recently been 
suggested that the disease may be secondary 
to a neuronal defect or to a generalized dis- 
order of protein synthesis and membrane, no 
proof of any of the proposed etiologies exists.
The specific mechanisms underlying the 
disease itself and the accompanying intellec- 
tual deficits are unknown at the present time. 
However, a relationship between the intel- 
lectual deficit and the disease is an intriguing 
possibility. 

The entire IQ distribution of patients with 
Duchenne muscular dystrophy appears to be 
shifted downward by about 1 SD compared 
with the normal IQ distribution. The IQs of 
these patients follow the genetic intellectual 
characteristics of the family, and there is 
high concordance between the intellectual 
levels of affected siblings. Duchenne patients 
typically have significantly lower IQs when 
compared with a variety of control groups 
who have chronic disease or physical inca- 
pacity. Verbal ability, as defined by the WISC 
Verbal IQ, is depressed at an early age and 
appears universally affected in Duchenne pa- 
tients. Some patients, however, manifest a 
significant deficit in both Verbal and Per-
formance IQ on the WISC. 

There is no evidence to date that relates 
IQ deficit to serum enzymes, sporadic or fa- 
milial etiology, age of onset of clinical symp- 
toms, or the severity of disability. However, 
Performance scale IQ does decline over time 
as a function of the interference of physical 
disability with performance scale items. 
There is no evidence of any intellectual or 
cerebral deterioration. In fact, there is some
suggestion that in some patients Verbal IQ 
manifests a modest increase over time. 

Finally, there are several reports that sug- 
gest that the more severely affected a child's 
intellectual functioning is, the more rapidly 
the disease progresses. The evidence for this 
is by no means unequivocal yet.

What appears necessary at this juncture is 
to identify other neuropsychological char-
acteristics of this disorder and, in turn, to
relate these as well as the intellectual deficits 
to physical parameters of the disease. These 
factors, of course, do not appear to relate in 
a simple fashion. The study of neuropsycho- 
logical aspects of other myopathic disorders 
is likewise important, particularly because 
these intriguing yet unfortunate human mala- 
dies have received relatively little attention in 
the psychological literature.


This article critically examines the pervasive assumption found in psychotherapy
literature that disconfirmation of client role expectations has been demonstrated 
to be a negative influence in psychotherapy. When the empirical literature is 
examined, this hypothesis does not appear to be as conclusive as has been sug- 
gested. In fact, the empirical studies are evenly divided in supporting this 
hypothesis. Implications for future research are discussed. 


The expectations of client and therapist 
have for some time been postulated to be 
important influences in psychotherapy. In a 
monograph-length review and analysis of the 
early expectation research, A. Goldstein 
(1962b) extracted two types of expectations 
relevant to the study of psychotherapy. These 
two types were characterized as (a) prog- 
nostic expectations and (b) participant role 
expectations. The former were defined as the 
assessments of therapist and client regarding 
the probability of success in the therapeutic 
intervention. The latter were defined as the
anticipations held by therapist and client re- 
garding the behavior that will be displayed 
in the psychotherapeutic relationship by both 
participants. Researchers have hypothesized, 
for example, that clients bring to psycho- 
therapy certain preconceived ideas about what 
the psychotherapist will do and how the 
client should behave.

The present review focuses on this second 
category: participant role expectations. More
specifically, the review examines extant re- 
search that bears on the hypothesis that
failure to confirm client expectations of the
therapist's role results in negative conse-
quences. In many ways, this hypothesis has
already come to be accepted as fact by a
large proportion of the psychological com- 
munity. In the common wisdom, it enjoys the 
status of a virtually unquestioned assump- 
tion. Many such allusions to the “demon- 
strated” relationship of disconfirmed client 
role expectations and various dependent 
variables come across the desk of even the 
casual reader of research in psychotherapy. 
To demonstrate this point, three such allu-
sions are presented here. That all three ex-
amples are drawn from recent, scholarly re- 
views of related areas of research only serves 
to emphasize the pervasiveness of the as- 
sumption that disconfirmation of client role 
expectations has been demonstrated to be a
negative influence in psychotherapy. 

First, Lorion (1974a) reported that “the 
importance of patient expectations to treat-
ment variables . . . has been demonstrated”
(p. 347). He felt sure enough of this state-
ment to cite in support only seven repre-
sentative studies that used five different types
of dependent variables. Second, Baekeland
and Lundwall (1975), in an extensive review
of the psychotherapy dropout literature,
wrote that “it is known that discrepant ex-
pectations about treatment promote drop-
ping out” (p. 758). This inference was sup-
ported by the citation of six studies, five of 
which were published prior to 1966. Thus, 
these authors showed little hesitation about 
accepting the validity of the disconfirmed 
expectations – negative consequences hypoth- 
esis. Third, Heitler (1976) made reference 
to the “substantial body of theory and re-
search evidence” (p. 340) that indicates that
some mutuality of patient-therapist role ex- 
pectations is crucial. He was so confident of 
this generalization that he cited only six 
references in support. 

Each of the preceding examples demon- 
strates the extent to which the hypothesized 
relationship of disconfirmed expectations and
negative consequences in psychotherapy has 
been incorporated into the belief system of 
the field. The articles cited were not selected 
because they are unique. On the contrary, 
they represent a pervasive assumption among 
clinical researchers. The examples are par-
ticularly forceful because they appear in the
context of high quality scholarly papers. 

The present review was undertaken to ex- 
amine more systematically whether the as-
sumption of proven status for the role expectation
hypothesis is warranted by the available
empirical evidence. The literature is discussed
in the following order. First, descriptive ex-
positions of client role expectations are pre-
sented briefly. Second, the theoretical and
experimental background leading up to A.
Goldstein’s (1962b) keystone monograph is
summarized. Third, the experimental litera-
ture since 1962 that deals with the question
of the negative consequences of disconfirmed
role expectations in psychotherapy is re-
viewed. Fourth, this literature is discussed
in terms of its implications and ambiguities. 


Descriptive Studies


A wide range of studies have empirically
validated the existence of client expecta-
tions of the therapist's role. Apfelbaum's
(1958) work is perhaps the classic descrip-
tive study in this area. Using outpatients of
a psychiatric clinic, Apfelbaum
cluster analyzed the clients’ Q sorts that re-
flected their pretherapy expectations of the
therapist's attitudes and interview behavior.
The resulting clusters suggested three major 
types of client role expectation: (a) the 
nurturant therapist—giving, protecting, and 
guiding without pushing or criticizing; (b) 
the model—well adjusted, diplomatic, a per- 
missive listener but not protective; and (c) 
the critic—analytical, critical, and demand- 
ing considerable responsibility from the client. 

Heine and Trosman (1960) identified two 
types of role expectations held by clients re- 
garding the psychotherapy process. They 
called these two types of expectations the 
guidance model and the collaboration model. 
By far, most of the outpatients in their sam- 
ple held expectations of psychotherapy that 
clearly fit the guidance model. They saw the 
therapist as the source of diagnostic infor- 
mation, of advice, and of medicine. Thera- 
pists, on the other hand, anticipated a ther- 
apy relationship closer to the collaborative 
model. They were generally less directive in 
style and were not oriented toward the use 
of diagnosis or psychopharmacological agents. 

Finally, Begley and Lieberman (1970) 
broke down client role expectations in still 
another way. Using a modified version of a 
questionnaire developed by McNair and Lorr 
(1964), they identified two clusters of pa- 
tient expectations of the therapist’s role. One 
patient group anticipated an active, directive, 
but warm therapist. The other group expected 
a more passive, detached, objective therapist. 

Using these and other approaches to the 
measurement of client expectations, a num- 
ber of investigators have limited themselves 
to reporting the types of expectation pre- 
dominant in their sample in the hope of de- 
termining the characteristic expectations of 
various populations. Many such reports in- 
dicated that the samples anticipated direc- 
tive therapists. Patients expected the thera- 
pist to be warm but still firmly in control of 
the therapy session. For example, Thomas, 
Polansky, and Kounin (1955) found that 
university students characterized the help- 
ful therapist as more directive, that is, will- 
ing to offer advice and to structure the situa- 
tion. Chance (1957) studied a group of 
mothers who were in therapy concurrently 
with their children. She found that the
mothers projected their therapists’ charac-
teristics in idealized terms. They expected 
the therapist to function in the interviews as 
an active, directive, and supportive teacher. 
Cundick (1963) reported that college stu- 
dent outpatients expected an active and per- 
sonally involved therapist. However, both 
clients and therapists agreed that the client 
had primary responsibility for the direction 
of the interview. This aspect of their expecta- 
tion is different from the findings of the 
previous two studies. Finally, Tinsley and 
Harris (1976) found that their sample of 
college students did expect the therapist to 
be both expert and genuine in personal com- 
munication. Interestingly, this group did not 
generally expect that the therapy would be 
successful. 


Correlates of Client Expectations 


Not too far removed from the purely de- 
scriptive study of client expectations of ther- 
apist role, other researchers have used cor- 
relational designs to study the relationship of 
expectations to various client or psychother- 
apy variables. 

Apfelbaum (1958) found the three role 
expectation factors in his study to be differ-
entially associated with clients' Minnesota
Multiphasic Personality Inventory profiles, 
rate of dropping out of psychotherapy, hours 
in therapy, and sex differences. Jacobs, Mul- 
ler, Anderson, and Skinner (1972) studied 
the efficiency of several predictors of im- 
provement in hospitalized patients. They 
found that expectations of a low-directive, 
less concerned, and less sensitive therapist 
were the strongest single predictor of nega-
tive outcome (r = .31).

Caine, Wijesinghe, and Wood (1973) 
compared persons with expectations of a di- 
rective, authoritative therapist to persons 
who expected the client to be a more col- 
laborative partner in the process of therapy. 
They found that persons who expected the 
more directive therapist were significantly 
more externally directed, tended to be con- 
vergent rather than divergent thinkers, and 
scored higher on the Conservatism scale. 
Baldwin (1974) reported that university
students who expected the therapist to be
planned rather than spontaneous in his or 
her intrasession behavior were more likely 
to be repressors than sensitizers. In addition, 
repressors considered therapist personality 
to be a less important variable in therapeutic 
outcome than did sensitizers. 

Based on his own work and a review of 
other research, Lorion (1974a, 1974b) con- 
cluded that expectations for psychotherapy 
are not related to socioeconomic status (SES).
He reported that subjects from all SES 
levels tended to verbalize similar expecta- 
tions of therapist behavior. Low SES clients 
did not express significantly stronger antici- 
pations of an active, problem-solving thera- 
pist style. Supporting this position is Bent, 
Putnam, Kiesler, and Nowicki’s (1975) re- 
port that their sample of relatively well-edu- 
cated community mental health center out- 
patients expected to receive advice and 
medicine to solve their problems. These an- 
ticipations were not particularly different 
from those reported for a group of low SES 
outpatients by Overall and Aronson (1963). 
Garfield and Wolpin (1963) studied a group 
of young, predominantly low SES clients in 
which 67% expected types of therapist be- 
havior that put considerable responsibility 
on the client to help himself or herself. How- 
ever, Garfield and Wolpin also asked these 
clients what type of therapist behavior they 
preferred. In contrast to their expectation, 
the clients generally preferred to be given 
advice. This finding raised the question of 
the interaction between expectation and pref- 
erence. The distinction between these two 
very different concepts has very seldom been 
made by researchers in the area. As we point 
out later in the article, the failure to make 
this distinction has been a source of am- 
biguity in the literature. 

Perhaps the most justified conclusion from 
this series of descriptive and correlational
studies is that expectations of therapist be-
havior will vary considerably among differ-
ent samples and even within any given sam-
ple. In addition, the Overall and Aronson
(1963) study hints that disconfirmed ex-
pectations might lead to negative results.
Bent et al. (1975), in their descriptive re-
port, stated that this question, about the
role of expectation in the outcome of treat-
ment, is the next logical question to be ad-
dressed by their research. They are correct,
of course, in assigning a high priority to the 
question. The existence of client expectations 
per se is of little importance if the failure to 
acknowledge or confirm these expectations 
does not affect the therapy outcome or pro-
cess. However, it must be recognized that a
substantial body of research has been ad- 
dressed to this question. In fact, as sug- 
gested earlier, the hypothesized relationship 
between disconfirmed expectations for thera-
pist behavior and negative consequences in
therapy has apparently been awarded factual 
status in much of professional psychology. 
The examination of the extant research that 
bears on this hypothesis is the crux of the 
present effort to ascertain whether its status 
as demonstrated is justified. 


Theoretical and Experimental Background: 
Research Before 1962 


Interest in the impact of client role ex- 
pectations in psychotherapy was already 
evident by the early 1950s. Initial efforts pro- 
vided at least explicit speculation on the im- 
portance of role expectations in psychothera-
peutic treatment (e.g., Kamm & Wrenn,
1950; Seeman, 1949). 

Kelly (1955) considerably elaborated the 
theoretical underpinnings of the early prop-
ositions regarding client role expectations in
psychotherapy. He postulated that almost
any client already holds a highly personal-
ized conceptualization of the nature of the
therapy relationship and of the psy-
chotherapist’s role within that relationship
prior to the initiation of treatment.
Kelly argued, on the basis of his theoretical
position, that the psychotherapist must ac-
cept the client's preconception of the thera-
pist's role, at least in the beginning stage of
treatment. Failure to confirm the client's ex-
pectations results in confusion or disappoint-
ment on the part of the client. In Kelly’s
view, therefore, the therapist cannot
reject without negative effect the
client's anticipations, even though the thera-
pist may attempt in the long run to change
the role he or she initially accepts. 

Danskin (1955) also posited that the ef- 
fective therapist typically attempts to play 
the role expected of him or her by the client. 
On the other hand, Patterson (1958) argued 
that there are more and less effective ways 
to conduct therapy. He favored the low-direc- 
tive, client-centered approach. Patterson's 
position was that therapists should not at- 
tempt to meet client expectations for thera- 
pist behavior. He admonished them to help 
clients, if necessary, learn to respond favor- 
ably to the low-directive therapy style. 

At the same time, early empirical efforts 
to examine the expectation hypothesis were 
also under way. McGowan (1955) reported 
that clients in his sample experienced equiva- 
lent levels of satisfaction regardless of the 
style used by the counselors. Satisfaction 
was related, however, to the perceived ex- 
pertise of the therapist. A major factor in 
the perception of a counselor as expert ap- 
peared to be his or her ability to form a 
close, facilitative relationship with the client. 
Frank, Gliedman, Imber, Nash, and Stone 
(1957) found that individuals assigned to the 
relatively unfamiliar (at that time) group 
method of therapy dropped out of treatment 
at a much higher rate than did those as- 
signed to the more frequently anticipated 
individual therapy. The authors speculated 
that the group therapy experience was too 
sharply incongruent with the clients’ expecta- 
tions of psychotherapy to permit beneficial 
participation. 

Biddle (1958) built his reasoning on the 
foundation of earlier work in social psychol- 
ogy. This literature suggested that if one 
person conforms to the role expectations held 
by a second person, the first will enjoy 
greater influence over the second than will a 
person who does not conform to those role 
expectations. Biddle hypothesized that ana- 
logue subjects would express low satisfac- 
tion with a taped therapist who failed to 
meet their expectations. The results sup- 
ported the hypothesis only under one of two 
special conditions. First, when the therapist 
had some power to punish the client, noncon- 
formity to client expectations led to less
satisfaction. Second, when the client was led
to believe that the therapist expected to be- 
have in much the same way as the client ex- 
pected him or her to behave, then noncon- 
formity by the therapist resulted in the ex- 
pression of less satisfaction by the client. 

Heine and Trosman (1960) used a sam- 
ple that held predominant expectations of an 
active, directive doctor and a passively co-
operative patient. Their therapists' expecta-
tions were generally much more along the 
lines of the collaborative model. The authors 
found that patients with the strongest ex- 
pectations of a directive therapist were much 
more likely to drop out of therapy than 
were persons with somewhat less directive 
expectations. Although therapist styles were
not closely controlled in this study, the re- 
sults were suggestive. Similar findings were 
reported at about the same time by Skinner 
and Anderson (1959) and by Hankoff, Engle- 
hardt, and Freeman (1960). 

Emboldened by these promising data, re- 
searchers began major studies, and theoreti- 
cal propositions were offered. Chance (1959) 
reviewed the literature up to that point and 
reported the findings of her own extensive 
study. She concluded that “this mutuality 
of expectation may be one of the prerequisites 
to therapy” (p. 105). Wallach and Strupp
(1960) hypothesized, on the basis of the 
available evidence, that if the client's ex- 
pectations regarding the therapist were con- 
firmed, then the therapy situation would 
appear more rewarding and the therapist 
would be perceived more positively. There- 
fore, they suggested that congruence between 
the type of assistance expected and the type 
of assistance proffered might be an impor- 
tant variable in psychotherapy. 

Further, Lennard and Bernstein (1960) 
reported on their major investigation of the 
nature and interrelationship of role expecta- 
tions and therapist-client communication in 
the psychotherapy interview. Their results 
suggested a significant relationship between 
the degree of dissimilarity of the participants’ 
role expectations and the degree of dysfunc- 
tion in the communication system. They 
concluded that when the expectations were 
very dissimilar, the resultant strain in the
dyadic interpersonal system placed that sys-
tem in proximate danger of disintegration.

Finally, A. Goldstein (1962a, 1962b) com-
prehensively reviewed the literature that di-
rectly and indirectly bore on the influence
of role expectations in psychotherapy. On
the basis of those data, he adopted the posi-
tion that the mutuality of participant role
expectations is indeed an important influence
in psychotherapy. He argued that the avail-
able evidence indicated that adverse effects
follow the disconfirmation of client expecta-
tions of the therapist's role. Goldstein also
urged that the logical next questions be in-
vestigated, such as the endurance of the al-
leged adverse effects and the malleability of
clients' incongruent role expectations.

In summary, the early research in this
area constituted an auspicious beginning. It 
may be due in part to the enthusiastic re- 
sponse elicited by this early work that the 
hypothesized impact of disconfirmed role ex- 
pectations has become such a common as- 
sumption today. Nevertheless, a review of 
the efforts since 1962 to validate and extend 
the early findings indicates that the post- 
1962 research did not lend itself to the same 
unequivocal and enthusiastic conclusions. 
The discussion of this later research is or- 
ganized around the various types of depen-
dent variables that were used. It summarizes
empirical efforts to demonstrate the effects 
of disconfirmed client role expectations as re- 
flected in client satisfaction, premature ter- 
mination, outcome, process, and changes in 
client expectations. 


Experimental Research Since 1962


Client Satisfaction


Isard and Sherwood (1964) reported that
client satisfaction was not related to the par-
ticular interview style of the counselor so
long as that style was similar to the client's
expectation of how the interview would pro-
ceed. Mendelsohn (1964) found that when
clients recognized that the therapist was re-
sponding differently from their expectations,
they rated the interviewer more critically
than did clients whose expectations were con-
firmed. Interestingly, some of the clients
failed to recognize that the therapist was re-
sponding in a style different from the one
they expected. Goin, Yamamoto, and Silver-
man (1965) examined the satisfaction of
those clients in their sample who expected
advice from their therapists. Of this group,
72% of those who received advice from the
therapist reported satisfaction. Only 57%
of those who did not receive advice indicated
satisfaction with therapy. Severinson (1966)
reported that disconfirmed client expecta-
tions of the degree of therapist empathy re-
sulted in lesser client satisfaction. The rela-
tionship held without regard to the direc-
tion of the disconfirmation. Kumler (1969)
found a similar relationship between dis-
confirmed expectations of therapist age and
warmth and lesser client satisfaction. It is
important to note, however, that the negative
effects resulted primarily from cases in which
expectations of high therapist warmth were
not confirmed. Therapist warmth was posi-
tively related to client satisfaction regard-
less of client expectation.

On the negative side, Cundick (1963)
found no significant relationship between
the congruence of client-therapist role ex-
pectations and satisfaction with psychother-
apy as rated by counselor and client. Geller
(1965) reported similar findings. Further,
Severinson (1966) and Kumler (1969), who
found evidence supporting the negative ef-
fect on satisfaction of disconfirmed client
expectations of empathy and warmth, re-
spectively, both failed to find similar effects
of disconfirmed expectations of therapist di-
rectiveness. In addition, Klepac (1970),
using only the audio portion of the videotapes
presented by Kumler, failed to replicate
Kumler's finding that disconfirmed expecta-
tions of therapist warmth led to lesser client
satisfaction. Consistent with Kumler, Klepac
also reported no relationship between client
satisfaction and disconfirmed expectations
of therapist directiveness. Finally, Glad-
stein’s (1969) research suggested that client
expectations for the therapist's role and the
psychotherapy process were multidimensional.
His data indicated that disconfirmation of
client expectations resulted in diminished
client satisfaction only when none of the
several dimensions of expectation was con- 
firmed. Thus, he argued that any one given 
expectation is not itself an important factor 
in psychotherapy. 


Premature Termination 


Other researchers have used premature 
termination of psychotherapy as the depen- 
dent variable. Overall and Aronson (1963) 
compared client expectations with dropout
rate from therapy. A dropout was defined as 
a client who failed to return for the second 
interview. The authors found that clients 
with especially high expectations of an ac- 
tively supportive, directive therapist dropped 
out of therapy at a significantly higher rate. 
They attributed this result to the fact that 
their therapists tended to see their own role 
as less directive and less actively supportive. 
It is unfortunate that the measure of client 
expectation in this oft cited study was de- 
signed without safeguards against a "yes" 
set by the respondent. The questionnaire 
was administered orally by a staff social 
worker, thus making the danger of yes sets 
especially strong. This threat to internal va- 
lidity can be used to formulate an alterna- 
tive explanation for the authors' observation 
that, in general, clients tended to expect all 
things from the therapist. On the other hand, 
enough variance among clients did exist to 
establish that nonreturners held expecta- 
tions that were more directive and, therefore, 
more discrepant from the prevailing thera- 
pist expectations at that clinic than were 
the expectations of those who did return for 
their second session.

In a similar vein, Borghi (1968) found
himself unable to differentiate terminators 
from continuers in therapy on measures of 
ego strength, anxiety, and dependency. Sub- 
sequent detailed interviews with these clients 
strongly suggested that early termination of 
treatment occurred most frequently when 
these clients held role expectations that were 
incongruent with those held by the thera- 
pists. One of the most commonly held incon- 
gruent expectations was that the therapist 
would give a great deal of advice. Sandler
(1975) reported that the expectations of
therapist behavior held by early terminators
were more discrepant from their therapists'
own role expectations than were the expecta- 
tions of those who continued in therapy. 
Premature terminators tended to expect more 
advice from the therapist and expected the 
therapist to be more responsible for leading 
the interview interaction. 

Opposing these positive findings, Goin et 
al. (1965) found no difference in length of 
stay between those clients whose expecta- 
tion of advice was met and those whose ex- 
pectation was not met. Fiester (1974) re- 
ported no relationship between disconfirmed 
client expectations for therapist role and 
dropping out of psychotherapy. Similarly, 
Vail (1974) found no relationship between 
the extent of the discrepancy between 
clients’ and therapists’ role expectations of 
the first interview and continuation in treat- 
ment. Finally, Horenstein (1974) measured 
the extent to which clients thought their 
expectations of therapist role were confirmed 
or disconfirmed in the actual interviews. He 
found no significant relationship between fail- 
ure to confirm expectations and dropping 
out of therapy. There was the suggestion of 
a nonlinear trend in the data, with discon- 
firmation contributing more to negative ef- 
fects than confirmation to positive effects. 


Psychotherapy Outcome 


Several studies have used psychotherapy 
outcome as the dependent variable. H. Gold- 
stein (1965) reported that describing the 
potential therapist to the client in terms 
congruent with the client's expectations of 
therapist role was quite instrumental in 
establishing a placebo effect. The effect 
was even stronger than that produced by 
positive prognostic expectancy. Dougherty 
(1973) matched one group of clients with 
therapists based on a number of variables 
including their respective orientations to 
therapy. He found that these "optimally 
matched" clients had more success in ther- 
apy, as rated by their therapists, than did 
the poorly matched and nonmatched subjects. 
Unfortunately, it is not easy to interpret
these results, since the therapists' percep-
tions of success may merely reflect their 
greater comfort with clients who share their 
expectations of how psychotherapy will pro-
ceed. Gulas (1974) also found that clients
whose initial role expectations were highly 
congruent with those held by their thera- 
pists demonstrated greater improvement in 
psychotherapy than those with less con-
gruent expectations.

On the negative side, Volsky, Magoon, 
Norman, and Hoyt (1965) found no evi- 
dence in their data to support the position 
that clients’ expectations about their role, 
the therapist’s role, or other aspects of the 
therapy process have an important bearing 
on psychotherapy outcome. In addition, 
Horenstein (1974) found no relationship be- 
tween the disconfirmation of client expecta- 
tions of therapist role and therapy outcome. 
As with the premature termination data, he 
did uncover a nonsignificant parabolic trend, 
such that disconfirmation of expectation 
contributed more to unsuccessful therapy
outcome than confirmation of expectation
contributed to successful outcome.


Psychotherapy Process 


Research investigating the effects of dis- 
confirmed expectations on the therapy pro- 
cess has produced results that are equally
ambiguous. Clemes and D'Andrea (1965)
derived their hypothesis from a consideration
of an earlier study (Pope & Siegman, 1962) 
in which the less structured, low-directive 
therapist style produced greater anxiety in
the clients than did the high-directive style.
Clemes and D'Andrea postulated that it
was not the low-directive therapist style per
se that aroused anxiety during the inter-
views. Rather, they argued, it was the in-
teraction of that style with the clients’ ex-
pectations for a high-directive style that was
the causal agent. To test their hypothesis,
Clemes and D'Andrea measured each client's
expectations for high-low therapist directive-
ness and then manipulated the actual inter-
viewer style to which each client was exposed.
They found that, independent of the par-
ticular interviewer style, clients whose ex-
pectations were not confirmed reported
greater anxiety than their counterparts whose
expectations were realized. Unfortunately, 
this self-rating of anxiety was not confirmed 
by a complementary card-sort measure of 
anxiety, leaving the results unclear.

Pope, Siegman, Blass, and Cheek (1972)
also found effects of disconfirmed expecta- 
tions on interview process. These writers ex- 
perimentally induced in one of their two 
groups the expectation that the second of 
two interviews would comprise a test inter- 
pretation by the therapist. The control group 
was correctly informed that in the second 
session the therapist would ask questions 
soliciting personal information. This sort of 
manipulation, of course, is very strong. It 
is akin to a therapist deliberately misrepre- 
senting his or her therapy style to a new 
client. It did effect differences in interview 
speech behavior between the two groups. 
The experimental group was less orally pro- 
ductive and displayed more avoidance in 
their speech patterns. Pope et al. interpreted 
these results as support for the hypothesis 
that the experimental group would exhibit 
more strain in the oral interaction of the interview.
Ziemelis (1974) also included an
expectation manipulation as part of his study 
of the process of the psychotherapy inter- 
view. He reported some effects of the expecta- 
tion manipulation: In general, a client’s ex- 
pectation that he or she would not get the 
type of therapist desired resulted in negative 
effects, as reflected in some of the pencil-and-paper
measures of the process. This effect
was not reflected in the ratings of the actual 
depth of interaction. 
On the negative side, Warren (1973) reported
that disconfirmed role expectations
did not result in any lower rating of relationship
or therapeutic conditions as
measured by the Barrett-Lennard Relationship
Inventory. In addition, Klepac and
Page (1974) challenged the generalizability
of the results reported by Pope et al. (1972).
The former writers summarized certain relevant
results of an earlier study in which
Klepac (1970) studied the effects of not confirming
clients' expectations for the therapist's
level of directiveness. In that investigation,
each person was interviewed by a therapist
whose level of directiveness was manipulated 
to be high or low, thus confirming or discon- 
firming the person's expectation. The inter- 
view itself was conducted via closed-circuit 
television. To maintain the picture on the 
monitor, subjects were required to press a 
switch at a given rate. Thus, two measures 
of process were available. First, the amount 
of switch pressing was recorded and was 
understood as an index of the reinforcement 
value of the interviewer. Second, oral produc- 
tivity was recorded in a manner equivalent to 
that used by Pope et al. Klepac's results in- 
dicated that persons whose expectations for 
a low-directive therapist were disconfirmed 
exhibited significantly more switch pressing 
than did persons in any other groups. Thus, 
not only were there no negative effects of 
the failure to confirm client expectations but 
there was a distinctly positive effect of the 
failure to confirm expectations for a low-di- 
rective therapist. Finally, Horenstein (1974) 
also included a measure of interview pro- 
cess: resistance to psychotherapy. As with 
the drop-out and outcome variables, he found 
no relationship between confirmation of ex- 
pectation and degree of client resistance ex- 
hibited, but he did note the parabolic trend 
that was also reported for the premature 
termination and therapy outcome variables. 


Change in Expectation 


Another type of dependent variable used 
in role expectation research has been the ex- 
tent and direction of change in the expecta- 
tion itself as a result of its confirmation or 
disconfirmation. If role expectations are in- 
deed an important influence in psychother- 
apy, it is hypothesized that they should be 
rather strongly held. Kumler (1969) re- 
ported that client expectations were rela- 
tively stable in the face of disconfirmation. 
However, clients did tend to change their 
expectations in the direction of very warm 
therapists, regardless of original expecta- 
tion. Sandler (1975) noted that there was a 
reduction of dissimilarity in role expectation 
as therapy continued but that clients who 
eventually dropped out of therapy exhibited 
a much lesser tendency to change expectations
in the direction of actual therapist behavior.

On the other hand, Cundick (1963) found 
that client and therapist role expectations 
became significantly more congruent as coun- 
seling progressed. Klepac (1970) was unable 
to replicate Kumler's results using only the 
audio portion of Kumler's videotape. Klepac 
reported that expectations changed in the 
direction of the therapist's actual style. He 
replicated his own findings in a second study 
(Klepac, 1970) in which he reported that 
client expectations regarding therapist di- 
rectiveness were quite fluid, again conform- 
ing to the assigned therapist's style. Finally, 
Gulas (1974) also found that congruence of 
client-therapist role expectations increased 
as therapy progressed. 


Shaping Role Expectation 


One other approach has been tried in the 
attempt to demonstrate the hypothesized re- 
lationship between disconfirmed role expecta- 
tions and negative consequences. The logic 
underlying this research might be reconstructed
as follows: If it is important that
client role expectations be confirmed, then
clients whose role expectations have been systematically
shaped toward congruence with
actual therapist style should exhibit more
positive consequences. Albronda, Dean, and
Starkweather (1964) instructed psychiatric
social workers to use preliminary client contacts
to help clients form accurate expectations
regarding the nature of psychotherapy
and the roles of the participants. The authors
reported that clients who received this treat- 
ment dropped out of therapy at a lesser rate 
and exhibited improved outcome. Hoehn- 
Saric et al. (1964) reported that role induc- 
tion interviews significantly improved client 
outcome in psychotherapy. The role induction 
interview consisted of a brief session struc- 
tured to develop accurate expectations of the 
psychotherapy process. These results were 
replicated in a later study by Schonfield, 
Stone, Hoehn-Saric, Imber, and Pande 
(1969). 

Related manipulations have also proven 
successful. Mosby (1972) used a procedure
in which he informed therapists of existing 
differences between their own role expecta- 
tions and the role expectations held by their 
clients. The therapists were told to try to 
modify their clients’ expectations early in 
therapy to conform more closely with their 
own. Clients in the experimental condition 
changed their expectations of the therapist's 
role more quickly and dropped out of therapy
at a lesser rate than did the controls.
Heilbrun (1972) attempted to modify client 
expectations by using a brief pretherapy 
session in which clients were instructed that 
counselors of various styles could be equally 
effective and that the client should try to 
adapt his or her expectations to the coun- 
selor's style to obtain maximum benefit. Heil- 
brun found that clients characterized by low 
readiness for therapy demonstrated a lower 
drop-out rate when they received the pretherapy
briefing than when they did not.
However, high-readiness clients were not dif- 
ferentially affected by the briefing process. 

Contrary to these uniformly positive results,
Venema (1972) was unable to demonstrate
the relationship between expectations
shaped before therapy and positive
therapeutic consequences. An important difference
between Venema's study and the
studies that reported positive findings is that
Venema did not use live interaction in the
change-of-expectation process. Instead he
used a videotaped manipulation of inappropriate
role expectations. In using a procedural
check, Venema ensured that the group
exposed to the videotape entered therapy
with significantly more accurate expectations
of the therapist's role than did the control
group. The experimental group also experienced
fewer role expectation disconfirmations
during the initial interview. Nevertheless,
Venema found no relationship between role
expectation disconfirmation and attrition. It
may be that it was not the modification of
expectations at all that effected the positive
outcomes in the previous studies. It
may be that the extra personal attention afforded
the clients who received the role induction
interviews, the one most important
condition that discriminates between the
earlier studies and Venema's, accounted for
their more positive results. This alternative
interpretation is supported by Fernbach's
(1975) failure to effect more positive results 
with clients who received a written clarifica- 
tion of therapist and client roles. In addi- 
tion, Orenstein (1974) found no statistically 
significant effects of a role preparation tape 
and an attraction induction message on sub- 
jects’ perceptions of the relationship and of 
the psychotherapist. It might be fruitful to 
compare in a single study the separate and 
interactive effects of extra attention and ex- 
tra information on client expectation and on 
psychotherapeutic process and outcome.
In summary, a comprehensive review of the 
available literature suggests considerable am- 
biguity regarding the validity of the hypoth- 
esis that disconfirmed role expectations re- 
sult in negative consequences. Research since 
1962 has not clearly supported the enthusi- 
astic inferences drawn from the early specu- 
lative and experimental literature, nor does 
the hypothesis warrant the assumption of 
proven that it has so often enjoyed in the 
current literature. The "box score" may be
summarized as follows. Single studies are
counted more than once if they used two or
more distinct classes of dependent variables 
or two or more clearly distinct independent 
variables. 
Among studies that used satisfaction as
the dependent variable, five (45%) supported
the hypothesized relationship and six
(55%) did not support it. Among studies
that used premature termination from psychotherapy
as the dependent variable, three
(43%) supported the hypothesized relationship
and four (57%) did not. Three studies
(60%) that used outcome as the dependent
variable supported the hypothesized relationship
and two (40%) did not. Studies of the
effects of disconfirmed expectations on psychotherapy
process were evenly split. Three
supported the hypothesized relationship and
three did not. Among studies that measured
the tenacity with which disconfirmed expectations
were held, two (33%) found expectations
to be held quite strongly. On the other
hand, four studies (67%) found expectations
to be rather fluid, changing in the direction
of the assigned therapist. The former results
support the hypothesized relationship; the
latter do not. Finally, five studies (62%)
found that pretherapy modification of expectations
resulted in more positive consequences;
three others (38%) found no effects
of pretherapy induction efforts.
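
As a check on the arithmetic of the box score, the tally can be reproduced in a few lines. The sketch below is illustrative only: the category labels and the function names `percent` and `tally` are our own, and the counts are taken as read from the paragraph above (with the process studies entered as an even three-to-three split).

```python
# Counts of (supporting, not supporting) studies per class of dependent
# variable, as read from the box score. Labels are illustrative, not the
# authors' terminology.
BOX_SCORE = {
    "satisfaction": (5, 6),
    "premature termination": (3, 4),
    "outcome": (3, 2),
    "process": (3, 3),
    "strength of expectation": (2, 4),
    "pretherapy shaping": (5, 3),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


def tally(box_score):
    """Sum supporting and non-supporting counts across all categories."""
    supported = sum(s for s, _ in box_score.values())
    unsupported = sum(n for _, n in box_score.values())
    return supported, unsupported


if __name__ == "__main__":
    supported, unsupported = tally(BOX_SCORE)
    total = supported + unsupported
    print(supported, percent(supported, total))      # overall support
    print(unsupported, percent(unsupported, total))  # overall non-support
```

Summing the six categories gives 21 supporting and 22 non-supporting entries out of 43, which agrees with the overall totals the review reports.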

The empirical foundation for the hypothesized
negative effects of disconfirmed role
expectations is imbued with a great deal of 
ambiguity. In only one type of empirical in- 
vestigation of this problem—the role induc- 
tion strategy—is there any hint of a pre- 
dominance of studies in favor of the expecta- 
tion hypothesis. Even in this research, the 
interpretations are confounded by the failure 
of most of the positive studies to separate 
the effects of extra personal attention from 
those that resulted from changes in role expectations.
Overall, 21 studies (49%) supported
the hypothesized relationship; 22
studies (51%) did not. It would be difficult
to find a more even split in a research
area. The ambiguity is highlighted even more 
when one remembers the reluctance of so 
many editors and reviewers to judge favor- 
ably those manuscripts that report nonsig- 
nificant results and the consequent reluctance 
of many researchers to submit such results. 
Clearly, the conclusion that it is important 
to meet client role expectations in psycho- 
therapy does not deserve the sustained sup- 
port that it has received. It must be removed 
from its status as demonstrated in the com- 
mon wisdom of psychology. 


Assessment of the Literature 


This section considers the contribution of 
several factors to the state of ambiguity 
regnant in the role expectation literature. 
Implicit in the discussion of these factors 
are implications for improvements in future 
research design to clarify the importance of 
client role expectations in psychotherapy. 

Klepac and Page (1974) addressed them- 
selves to the problem of contradictory find- 
ings between their report and that of Pope 
et al. (1972). They argued that one factor 
that contributes to ambiguity in research on 
role expectations is the tendency of research- 
ers in the area to study imprecisely defined 
or globally assessed expectations. Venzor,
Gillis, and Beal (1976) found a clear lack
of correspondence between subjects' responses
to an adjective checklist of preferred therapist
characteristics and a quasi-behavioral
index of preferred therapist style. They sug- 
gested that future studies use measures that 
are as close as possible to a behavioral repre- 
sentation of the therapist’s behavior. Only 
a few studies have assessed role expectations 
with video, audio, or written examples of 
the behavior in question. Further, some in- 
vestigators have used instruments for the 
assessment of expectation that use open- 
ended response formats, which permit con- 
siderable subjective bias in scoring. In one 
case, the items were objectively written, but 
the questions were asked orally of the patient 
by an intake worker. Not surprisingly, these 
investigators reported that the patients ex- 
pected almost everything from the therapist; 
that is, they answered yes to almost every 
question that began “Do you expect ... ?” 
Methodologically, one must suspect that a 
positive response bias may have resulted. In 
other situations, authors have used relatively 
brief instruments to assess expectations on
a number of dimensions. Since only a few
items could be applied to each dimension, the 
question of reliability must be raised. Finally, 
some researchers have used instruments of 
adequate characteristics but have elected to 
measure expectancy for therapist characteristics
that are by definition rather broad in
nature, for example, therapist personality.*


It is this multifaceted tendency toward 
imprecision that Klepac and Page (1974) 
held to be a major cause of the inconsistency 
of experimental findings in the research on 
client role expectations. They proposed that 
future research attempt the twofold task of 
(a) establishing which are the most relevant 
basic dimensions along which client role ex- 
pectations may be described and (b) de- 
termining the nature of the effects, if any, of 
failing to meet client expectations along 
these various dimensions. This paradigm has 
a real potential for building a more solid 
empirical basis on which to make judgments 
regarding the actual relevance of client role
expectations to psychotherapy.


A second problem area in the research on 
role expectations has been the ambiguous 
definition of the term expectation. In the 
precise sense in which it was originally used 
by Kelly (1955) in his theory of personality 
and psychotherapy, and by Apfelbaum 
(1958) in the classic study of role expecta- 
tions, expectation was clearly defined as the 
anticipation of some event. There was the 
implication that the anticipation is held with 
some degree of certainty. Klepac (1970) and 
Pope et al. (1972) are examples of later 
researchers who have been careful to define 
expectation for their subjects as this same 
sense of anticipation. However, most researchers
have not been so careful in drawing
the distinction, for their subjects or for
themselves, between this definition of expectation
and the alternative, competing connotation
that can be attached to the same
word. In this alternative connotation, the
term expectation can carry the implication
that the expector is due, has the right to, or
demands that which is expected. This meaning
of the term suggests that the person to
whom the expectation is extended has some
obligation to meet that expectation. To say
to another person "I expect you to be forceful
with your boss!" clearly may be understood
to imply more than mere anticipation.
In truth, such a statement uses expect to
communicate a desire that such behavior will
be forthcoming. More precisely, it is a preference
that some event should occur.

It seems straightforward to state that these 
two concepts—expectation as anticipation 
and expectation as preference—are suffi- 
ciently different aspects of human cognition 
to warrant distinct treatment. Nevertheless,
most researchers in the area of role expectations
have not dealt specifically with this
issue (cf. Rosen, 1967). The majority of
authors have simply neglected the problem,
apparently leaving subjects to interpret expectation
as they wished. The interpretation
of their results is not clear, therefore,
and the variance among studies is not surprising.
Some other authors, while indicating
that they were studying expectations, precisely
defined their independent variable in
terms of client preference for some therapist
behavior. For example, one investigator's
instruction to describe one's ideal therapist
is clearly not likely to have elicited client
expectations (anticipations) of therapist behavior.
Thus, the imprecision with which expectation
has been defined is a second factor
in the inconsistency of the research results.
The series of studies in this area may have
intermittently manipulated one of two distinctly
different independent variables. The
definitional problem is compounded by the
fact that few researchers have stayed with
the topic for more than one study. This
fluctuation has added considerably to the
already large variance in what precisely is
being assessed as the independent variable.

*The authors are grateful to an anonymous reviewer
who noted that expectations of the therapist's
personal qualities and expectations of the
therapist's behavior may be differentially determined
and may respond differently to disconfirmation. This
conceptual distinction has not often been made and
may warrant closer consideration in future work.

The third factor discussed in the present 
article cannot be so definitively stated. It is 
presented for consideration by researchers 
in the area. It is speculated that expectation
and preference may not only be related, but
may be related hierarchically, such that
studies of client expectation cannot be unequivocally
understood without the simultaneous
investigation of preference.

As early as 1955, Shaw noted that client
role expectations might be disconfirmed in
a desired direction as well as in an undesired
direction. Helson (1959, 1964) developed a
sophisticated statement of a similar position.
He proposed this statement as an alternative
to an earlier hypothesis regarding
the effects of disconfirmed expectation that
was developed by McClelland, Atkinson,
Clark, and Lowell (1953) in their elaboration
of the achievement motive. Helson's hypothesis
regarding the effects of disconfirmed
expectation is a bipolar one. He postulated
that when the expectation under investigation
is one that embodies an affective component,
then the affective and motivational
consequences associated with the disconfirmation
of the expectation arise as a function
of the direction, as well as the intensity,
of the discrepancy.

Helson did not question the adequacy of 
McClelland et al.’s model for explaining and 
predicting reactions to the disconfirmation 
of expectations that are primarily sensory in 
nature. However, he hypothesized that ex- 
pectations with an affective or aesthetic com- 
ponent are markedly different in nature from 
expectations of a purely sensory event. He 
reasoned that the former types of expecta- 
tions could be ranked along a continuum 
ranging across levels of desirability, from 
most preferred to least preferred, passing 
through a zone of indifference. Therefore, 
Helson deduced that given some particular 
affectively toned expectation, a discrepancy 
from that expectation in one direction along 
the desirability continuum would elicit reac- 
tions very different in quality from those 
elicited by a discrepancy in the opposite di- 
rection. If the event that actually occurs is 
more desirable than the expected event, the 
result will be positive affect and approach 
motivation. Conversely, if the actual event 
is less desirable than the expected event, the 
result will be negative affect and avoidance 
motivation. Further, Helson also predicted 
that as the actual event extends farther from 
the expected event along the desirability 
continuum, it will elicit increasingly strong 
positive or negative reactions. In summary, 
it is clear that Helson considered preference 
to be a more basic variable than expecta- 
tion, undergirding it so to speak, and one 
that must necessarily be known if one is 
to predict the nature of a given person's response
to disconfirmation of an expectation.

This bipolar position stands in contrast to 
the unidimensional theory of reactions to 
disconfirmed expectations developed by Mc- 
Clelland et al. (1953). The unidimensional 
position implicitly or explicitly underlies 
most investigators’ hypotheses that discon- 
firmation of client role expectations in psy- 
chotherapy will negatively influence process 
or outcome. The unidimensional position 
holds that the resultant affective and motiva- 
tional states arise solely as a function of the 
extent of the discrepancy between the actual 
event and the expected event. A slight discrepancy
in either direction will elicit reactions
slightly more positive than will no
discrepancy. Beyond that point, the larger 
the discrepancy, the more negative the re- 
action. 

If the unidimensional theory of expecta- 
tion disconfirmation is adequate, preference 
should be considered a noninteracting variable
with expectation. On the other hand, if
the bipolar theory provides a more adequate 
explanation for affectively and aesthetically 
toned expectations, then preference must be 
considered to be a variable inextricably in- 
tertwined with expectation and integral to 
an understanding of reaction to disconfirma- 
tion of expectation. The bipolar theory pre- 
dicts that the preference modifies the quality 
as well as the quantity of the reaction to 
disconfirmed expectation. 
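
The contrast between the two positions can be made concrete with a toy model. The sketch below is ours, not Helson's or McClelland et al.'s: the function names, the numeric desirability scale, the `optimum` parameter, and the particular functional forms are all assumptions chosen only to reproduce the qualitative predictions described above.

```python
def unidimensional_reaction(discrepancy, optimum=1.0):
    """Toy unidimensional prediction (functional form is an assumption).

    Only the size of the discrepancy matters. The reaction is zero with no
    discrepancy, slightly positive for small discrepancies (peaking at
    `optimum`), and increasingly negative as the discrepancy grows.
    """
    size = abs(discrepancy)  # direction is deliberately discarded
    return size * (2 * optimum - size)


def bipolar_reaction(expected_desirability, actual_desirability):
    """Toy bipolar prediction (functional form is an assumption).

    The sign follows the direction of the discrepancy along a desirability
    continuum: positive (approach) when the actual event is more desirable
    than the expected one, negative (avoidance) when it is less desirable.
    The magnitude grows with the distance between the two.
    """
    return actual_desirability - expected_desirability


if __name__ == "__main__":
    # Same size of discrepancy, opposite directions.
    print(unidimensional_reaction(3), unidimensional_reaction(-3))
    print(bipolar_reaction(2, 5), bipolar_reaction(5, 2))
```

Under the unidimensional sketch a discrepancy of a given size yields the same reaction whichever way it points, whereas under the bipolar sketch the two directions yield reactions of opposite sign. That is why preference has to be measured before the bipolar prediction can be tested at all.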

Block (1964) undertook an empirical com- 
parison of the two theories of reaction to 
disconfirmed expectation as they apply to 
client role expectation in psychotherapy. Ex- 
perienced raters were trained to note all 
client remarks that expressed a felt discrep- 
ancy between the actual therapy experience 
and the prior expectation. These remarks 
were categorized in terms of the size and the 
direction (more preferred vs. less preferred) of
the discrepancy. Manifest client affect and 
inferred secondary motivational states were 
also recorded. Approach motives in psycho- 
therapy were operationalized to include such 
behaviors as taking initiative in talking, pre- 
senting new associations and dreams, mak- 
ing productive use of silence, and exhibiting 
actions and movements that reflect deeper 
involvement in the psychotherapy. Avoid- 
ance motives were operationalized to include 
such behaviors as evasions, forgetting dreams, 
angry silences, coming late, threatening termination,
and withdrawing to shallower involvement
in psychotherapy. The results
clearly supported the bipolar position's pre- 
dictions. Affective responses and inferred sec- 
ondary motives varied with the direction of 
the discrepancy. There was no effect of the 
size of the discrepancy alone. Thus, the 
question of whether the client's expectations 
were confirmed or disconfirmed is too simplistic
when asked alone. One must also ask
whether the person wanted or did not want
what he or she expected. 

Although Block's study was reported some 
time ago, and recently was cited in some de- 
tail by Meltzoff and Kornreich (1970), re- 
searchers in the area of role expectations in 
psychotherapy have seemingly not grappled 
with its implications. It appears at least to 
us that role expectation research has continued
to be based on the unidimensional
position, although without such specific conceptual
elaboration of the basic assumptions
on which the hypotheses are based. We hypothesize
that inattention to the bipolar position
and the resulting failure to account for
the preference variable are two of the factors 
that have led to the currently highly am- 
biguous state of the research on disconfirma- 
tion of expectation in psychotherapy. 


Summary 


Early research findings lent support to 
previous speculation regarding the impor- 
tance of meeting client role expectations in 
psychotherapy. The beginnings of this research
effort were auspicious. Reviewers of
the expectation research offered strong state- 
ments regarding the importance of role ex- 
pectation in psychotherapy. Such statements 
were met with enthusiastic acceptance, in
part because they appeared to herald an al- 
liance of social psychology and psychother- 
apy research. The belief that negative effects 
result from the failure to confirm client role 
expectations was quickly and easily accepted 
into the common wisdom of psychology. This 
belief is typically considered to be demon- 
strated by the research and is usually cited 
as such without encountering serious challenge.
A review of the available empirical
literature since 1962 revealed that the validity
of the disconfirmed expectations-negative
effects hypothesis has not been established
with certainty. The research was almost
evenly divided in terms of support for
and lack of support for the hypothesis. 

The ambiguous state of the research was
discussed in terms of problems in the designs
and conceptualizations that have been used.
Namely, it was noted that (a) the operationalization
of the independent variable has
often not been adequately precise or reliable, 
(b) the definition of expectation has usually 
not been clearly specified for the reader or 
subject, and (c) the theoretical position that 
has implicitly undergirded almost all the 
research on role expectation in psychotherapy 
may not be appropriate for the kinds of af- 
fectively toned expectations that are involved 
in psychotherapy. In conclusion, it would be 
most appropriate to approach the subject 
of role expectations more cautiously for the 
present. Continued research will be necessary 
to evaluate the true impact of disconfirmed 
expectations in psychotherapy. This research 
should incorporate procedures to counter the 
types of problems that have limited past 
work. In the interim, it may be unwarranted 
to continue to publish the many limited re- 
ports that describe the various role expecta- 
tions of different client groups. It may not 
be important, in a functional sense, what 
their expectations are. Furthermore, theses 
based on the so-called established relationship
of disconfirmed expectations and negative
effects in psychotherapy should be reexamined
in light of the fact that their relationship
is not as clearly understood as has
been supposed. 


References 


Albronda, H., Dean, R., & Starkweather, J. Social
class and psychotherapy. Archives of General
Psychiatry, 1964, 10, 276-283.

Apfelbaum, B. Dimensions of transference in psychotherapy.
Berkeley: University of California
Press, 1958.

Baekeland, F., & Lundwall, L. Dropping out of
treatment: A critical review. Psychological Bulletin,
1975, 82, 738-783.

Baldwin, B. Self-disclosure and expectations for
psychotherapy in repressors and sensitizers. Journal
of Counseling Psychology, 1974, 21, 455-456.

Begley, C., & Lieberman, L. Patient expectations of
therapists' techniques. Journal of Clinical Psychology,
1970, 26, 112-116.

Bent, R., Putnam, D., Kiesler, D., & Nowicki, S.
Expectancies and characteristics of outpatient
clients applying for services at a community mental
health center facility. Journal of Consulting
and Clinical Psychology, 1975, 43, 280.

Biddle, B. An application of social expectation theory
to the initial interview (Doctoral dissertation,
University of Michigan, 1957). Dissertation
Abstracts, 1958, 19, 186. (University Microfilms
No. 58-1377)

Block, W. A preliminary study of achievement mo- 
tive theory as a basis of patient expectations in 
psychotherapy. Journal of Clinical Psychology, 
1964, 20, 268-271. 

Borghi, J. Premature termination of psychotherapy 
and  patient-therapist expectations. American 
Journal of Psychotherapy, 1968, 22, 460-473. 

Caine, T., Wijesinghe, B., & Wood, R. Personality 
and psychiatric treatment expectancies. British 
Journal of Psychiatry, 1973, 122, 87-88. 

Chance, E. Mutual expectations of patients and 
therapists in individual treatment. Human Rela- 
tions, 1957, 10, 167-178. 

Chance, E. Families in treatment. New York: Basic 
Books, 1959. 

Clemes, S., & D’Andrea, V. Patients’ anxiety as a 
function of expectation and degree of initial in- 
terview ambiguity. Journal of Consulting Psy- 
chology, 1965, 29, 397-404. 

Cundick, B. The relation of student and counselor 
expectations to rated counseling satisfaction (Doc- 
toral dissertation, Ohio State University, 1962). 
Dissertation Abstracts, 1963, 23, 2983-2984. (Uni- 
versity Microfilms No. 63-0044) 

Danskin, D. Roles played by counselors in their 
interviews. Journal of Counseling Psychology, 
1955, 2, 22-27. 

Dougherty, F., III. Patient-therapist matching: An
empirical approach toward the improvement of 
psychotherapy outcome (Doctoral dissertation, 
Vanderbilt University, 1972). Dissertation Ab- 
stracts International, 1973, 33, 6074B. (Univer- 
sity Microfilms No. 73-14505) 

Fernbach, R. Preparation of clients for individual 
psychotherapy using a written document to orient 
expectations and indicate appropriate behaviors 
(Doctoral dissertation, Ohio University, 1974). 
Dissertation Abstracts International, 1975, 35, 
6092B-6093B. (University Microfilms No. 75- 
11963) 

Fiester, A. Pre-therapy expectations, perception of 
the initial interview and early psychotherapy ter- 
mination: A multivariate study (Doctoral dis- 
sertation, Miami University, 1974). Dissertation 
Abstracts International, 1974, 35, 1907B. (University
Microfilms No. 74-21729)

Frank, J., Gliedman, L., Imber, S., Nash, E., &
Stone, A. Why patients leave psychotherapy.
Archives of Neurology and Psychiatry, 1957, 77,
283-299.

Garfield, S., & Wolpin, M. Expectations regarding
psychotherapy. Journal of Nervous and Mental
Disease, 1963, 137, 353-362.

Geller, M. Client expectations, counselor role per- 
ception, and outcome of counseling (Doctoral 
dissertation, University of California, Berkeley, 
1965). Dissertation Abstracts, 1965, 26, 4073. (Uni- 
versity Microfilms No. 65-13489) 

Gladstein, G. Client expectations, counseling experi- 
ence, and satisfaction. Journal of Counseling Psy- 
chology, 1969, 16, 476-481. 




Goin, M., Yamamoto, J., & Silverman, J. Therapy 
congruent with class-linked expectations. Archives 
of General Psychiatry, 1965, 13, 133-137. 

Goldstein, A. Participant expectancies in psychotherapy.
Psychiatry, 1962, 25, 72-79. (a)

Goldstein, A. Therapist and patient expectancies in
psychotherapy. New York: Macmillan, 1962. (b)

Goldstein, H. Placebo psychotherapy and change in
anxiety, mood, and adjustment (Doctoral dissertation,
University of Florida, 1965). Dissertation
Abstracts, 1965, 26, 1775. (University Microfilms
No. 65-9602)

Gulas, I. Client-therapist congruence in prognostic
and role expectations as related to client’s improve- 
ment in short-term psychotherapy (Doctoral dis- 
sertation, Ohio University, 1974). Dissertation 
Abstracts International, 1974, 35, 2430B. (Univer- 
sity Microfilms No. 74-23852) 

Hankoff, L., Englehardt, D., & Freeman, N. Placebo 
in schizophrenic outpatients. Archives of General 
Psychiatry, 1960, 2, 33-42. 

Heilbrun, A. Effects of briefing upon client satisfac-
tion with the initial counseling contact. Journal
of Consulting and Clinical Psychology, 1972, 38, 
50-56. 

Heine, R., & Trosman, H. Initial expectations of the 
doctor-patient interaction as a factor in continu- 
ance in psychotherapy. Psychiatry, 1960, 23, 275- 
278. 

Heitler, J. Preparatory techniques in initiating ex-
pressive psychotherapy with lower-class, unso-
phisticated patients. Psychological Bulletin, 1976,
83, 339-352. 

Helson, H. Adaption-level theory. In S. Koch (Ed.),
Psychology: A study of a science (Vol. 1). New
York: McGraw-Hill, 1959.

Helson, H. Adaption-level theory. New York: Harper 
& Row, 1964. 

Hoehn-Saric, R., et al. Systematic preparation of
patients for psychotherapy: I. Effects on therapy 
behavior and outcome. Journal of Psychiatric Re- 
search, 1964, 2, 267-281. 

Horenstein, D. The effects of confirmation or discon-
firmation of client expectations upon subsequent
psychotherapy (Doctoral dissertation, University
of Kansas, 1973). Dissertation Abstracts Interna-
tional, 1974, 34, 6211B. (University Microfilms No.
74-12575)

Isard, E., & Sherwood, E. Counselor behavior and
counselor expectations as related to satisfactions 
with counseling interview. Personnel and Guidance 
Journal, 1964, 42, 920-921. 

Jacobs, M., Muller, J., Anderson, J., & Skinner, J.
Therapeutic expectations, premorbid adjustment, 
and manifest distress level as predictors of im-
provement in hospitalized patients. Journal of
Consulting and Clinical Psychology, 1972, 39, 455- 
461. 

Kamm, R., & Wrenn, C. Client acceptance of self 
information in counseling. Educational and Psy- 
chological Measurement, 1950, 10, 32-42. 

Kelly, G. The psychology of personal constructs 
(Vol. 2). New York: Norton, 1955. 




Klepac, R. An experimental analogue of psycho-
therapy involving "client" behavior as a function
of confirmation and disconfirmation of expecta-
tions of "therapist" directiveness (Doctoral dis-
sertation, Kent State University, 1969). Disserta-
tion Abstracts International, 1970, 30, 5690B-
5691B. (University Microfilms No. 70-11348)

Klepac, R., & Page, H. Discrepant role expectations 
and interviewee behavior: A reply to Pope, Sieg- 
man, Blass, and Cheek. Journal of Consulting and 
Clinical Psychology, 1974, 42, 139-141.

Kumler, M. Client expectations of therapist role:
Relationship to initial commitment in a psycho-
therapy analogue (Doctoral dissertation, Kent
State University, 1968). Dissertation Abstracts,
1969, 29, 4848B-4849B. (University Microfilms No.
69-9561)

Lennard, H., & Bernstein, A. The anatomy of psycho- 
therapy. New York: Columbia University Press, 
1960. 

Lorion, R. Patient and therapist variables in the 
treatment of low-income patients. Psychological 
Bulletin, 1974, 81, 344-354. (a)

Lorion, R. Social class, treatment attitudes, and ex-
pectations. Journal of Consulting and Clinical
Psychology, 1974, 42, 920. (b) 

McClelland, D., Atkinson, J., Clark, R., & Lowell, L. 
The achievement motive. New York: Appleton-
Century-Crofts, 1953.

McGowan, J. Client anticipations and expectancies 
as related to initial interview performance and 
perceptions (Doctoral dissertation, University of 
Missouri-Columbia, 1954). Dissertation Ab-
stracts, 1955, 15, 228-229. (University Microfilms
No. 00-10120) 

McNair, D., & Lorr, M. An analysis of professed
psychotherapeutic techniques. Journal of Consult-
ing Psychology, 1964, 28, 265-271.

Meltzoff, J., & Kornreich, M. Research in psycho-
therapy. New York: Atherton Press, 1970.

Mendelsohn, R. The effects of cognitive dissonance
and interview preference upon counseling-type
interviews (Doctoral dissertation, University of
Michigan, 1963). Dissertation Abstracts, 1964, 24,
2987-2988. (University Microfilms No. 64-860)

Mosby, R. Alteration of clients’ expectations about 
counseling in the direction of client-counselor mu- 
tuality by means of an experimental intervention 
procedure (Doctoral dissertation, University of
Texas at Austin, 1971). Dissertation Abstracts In-
ternational, 1972, 33, 446B-447B. (University
Microfilms No. 72-19635) 

Orenstein, L. Pre-therapy role preparation and at- 
traction induction: An experimental analogue 
(Doctoral dissertation, Kent State University,
1973). Dissertation Abstracts International, 1974,
34, 3505B. (University Microfilms No. 73-3235)

Overall, B., & Aronson, H. Expectations of psycho-
therapy in patients of lower socio-economic class.
American Journal of Orthopsychiatry, 1963, 33,
421-430.

Patterson, C. Client expectations and social condi-
tioning. Personnel and Guidance Journal, 1958, 37,
136-138.




Pope, B., & Siegman, A. The effect of therapist
verbal activity level and specificity on patient 
productivity and speech disturbance in the initial 
interview. Journal of Consulting Psychology, 1962, 
26, 489. 

Pope, B., Siegman, A., Blass, T., & Cheek, J. Some 
effects of discrepant role expectations on inter- 
viewee verbal behavior in the initial interview. 
Journal of Consulting and Clinical Psychology, 
1972, 39, 501-507. 

Rosen, A. Client preference: An overview of the 
literature. Personnel and Guidance Journal, 1967, 
45, 785-789. 

Sandler, W. Patient-therapist dissimilarity of role 
expectations related to premature termination of 
psychotherapy with student therapists (Doctoral 
dissertation, City University of New York, 1975).
Dissertation Abstracts International, 1975, 35,
6111B-6112B. (University Microfilms No. 75-
12691)


Schonfield, J., Stone, A., Hoehn-Saric, R., Imber, S., 
& Pande, S. Patient-therapist convergence and 
measures of improvement in short-term psycho- 
therapy. Psychotherapy: Theory, Research and 
Practice, 1969, 6, 267-272. 

Seeman, J. An investigation of client reactions to
vocational counseling. Journal of Consulting Psy- 
chology, 1949, 13, 95-104. 

Severinson, J. Client expectation and perception of 
the counselor’s role and their relationship to client 
satisfaction. Journal of Counseling Psychology,
1966, 13, 109-112.

Shaw, F. Mutualities and up-ending expectancies in 
counseling. Journal of Counseling Psychology,
1955, 2, 241-247.

Skinner, K., & Anderson, G. Personality and attitude
characteristics associated with therapy readiness.
American Psychologist, 1959, 14, 367-377.




Thomas, E., Polansky, N., & Kounin, J. The expected 
behavior of a potentially helpful person. Human 
Relations, 1955, 8, 165-174.

Tinsley, H., & Harris, D. Client expectations for 
counseling. Journal of Counseling Psychology, 
1976, 23, 173-177. 

Vail, A. Dropout from psychotherapy as related to 
patient-therapist discrepancies, therapist charac- 
teristics, and interaction in race and sex (Doc- 
toral dissertation, Fordham University, 1974). 
Dissertation Abstracts International, 1974, 35, 
2452B. (University Microfilms No. 74-25087) 

Venema, J. The effects of expectancy training, com- 
mitment, and therapeutic conditions upon attrition 
from outpatient psychotherapy (Doctoral disserta- 
tion, Fuller Theological Seminary, 1970). Disserta- 
tion Abstracts International, 1972, 32, 6664B- 
6665B. (University Microfilms No. 72-15871) 

Venzor, E., Gillis, J., & Beal, D. Preference for
counselor response styles. Journal of Counseling 
Psychology, 1976, 23, 538-542. 

Volsky, T., Jr., Magoon, T., Norman, W., & Hoyt, D. 
The outcomes of counseling and psychotherapy: 
Theory and research. Minneapolis: University of 
Minnesota Press, 1965. 

Wallach, M., & Strupp, H. Psychotherapists’ clinical 
judgments and attitudes toward patients. Journal 
of Consulting Psychology, 1960, 24, 316-323. 

Warren, B. Client expectations and the client-coun- 
selor relationship in a counseling analogue. JSAS 
Catalog of Selected Documents in Psychology, 
1973, 3, 131. (Ms. No. 492) 

Ziemelis, A. Effects of client preference and ex- 
pectancy upon the initial interview. Journal of 
Counseling Psychology, 1974, 21, 23-30. 


Received October 28, 1977


Psychological Bulletin
1979, Vol. 86, No. 2, 276-296 


Taste Aversion and the Generality of the Laws of Learning

Results from research on aversions acquired through the pairing of ingested
substances with illness have recently been used to challenge the assumption that
there are laws of learning that hold across different species and tasks. The taste
aversion literature is selectively reviewed and compared with data from tradi-
tional experiments in order to evaluate this challenge. Areas sufficiently docu-
mented or controversial to warrant inclusion are associative fluidity, conditioned
stimulus and unconditioned stimulus characteristics, temporal relationships, ob-
taining and processing information, and age differences. The conclusion is that
in no instance are different principles required to describe taste aversion and
traditional learning. In some cases large parametric differences between the two
research areas are apparent. It is suggested that at the present time it is not
necessary to dispense with the notion of general laws of learning.


An increasingly prevalent position in psy- 
chology is that more attention must be paid to 
previously ignored species-specific and task-
specific differences in the general laws of
learning and that a reassessment of the field 
is necessary (see, e.g., Bolles, 1973; Breland 
& Breland, 1961; Hinde, 1973; Lockard, 
1971; Schwartz, 1974; Shettleworth, 1972a; 
Staddon & Simmelhag, 1971; Weisman, 
1977). Two widely cited articles in Psycho- 
logical Review (Rozin & Kalat, 1971; Selig- 
man, 1970) dealt extensively with this subject. 

Seligman focused on the number of trials
or amount of information needed for learning
to occur, a dimension he called preparedness
(for similar concepts see Capretta’s, 1961, 
stimulus relevance and Thorndike's, 1932, be-
longingness). According to Seligman, previous
psychological research has assumed that all 
associations are equally prepared; in fact, the 
number of trials required for learning to take 
place can vary a great deal. The degree and 
direction of variation are a function of the 
organism's inherited associative apparatus,
acquired throughout a long evolutionary his-
tory. Seligman believes that contraprepared
associations, learning that takes a great many
trials to occur, may follow different laws than
prepared associations, which take very few
trials to occur. The characteristics of learning
described by these laws might include "acqui-
sition . . . resistance to extinction; maximum
delay of reinforcement, flatness of general-
ization gradient" (Seligman & Hager, 1972,
p. 5).

Rozin and Kalat alternatively maintain
that there is no a priori reason to expect
degree of preparedness to be predictive of
other laws of learning. Their position is
that laws of learning for each species and each
learning situation evolve in such a way as to
maximize survival; two learning tasks equiva-
lent in preparedness may or may not be equiv-
alent in other aspects of learning, such as
maximum delay of reinforcement.


TASTE AVERSION 


Although Seligman and Rozin and Kalat
differ in their beliefs about the manner in
which evolution influences the laws of learn-
ing, both their views emphasize the impor-
tance of this influence. Both stress the neces-
sity of modifying or removing existing gen- 
eral laws of learning to accommodate adaptive 
specializations in learning. 

The area of psychology that has received 
the most attention, attack, and theoretical 
speculation with respect to these issues is 
feeding behavior, specifically, the learning of 
taste aversions. For many years taste aversion
acquisition was known as the bait-shy phe- 
nomenon. Barnett (1975) has carefully de- 
scribed how rats, who consume their food in 
discrete meals, will briefly sample a new food 
(the bait), wait, and if they get ill, subse-
quently avoid that food.


Bait shyness was introduced into the lab-
oratory by Garcia and his colleagues (Garcia,
Kimeldorf, & Koelling, 1955). They discov- 
ered that when a gastrointestinal-like illness, 
produced in rats by irradiation, was paired 
with food, the food became aversive (Garcia, 
Kimeldorf, & Hunt, 1961). Eventually these 
data came to the attention of the psychologi- 
cal community (Seligman & Hager, 1972), 
and an onslaught of taste aversion experi- 
ments began (Riley & Clarke, 1977). Several 
procedural refinements were made, including 
the substitution in many studies of injections 
of the poison lithium chloride (LiCl) for 
irradiation, since LiCl is easier to administer
(Nachman & Ashe, 1973). The convention in
the taste aversion literature has been to use
a classical conditioning terminology, with the
taste as the conditioned stimulus (CS) and 
the illness as the unconditioned stimulus 
(US) (Garcia, McGowan, & Green, 1972). 
This convention is followed here. 

The taste aversion experiments appeared to 
demonstrate that there were many differences 
between taste aversion and traditional laws of 
learning. In particular, acquisition of taste
aversions despite extremely long delays be-
tween the CS and the US (Garcia, Ervin, &
Koelling, 1966) and a proclivity for animals
to associate tastes and not other stimuli with
illness (Garcia & Koelling, 1966) were noted.
The implications of these findings were widely
discussed, with some authors taking the view
that the data could be incorporated into exist-
ing, or somewhat modified, general laws of
learning (Krane & Wagner, 1975; Mackin- 
tosh, 1974; Revusky, 1977d; Revusky & 
Garcia, 1970; Testa & Ternes, 1977) and 
others feeling that more extensive revision of 
the laws was necessary (Kalat, 1977; Shettle- 
worth, 1972a; Zahorik & Houpt, 1977; see 
Rozin, 1977, for a more detailed analysis of 
the reactions to the anomalous taste aversion 
findings). 

Two problems were inherent in these dis-
cussions. First, the traditional laws of learn-
ing used for comparison were not clearly 
specified. Seligman and Garcia (Garcia, Han-
kins, & Rusiniak, 1974; Seligman, 1970; Sel-
igman & Hager, 1972) referred to the equipo-
tentiality and contiguity assumptions of
Thorndike’s (1911/1965) law of effect as 
principles that have been considered to hold 
in most situations and in most higher species. 
The former assumption is directly opposed to 
the concept of preparedness; it states that all 
stimuli and responses are equally associable. 
The latter declares that learning will only 
occur if the stimuli and the responses in- 
volved are in close temporal contiguity. Nev- 
ertheless, exactly what qualifies as equiva- 
lence of associability is never stated; close 
temporal contiguity is equally difficult to 
define. Seligman hinted at additional laws, 
those regarding generalization, extinction, 
partial reinforcement, and so forth, but these 
are also never defined. This is not a problem 
unique to taste aversion theorists, but a state- 
ment of the general condition of learning 
theory. No list of equations presently exists 
that accurately describes all that we know 
about the learning process. 

Even if we had a list of the laws of learn- 
ing, another problem would remain. What 
constitutes an exception to a general law of 
learning? As Kalat (1977) has pointed out, to 
learn more easily does not necessarily mean 
to learn differently. Bitterman (1975) and 
Hull (1945) have made a distinction between 
qualitative and quantitative differences in 
learning with respect to laws that vary be-
tween species. These categories can also be
employed to assist in the study of laws that 
may vary within a species as a function of the 
stimuli and responses involved, as may be the
case with prepared versus unprepared associ-
ations. The quantitative-qualitative question
boils down to “whether the performance of 
all animals can be deduced from a common 
set of principles or whether different princi- 
ples are necessary" (Bitterman, 1975, p. 
707). Thus, if one were attempting to identify 
merely quantitative differences between taste 
aversion learning and the general laws of 
learning, one would look for cases in which 
the same equations could be used, only with 
different parameter values. What seems to be 
an intuitively more fundamental, qualitative 
difference would necessitate the use of dif- 
ferent equations to describe the acquisition of 
taste aversions and more traditional tasks 
(Herrnstein, 1977; see Hull, 1945, for a de- 
tailed, though hypothetical, example of this 
method of analysis). 
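
The merely quantitative case just described can be sketched numerically. The Python fragment below is an illustration only and is not a model drawn from the literature reviewed here: it assumes a negatively accelerated exponential learning curve, and the function names, the two rate values, and the 90% criterion are hypothetical choices made for the example. One equation describes both associations; only the rate parameter differs.

```python
import math


def acquisition_curve(rate, asymptote=1.0, trials=5):
    """Associative strength after each of `trials` pairings.

    Assumes a negatively accelerated exponential form; this is an
    illustrative assumption, not an established law.
    """
    return [asymptote * (1.0 - math.exp(-rate * n)) for n in range(1, trials + 1)]


def trials_to_criterion(rate, criterion=0.9):
    """Smallest whole number of trials at which strength reaches
    `criterion` times the asymptote under the same exponential form."""
    return math.ceil(-math.log(1.0 - criterion) / rate)


# Hypothetical rate parameters, chosen only to illustrate the contrast.
PREPARED_RATE = 2.5    # stands in for a taste-illness association
UNPREPARED_RATE = 0.1  # stands in for an arbitrary laboratory association

# Same equation, different parameter value: a quantitative difference.
prepared_trials = trials_to_criterion(PREPARED_RATE)
unprepared_trials = trials_to_criterion(UNPREPARED_RATE)
```

Under these assumed values the first association reaches criterion in far fewer trials than the second, yet nothing about the form of the curve has changed. A qualitative difference, by contrast, would require replacing the exponential form itself for one of the two tasks.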

A moment's consideration of the qualita- 
tive-quantitative distinction reveals it to be 
somewhat arbitrary when carried to extremes. 
A large change in the value of a parameter is 
dubbed a quantitative change, while the 
removal of a term whose value is already near 
zero constitutes a qualitative change. In 
other words, the continuum of small to large 
differences in laws of learning does not easily 
split into two pieces, quantitative and quali-
tative. However, this dichotomy is useful for
investigating and categorizing differences be- 
tween taste aversion and traditional learning. 

This article reviews and compares results 
from taste aversion and more traditional 
learning experiments in an attempt to assess 
the generality of accepted laws of learning. 
Clearly this presents some difficulties. At
times it is possible to determine for a given
situation the consensus with respect to how
learning should take place (a general law of
learning). In these cases the qualitative-
quantitative distinction may prove helpful.
At other times, the effort to differentiate be-
tween principles that describe taste aversion
and traditional learning seems no more than
shadowboxing because neither side is able to
clearly define its stance. In any case, as a
result of the multitude of claims and counter-
claims that have been made about taste aver-
sion learning and consequently about the
generality of learning principles, a close ex-
amination of some of the data should be
instructive. 

Because of the extensiveness of the taste 
aversion literature, only well-documented 
areas are discussed and representative studies 
cited. Emphasis is placed on controversial 
and unusual findings. Readers interested in 
obtaining additional information from the 
taste aversion literature should see the almost 
700 references compiled by Riley and his 
colleagues (Riley & Baril, 1976; Riley & 
Clarke, 1977). The areas discussed here are 
associative fluidity, CS and US characteris- 
tics, temporal relationships, obtaining and 
processing information, and age differences. 

A comparison of the physiological bases of 
taste aversion and traditional tasks has not 
been included for three reasons. First, there 
has been a recent review of the physiological 
taste aversion literature (Bureš & Burešová,
1977). That review in large part covers the
issues of concern here. Second, the physio-
logical mechanisms of taste aversions are at 
this point far from clear. The studies per- 
formed most often train the taste aversions 
and perform the physiological manipulations 
in vastly different ways, and obtain vastly 
different results. For example, in studying 
hippocampal involvement in the acquisition of 
taste aversions, some researchers have le-
sioned the ventral or dorsal hippocampus,
exposed the subjects once to a taste before
pairing it with illness, and then assessed
aversion using a one-bottle test (McGowan,
Hankins, & Garcia, 1972), whereas others
have aspirated most of the hippocampus, ex-
posed the subjects seven times to a taste
before pairing it with illness, and assessed
aversion using a two-bottle test (Miller, El-
kins, & Peacock, 1971). It is not surprising 
that the results of these studies were incon-
sistent, with Miller et al. but not McGowan
et al. finding taste aversion disruption; recon- 
ciliation of their results is extremely difficult. 
Third, a difference in physiology between two 
learning tasks does not necessarily imply a
difference in overt behavior. Even leaving
aside discrepancies due to different sensory
systems, activity in two separate areas of the
brain can, perhaps through some later com-
mon neural path, result in the same external
responses. It is those external responses that
we observe and monitor and by which the
organism interacts with the rest of the world. 


Associative Fluidity

Acquisition

Taste aversions are noted for forming in 
one trial. This unusual characteristic alone 
obliges their classification as prepared associ- 
ations (Seligman, 1970). Testa and Ternes 
(1977) have suggested that one-trial taste 
aversion learning results from rats’ lifetime 
experience with tastes and associated illnesses. 
Experiments with young organisms in which 
only one trial was needed for an aversion to 
form strongly suggest that Testa and Ternes's
explanation is not correct (Galef & Sherry, 
1973; Grote & Brown, 1971; Rudy & Cheatle, 
1977). There does indeed seem to be an 
inherited mechanism that enables taste-illness 
associations to be acquired rapidly. 

Nevertheless, this prewiring for rapid 
learning may not be unique to the acquisi- 
tion of taste aversions. One-trial learning can 
also occur, for example, in standard condi- 
tioning avoidance paradigms. Avoidance con- 
ditioning may occur rapidly, though only 
when the avoidance response, such as freez- 
ing, is the natural defense reaction of a spe- 
cies (Bolles, 1970). In that case the avoidance 
learning would also be classified as a pre- 
pared association. 

Although the rapidity with which taste 
aversions are learned may not signify more 
than a parametric difference between taste 
aversion and other learning, it does make the 
process of acquisition more difficult to ob-
serve because of ceiling and floor effects
(Kalat, 1977; Revusky & Garcia, 1970). 
Garcia et al. (1966) circumvented this con- 
straint by use of a procedure that condi- 
tioned a weak aversion. They were thus able 
to observe the development of an aversion 
over several trials. The amount of saccharin
consumed decreased as the number of sac-
charin-illness pairings increased in a way
consistent with learning curves in more
standard paradigms (Kimble, 1961; Mackin-
tosh, 1974).


Retention 


Once acquired, taste aversions are retained
over an extremely long period of time. Ex-
periments have shown retention for as long as
90 days (Dragoin, Hughes, Devine, & Bent- 
ley, 1973). Even day-old guinea pigs trained 
to avoid sucrose after one trial retained their 
aversions more than a month later (Kalat, 
1975). But such long periods of retention are 
not unusual for aversively motivated learning 
in general. For example, Hoffman, Fleshler, 
and Jensen (1963) showed retention of con- 
ditioned suppression in pigeons to be virtually 
perfect 2.5 years after training. Gleitman 
(1971) extended these findings using rats, and 
he suggested that long retention may be the 
rule for classical aversive conditioning. 


Extinction 


To extinguish a conditioned response an 
experimenter presents the CS without the 
US. With repeated exposure to the unac- 
companied CS, the conditioned response de- 
creases in intensity or probability until it 
finally disappears. It has been suggested that 
taste aversions, since they are prepared, may 
be much more difficult to extinguish than 
traditional tasks (Mitchell, Scott, & Mitchell, 
1977; Seligman, 1970; Seligman & Hager, 
1972). Nevertheless, neither the overall pro- 
cess of extinction nor the rapidity with which 
it occurs appears discriminably different be- 
tween taste aversion and traditional learning. 

Numerous experiments have shown that 
extinction of a taste aversion will occur when 
the CS is repeatedly presented alone (e.g.,
Baum, Foidart, & Lapointe, 1974; Domjan, 
1975; Garcia et al., 1966; Nowlis, 1974). In 
addition, it has been found that extinction is 
faster if exposure to the CS is facilitated by 
making the CS a subject's sole source of some 
substance of which the subject has been 
deprived. For example, Grote and Brown 
(1973) showed that with only one fluid source 
available water deprivation hastens extinction 
of a saccharin solution CS by increasing 
(safe) CS consumption during extinction 
trials. Similarly, aversion to a sodium chloride 
solution is extinguished faster when a need 
for sodium is produced by injections of for- 
malin or by adrenalectomy, or when a need 
for water is induced by water deprivation or 
drug injections (Balagura & Smith, 1970). 
These experiments indicate that the degree of
extinction of taste aversions varies with the
amount of CS-alone presentation, as does the 
degree of extinction in more traditional learn- 
ing tasks (see also Abelson, Pierrel-Sorren- 
tino, & Blough, 1977). 
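
The relation between CS-alone exposure and extinction can be sketched as a toy simulation. Everything in the following Python fragment is a hypothetical illustration rather than a model from the studies cited: it assumes that intake in each session rises with a deprivation-induced drive and falls with the current aversion, and that each unit of the CS consumed without illness weakens the aversion by a fixed proportion. The function name and all parameter values are assumptions made for the example.

```python
def sessions_to_extinguish(drive, strength=1.0, decay_per_ml=0.05,
                           max_intake_ml=20.0, threshold=0.05,
                           max_sessions=1000):
    """Count CS-alone sessions until aversion strength drops below threshold.

    drive: 0..1, a stand-in for deprivation level (hypothetical scale).
    Intake grows with drive and shrinks with the current aversion; every
    ml consumed without illness multiplies strength by (1 - decay_per_ml).
    """
    sessions = 0
    while strength >= threshold and sessions < max_sessions:
        # A strong aversion suppresses drinking but never to zero here.
        intake = max_intake_ml * drive * (1.0 - 0.9 * strength)
        strength *= (1.0 - decay_per_ml) ** intake
        sessions += 1
    return sessions


# Hypothetical comparison: high versus low deprivation-induced drive.
high_drive_sessions = sessions_to_extinguish(drive=1.0)
low_drive_sessions = sessions_to_extinguish(drive=0.3)
```

In this sketch the higher drive yields more safe consumption per session and therefore fewer sessions to reach the threshold, which is the qualitative pattern reported for water-deprived and sodium-deficient subjects. The update rule itself is the same in both conditions.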

There is such a large range in the resistance 
to extinction of tasks acquired in standard 
conditioning procedures that to make a state- 
ment to the effect that taste aversions are 
more or less difficult to extinguish at best ob- 
scures the facts. Characteristics of experi- 
mental procedure during acquisition are in- 
fluential in determining rapidity of extinction 
(for a review of the literature see Kimble, 
1961; Mackintosh, 1974), and this influence 
also makes it difficult to compare taste aver- 
sion with traditional task extinction. A more 
fruitful approach may be to compare extinc- 
tion of prepared and unprepared learning, 
equating as many aspects of the procedures 
for the two types of learning as possible. 
Some data relevant to such a comparison 
already exist (see Seligman & Hager, 1972). 

However, the ability of the researchers cited
in the preceding paragraph to extinguish taste 
aversions without inordinate difficulties (in 
Garcia et al.’s, 1966, experiment extinction 
was complete after three 10-minute sessions) 
argues against a statement to the effect that 
taste aversions almost always take longer to 
extinguish than do responses acquired in 
traditional tasks. 


Summary 


Acquisition occurs over time as a CS and a 
US are presented together. Once CS-US pair- 
ings have ceased, responses to the CS gradu- 
ally decrease, although complete cessation of 
responding may take years. This process oc- 
curs more quickly when the CS is presented 
without the US, that is, extinction occurs. 
This description applies equally well to both 
taste aversion and traditional learning. 
Further, the data point to no unique temporal 
aspects of these processes for taste aversion 
learning; taste aversions are not acquired 
faster, retained longer, or extinguished over 
more trials than all other traditional learning 
tasks. Previous studies of operant and clas-
sical conditioning have included a wide range
of prepared and contraprepared associations.
At most the research indicates that taste
aversions are relatively easier to acquire, a
quantitative, not a qualitative, difference.


CS and US Characteristics and 
Their Interaction 


Intensity 


US. With increasing US intensity, learned 
taste aversions are more pronounced (Dra-
goin, 1971; Nachman & Ashe, 1973; Revu-
sky, 1968). This is consistent with other find-
ings in the operant and classical conditioning
literature. Ray and Bivens (1968) trained 
mice on a passive avoidance task and found 
that with a greater intensity US, learning was 
more persistent following amnesic electrocon-
vulsive shock. In Passey's (1948) experiment
with humans, conditioned eyelid responses 
were larger and were acquired more quickly 
when a stronger air puff was used as the US. 
Finally, Church (1969) has summarized evi- 
dence indicating that the extent to which 
punishment of an instrumental response sup-
presses that response is a positive function of
the intensity of the punishment. 

CS. The taste aversion CS also seems to
support better conditioning when it is stronger
(Barker, 1976; Dragoin, 1971; Nowlis, 
1974). Pavlov (1927/1946) observed this 
phenomenon in his own experiments on sali- 
vary conditioning, and Hull (1949) incorpo- 
rated it into his learning theories as stimulus 
intensity dynamism. Gray (1965) discussed 
the more recent evidence for this phenomenon 
in classical conditioning generally. 


Specificity of CS to US 


Taste aversion. Garcia and his colleagues, 
working with rats, were the first to report 
specificity of CS to US in taste aversion 
learning. The seminal study (Garcia & Koel-
ling, 1966) found that if both an audiovisual
and a taste cue are correlated with illness,
only the taste readily becomes aversive. Simi-
larly, if the two cues are paired with shock,
only the audiovisual cue readily becomes
aversive. Additional work (Garcia & Koel-
ling, 1967; Garcia, McGowan, Ervin, & Koel-
ling, 1968) showed that a taste was easier to
associate with illness than was an odor or the
size of a piece of food. Green, Bouzas, and 
Rachlin (1972), using a more controlled pro- 
cedure, have also demonstrated specificity of 
CS to US. They followed rats’ saccharin 
drinking with 1 hour of poisoning, pulsed 
shock, or continuous shock. Only poisoning 
suppressed drinking. 

The tendency of the taste of the food, 
rather than any other aspect, to become as- 
sociated with illness is one of the best known 
characteristics of illness-induced learning. 
However, it should be remembered that de- 
spite occasional failures (Larsen & Hyde, 
1977), it is possible for cues other than 
tastes to be associated with illness in rats, 
although often more trials and more careful 
training procedures are required (Riccio &
Haroutunian, 1977; Rozin, 1969). P. J. Best,
M. R. Best, and Henggeler (1977) summarized a
good deal of evidence in support of this
statement, but they also pointed out that
illness-induced aversions to environmental
cues are not as robust as taste aversions. The 
maximum delay of reinforcement is shorter, 
and the aversions extinguish faster. 

Traditional learning. But what about
more standard learning paradigms? Do they
also show specificity of CS to US? The an-
swer is yes, although examples are not as easy
to find for them as they are for taste aversion
learning. Shettleworth (1972b) used three
different procedures in her experiments with
chicks: punishment with foot shock for drink-
ing water, punishment with shock in the water
for drinking, or fear conditioning by pairing
exteroceptive stimuli with foot shock while
the chicks drank. For the first two procedures,
a compound CS consisting of a flashing light
plus clicking was associated with shock and
controlled behavior. For the conditioned emo-
tional response paradigm, the clicking con-
trolled behavior. However, Shettleworth's
studies were not ideal for showing CS to US
specificity, as they used different responses
and different experimental procedures.

Two operant conditioning experiments did
not have this problem. Delius (1968) showed
that when pigeons were thirsty and pecking
for water reinforcement they were more likely
to peck at stimulus cards toward the short-
wavelength (blue) end of the spectrum relative to
when they were hungry and pecking for food
reinforcement. In Foree and LoLordo's
(1973) experiments pigeons pressed a treadle 
either to obtain food or to avoid shock. Both 
visual and auditory discriminative stimuli 
were present. Visual stimuli were prepotent 
in controlling behavior when food was being 
obtained; auditory stimuli were prepotent 
when shock was being avoided. 

What these results may indicate is not 
simply that standard learning can show in- 
stances of CS to US specificity, as occur in 
taste aversion acquisition, but that the fre- 
quency of prepared and contraprepared as- 
sociations is greater and more pervasive than 
has been previously assumed. Clearly, the 
preparedness of associations can affect experi- 
mental results, and taste aversions are by no 
means the only example. Only additional ex- 
periments will determine the extent of CS to 
US specificity in traditional learning. 

Differences between species. As a result of 
the influence of Darwin's theory of natural 
selection, all species are thought to have 
evolved from some small number of ancestors 
and thus to have many characteristics in com- 
mon. This belief in the continuity of the 
species is fundamental for animal psycholo- 
gists who spend their careers working with 
lower organisms, thereby developing princi- 
ples that many of these researchers hope will 
one day be applied to human beings. Never- 
theless, the taste aversion literature has 
pointed out some very striking species-specific 
differences. In particular, for some species 
gustatory cues do not serve as the most pre- 
pared CS in illness-induced learning. Instead, 
visual cues are prepotent. Wilcoxon, Dragoin, 
& Kral (1971) provided quail with blue sour 
water to drink and then poisoned the quail. 
Aversions were formed to the blue color and 
not to the sour taste of the fluid. This modal- 
ity preference is adaptive, for quail’s food 
consists of seeds that are covered by a hard, 
flavorless shell. To determine what is and is 
not poison, diurnal birds that consume seeds 
must use visual cues (Garcia et al., 1974; 
Gustavson, 1977). 

It could be argued that the ability of quail 
to form visually based illness-induced aver- 
sions is solely attributable to the fact that 
the visual apparatus of quail, diurnal birds,
is so highly developed. But this argument
cannot be made for results from taste aversion 
experiments that employed guinea pigs 
(Braveman, 1974, 1975b). This species has 
no difficulties associating visual cues with ill- 
ness, despite the fact that their visual system 
is no more developed than that of the rat. In 
contrast, rats do not easily associate visual 
cues with illness. Guinea pigs and quail search
for their food by day and use their visual
systems in this search. Rats obtain food at
night using their visual systems to a lesser 
extent. The stimuli of the modality used in 
searching for food may be those that are pre- 
pared to associate with illness rather than 
simply the stimuli perceived by the most 
highly developed sensory system (Braveman, 
1974, 1975b). 

Gustavson (1977) summarized much of the 
evidence on the formation of illness-based 
aversions in different species. He concluded
that to a surprising extent, these aversions
are formed in a similar manner across many
species. The prepotent CS modality differs
depending on the particular ecological niche 
of the species, but overall the effects of CS 
and US intensity, and the maximum interval 
between the CS and the US, are strikingly 
alike across species. It should also be noted 
that species differences in cue modality 
predominance are not confined to the taste 
aversion paradigm (see Kalat, 1977, for a 
discussion of this point). 

CS salience as a function of the US. Not
only are some CSs predisposed to be associ- 
ated with some USs in taste aversion learn- 
ing but, even more specifically, some CSs 
may be associated with injections of certain 
aversive substances and not others. Weis- 
inger, Parker, and Skorupski (1974) have 
reported experiments showing that if saline 
is used as the CS it will become aversive if 
insulin, which decreases blood sugar, is used 
as the US. However, if formalin, which causes 
a need for sodium, is used as the US, an aver- 
sion to saline will not form. On the other 
hand, when sucrose is employed as the CS 
the opposite results are obtained. Domjan
and Levy (1977) pointed out the possible
evolutionary basis of this behavior: Animals
low in blood sugar are unlikely to become
ill while consuming sucrose, nor while consuming
saline when they are low in salt. In
keeping with these findings, Frumkin (1975) 
was unable to train adrenalectomized (adren- 
alectomy causes a decrease in body sodium) 
rats to avoid sodium chloride or to train para- 
thyroidectomized (parathyroidectomy causes 
a decrease in body calcium) rats to avoid 
calcium lactate. 

Contrary to Weisinger et al.’s and Frum- 
kin’s results, Domjan and Levy were able to 
obtain conditioning of sucrose and saline with 
both insulin and formalin. Domjan and Levy
could not completely isolate the reason for 
the discrepancy in results, although they 
eliminated many possibilities. Trent and 
Kalat (1977) obtained data similar to those 
of Domjan and Levy. They trained a taste 
aversion to sodium chloride in sodium-defi- 
cient subjects. They attributed their success 
to the design of the experiment: During test- 
ing the rats did not have a strong sodium 
hunger that would have masked any aversion 
to sodium. Clearly, even tastes that possess 
some kind of a positive biological significance 
can be made aversive in a taste aversion 
paradigm. This is consistent with the finding
that the vaginal, sexual attractant secretion
of the female hamster is easily associated
with illness in male hamsters (Johnston & 
Zahorik, 1975; Zahorik & Johnston, 1976). 


Generalization of the CS 


Variation along more than one stimulus
attribute. Generalization of the CS does occur
to substances that have tastes similar to
the CS. Domjan (1975) reported that after
saccharin had been made aversive through
contingent poison, casein hydrolysate, but
not vinegar, was also subsequently avoided.
Domjan drew the plausible conclusion that
casein hydrolysate tastes more like saccharin
than does vinegar. In experiments in which
rats actively ingested LiCl (Balagura &
Smith, 1970; Nachman, 1963; D. F. Smith &
Balagura, 1969), the taste of the LiCl became
an aversive CS and the aversion generalized
to NaCl, a substance that causes responses
similar to LiCl in the chorda tympani
nerve (Nachman, 1963). These results are
comparable to those obtained in traditional
conditioning procedures (Sutherland & Mackintosh,
1971).

Variation along one stimulus attribute. 
By testing for the amount of aversion to 
stimuli similar to the CS except for varia- 
tions in one CS attribute, it is possible to de- 
termine the shape of a taste aversion gen- 
eralization gradient. Generalization curves 
found using the usual laboratory paradigms 
take one of two basic forms: a straight line 
with positive or negative slope (stimulus in- 
tensity dynamism) or a curve passing through 
a minimum at the CS. The former is obtained 
when generalization is investigated along an 
intensity continuum such as brightness or 
loudness. The latter is observed with con- 
tinua that change qualitatively as they are 
varied, for example, wavelength and pitch. 
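
As an illustrative aside, the two gradient forms just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a model taken from the studies cited: the function names, the parameter values, and the use of a hypothetical intake score (lower intake standing for stronger aversion) are all invented here for illustration, and the V shape is only one of many curves that have a minimum at the CS.

```python
def intensity_gradient(test_value, slope=0.5, intercept=10.0):
    """Straight-line gradient along an intensity continuum (hypothetical units).

    With a negative slope, the intake score falls as the test stimulus gets stronger.
    """
    return intercept + slope * test_value


def qualitative_gradient(test_value, cs_value, floor=2.0, steepness=1.5):
    """Curve with its minimum at the CS along a qualitative continuum.

    The intake score is lowest at the trained value and rises with distance from it.
    """
    return floor + steepness * abs(test_value - cs_value)


test_values = [1, 2, 3, 4, 5]  # hypothetical test stimulus values
cs_value = 3                   # hypothetical trained (CS) value

linear_scores = [intensity_gradient(v, slope=-0.5) for v in test_values]
curved_scores = [qualitative_gradient(v, cs_value) for v in test_values]
lowest_at = test_values[curved_scores.index(min(curved_scores))]
```

In this sketch the straight-line scores change monotonically across the test values, whereas the curved scores are lowest at the trained value and symmetric around it.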

Despite suggestions that taste aversion gen- 
eralization curves may demonstrate no stim- 
ulus control (Seligman & Hager, 1972), simi- 
lar curves, and therefore similar equations, 
describe results of generalization testing in 
taste aversion and traditional learning paradigms
when comparable modalities are employed.
Several investigators have performed
illness-induced aversion generalization experiments
by poisoning after consumption of
one saturation of blue water (Czaplicki, Borrebach,
& Wilcoxon, 1976), one saccharin
concentration (Logue, Note 1), or one salt
concentration (Nowlis, 1974) and then testing
with varying saturations or concentrations.
The results suggested stimulus intensity
dynamism with greater aversion as the
tested stimulus increased in strength (Czaplicki
et al., 1976), a generalization curve that
passed through a minimum at its center, showing
less aversion to test stimuli either stronger
or weaker than the CS (Logue, Note 1), or
both (Nowlis, 1974). Since saccharin concentration
was the only clearly nonintensity
continuum used in these experiments (the 
taste of saccharin changes qualitatively as its 
concentration changes; Collier & Novell, 
1967), the results are consistent with predic- 
tions from the traditional literature. 


Hedonic Value of the CS 


A much discussed question in psychology
is whether a neutral stimulus paired with a
reinforcer becomes a signal for that reinforcer
or whether the previously neutral stimulus
itself acquires some hedonic value (Gleitman,
1974). Taste aversion theorists have
claimed that a taste aversion CS is unusual in 
that it changes in hedonic value when it is 
paired with the US (Garcia & Hankins, 1977; 
Garcia et al., 1974; Garcia, McGowan, & 
Green, 1972; Garcia, Rusiniak, & Brett, 
1977). In taste aversion experiments, not 
only does the organism act as if the CS 
causes illness but the subject displays other 
behaviors toward the CS indicating the 
changed hedonic tone of the stimulus. Garcia, 
McGowan, and Green (1972) cited as evi- 
dence for this that animals avoid the CS 
from a taste aversion experiment no matter 
where they encounter it, whereas when a 
taste is paired with shock the taste is avoided 
only in the experimental chamber in which 
the conditioning took place. However, note 
that money, a traditional, generalized, condi- 
tioned reinforcer, is of value even in situa- 
tions in which it has never been paired with 
a primary reinforcer. 

Additional evidence for the changed he- 
donic value of a taste aversion CS comes from 
Garcia et al.’s (1977) report of their ob- 
servations of predator-prey relationships. 
These researchers have studied predators’ re- 
actions to their natural prey after consump- 
tion of the prey has been paired with ill- 
ness. When presented with a meat carcass 
previously associated with LiCl, besides show- 
ing retching movements, coyotes demonstrate 
what the authors call conditioned disgust re- 
sponses, including urinating on, burying, or 
rolling on the meat. A third class of behav- 
iors are conflict responses, abnormal responses 
to the prey that take the place of the usual 
prey-predator relationship. 

These types of responses to the CS are 
not confined to the taste aversion paradigm. 
Hearst and Franklin (1977) showed that 
pigeons will withdraw from a response key 
that has a negative correlation with grain. 
Timberlake and Grant (1975) used the pres- 
ence of a rat as a food predictor for another 
rat. Subjects in the experiment directed so- 
cial behaviors, not eating responses, toward 
the predictive rat. The authors stated that
their experiment provides support for the
hypothesis that in classical conditioning a 
whole set of related responses is conditioned, 
not just a single reflex. Further, the concept 
of changed hedonic value has been present 
for many years in a traditional theory of 
human motivation under the name of func- 
tional autonomy (Allport, 1937, chapter 7). 
According to Allport, some stimuli paired 
with reinforcement become reinforcing in their 
own right and reduce drives that are not 
necessarily the same as those reduced by the 
original primary reinforcer. 


Summary 


This section has made abundantly clear 
the problems that arise in trying to compare 
taste aversion with so-called traditional learn- 
ing. Although it was possible to establish 
some general effects that occur for both taste 
aversions and traditional tasks, for example, 
better learning with more intense CSs and 
USs and generalization of the CS to similar 
stimuli, other areas were not as easy to eval- 
uate. There is evidence that some target re- 
sponses in standard learning paradigms are 
easier to acquire with particular reinforcers 
than are others. Similarly, there are sugges- 
tions in the literature that the CS at least 
sometimes elicits a complex of interrelated 
behaviors both for animals acquiring taste 
aversions and for animals acquiring responses 
in traditional tasks. Therefore, with respect 
to specificity of CS to US and the hedonic 
value of the CS, it is difficult to compare 
taste aversion learning with the whole of tra- 
ditional learning. The function that the pre- 
ceding sections serve is to point out that in 
these respects the acquisition of taste aver- 
sions is not unique. 


Temporal Relationship of CS and US 
Trace Conditioning 


Taste aversions are usually conditioned by
first presenting the CS and then, after a delay
ranging from minutes to hours, presenting
the US. This is similar to the classical
conditioning procedure of trace conditioning.
In a taste aversion paradigm delays of about
an hour are common laboratory procedure. 
Learning has even occurred under optimal 
conditions with delays of 24 hours (Etscorn 
& Stephens, 1973). In accordance with what 
has been reported in traditional learning experiments
(Renner, 1964), it has consistently
been found that the strength of a learned
taste aversion is less with longer intervals
between the CS and the US during acquisition
(see, e.g., M. R. Best & Barker, 1977;
Burešová & Bureš, 1974; Garcia et al., 1966;
Kalat & Rozin, 1971; Nachman, 1970; Re- 
vusky, 1968; J. C. Smith & Roll, 1967; Wil- 
coxon, 1977). 

Although most experiments in traditional 
learning have shown that conditioning is pos- 
sible only with CS-US delays of at most a 
few seconds (Kimble, 1961), a few experi- 
ments have demonstrated learning with 
longer delays. The latter studies examined 
the delay of reinforcement gradient by em- 
ploying long intertrial intervals in two ways: 
Either reinforcement on one trial determined
whether a response would be reinforced on 
the next trial or reinforcement for a trial 
was simply delayed until the start of the 
next trial. The first type of study has been 
reported by Capaldi (1967), Petrinovich and 
Bolles (1957), Petrinovich, Bradford, and
McGaugh (1965), Pschirrer (1972), and
Tyler, Wortz, and Bitterman (1953). In Capaldi’s
study rats ran in a runway, and reinforcement
was presented on alternating trials.
Despite 24-hour intertrial intervals the rats 
learned to run slower on trials that would not 
receive reinforcement. 

Lett (1973, 1974, 1975) performed the 
second type of long-delay, traditional learn- 
ing experiments. For her studies she used 
mazes and delays of up to 1 hour between 
when the rat made its choice and was removed
from the maze and when it was put
back in the maze for another trial and received
its reinforcement for the first trial.
In a recent investigation (Lett, 1977), reward
was given to a subject in its home cage
following a session in the T maze. Animals
still learned with delays of up to 2 hours between
removal from the T maze and reward
in the home cage. Lett was able to obtain
long-delay learning only when she removed
the animal from the apparatus for the delay.

Lett’s experiments have been offered as 
support for Revusky's (1971, 1977a) con- 
current interference theory of long-delay 
learning. Revusky has stated that the maxi- 
mum delay possible between the CS and US 
is an inverse function of the number of ad- 
ditional associations to the CS and US, other 
than the reference CS-US association, that 
form during the delay. These additional as- 
sociations interfere with CS-US learning. 
Taste aversions are learned over long delays 
because other stimuli predisposed to associate 
with illness infrequently intervene between 
the CS and US. On the other hand, in tradi- 
tional conditioning procedures employing a 
light or tone as the CS, other visual and 
auditory stimuli are very likely to intervene 
between the CS and US unless the CS-US 
delay is very short (see Kalat & Rozin, 1971, 
and Rozin & Ree, 1972, for additional sup- 
port and criticism of the interference theory 
of long-delay learning). 

Theories other than Revusky’s have been 
offered to explain the differences in maximum 
delay of reinforcement usually found be- 
tween taste aversion and traditional learn- 
ing. An explanation based on the presence of 
an aftertaste that would bring the CS closer 
to the US has been sufficiently laid to rest 
(Garcia, Hankins, Robinson, & Vogt, 1972; 
Garcia, McGowan, & Green, 1972; Revusky 
& Garcia, 1970; Rozin, 1969). An alterna- 
tive explanation relies on evidence of the 
unusual duration of the gustatory memory 
trace (Krane & Wagner, 1975). Regardless 
of which theory one wishes to accept or pro- 
pose, if there is a difference between taste 
aversion and standard conditioning with respect
to maximum delay of reinforcement it
is a quantitative one; the data in question
involve simply longer or shorter delays with
perhaps some degree of overlap between the
two paradigms. 


Backward Conditioning 


Backward conditioning, learning when the
US is presented just prior to the CS, was
traditionally assumed to be impossible
(Kimble, 1961). But recently some standard
learning experiments using more sophisticated 
procedures have reported backward condi- 
tioning (Heth & Rescorla, 1973; Keith-Lucas 
& Guttman, 1975; Wagner & Terry, 1975). 
In these experiments, maximum delays be- 
tween the US and the CS are in the order of 
seconds. Results from taste aversion experi- 
ments are strikingly different. Although the 
very first attempt to demonstrate backward 
conditioning was not successful (Garcia & 
Kimeldorf, 1957), since then backward con- 
ditioning has been repeatedly obtained. Still 
at issue, however, is the maximum delay be- 
tween the US and CS that will still support 
learning. Values range from .5 to 12 hours 
(Barker, Suarez, & Gray, 1974; Boland, 
1973; Domjan & Gregg, 1977; Scarborough, 
Whaley, & Rogers, 1964). Domjan and Gregg 
reported that the maximum delay depends 
on the intensity of the CS. 

It is not clear why this discrepancy in 
maximum US-CS delays between taste aver- 
sion and more standard learning exists, but 
it may have something to do with US dura- 
tion. The US-CS delays in the taste aver- 
sion experiments were calculated as the time 
between irradiation (or poison injection or 
intubation) and the opportunity to taste. 
This is an inaccurate estimate, since the US 
consists of the sensations of illness, not of 
the manipulation that causes the illness, and 
since the CS consists of the actual taste, not 
simply of the availability of a taste. Illness 
resulting from poison or irradiation may 
still have been present hours later when the 
CS was presented, or even hours after CS 
termination, for an effective combination of
backward, simultaneous, and trace conditioning.
The traditional experiments all used
shock of brief duration. These hypotheses 
are difficult to test because of methodological 
problems in measuring illness and aftertastes 
(see Barker, Smith, & Suarez, 1977, for a 
more detailed discussion of these issues), but 
it should be noted that in general, as in traditional
learning, the maximum delay with
which acquisition will occur in a backward- 
conditioning taste aversion paradigm is 
shorter than in a trace-conditioning taste 
aversion paradigm. 




Summary 


The temporal relationships between the 
CS and the US that support learning are not 
qualitatively different for taste aversion and 
traditional learning. In both cases, whether 
the CS or the US occurs first, close temporal 
contiguity is more effective than very long 
delays, and trace conditioning is more effec- 
tive than backward conditioning. Most often, 
though not always, taste aversion learning 
is supported by longer delays than are pos- 
sible in traditional learning. Several attempts 
have been made to explain this difference by 
reliance on such factors as the amount of 
interfering associations during the delay or 
the duration of the US rather than by re- 
liance on a unique process of taste aversion 
formation. Nevertheless, it is striking that 
taste aversions have been acquired in one 
trial with a 24-hour delay (Etscorn & 
Stephens, 1973); nothing even approaching 
this has been demonstrated in traditional 
paradigms. This distinction between taste 
aversion and traditional learning appears to 
be one instance of such a large quantitative 
difference that simply calling it a quantita- 
tive, and not a qualitative, difference seems 
inappropriate. 


Obtaining and Processing Information 
Novelty 


CS. Taste aversion conditioning appears
to be easier with certain tastes than with 
others. Kalat and Rozin (1970) used the 
term salience to describe this finding. How- 
ever, as was discussed above, CS intensity 
affects CS conditionability, and Kalat and 
Rozin did not equate the subjective intensity 
of their solutions, clearly a difficult task. In 
a later experiment, Kalat (1974) found that 
the degree to which a solution was novel 
could account for both CS intensity and 
salience effects. Although in most prior ex- 
periments rats had no previous experience 
with the solutions presented to them, the 
solutions differed to a varying extent from 
what the rats were used to drinking, which 
was plain water. Kalat raised rats on water or
a high concentration of a particular solute.
He then allowed them to drink a low and 
a medium concentration of that solute fol- 
lowed by poison. Rats raised on water formed 
an aversion to the medium concentration, 
whereas rats raised on a high concentration 
formed an aversion to the low concentration.
There are many additional experiments that 
have carefully demonstrated the difficulty in 
associating a taste with illness if the taste 
has previously been presented without illness
(e.g., M. R. Best & Gemberling, 1977;
Fenwick, Mikulka, & Klein, 1975; Kalat & 
Rozin, 1973; Kiefer, Phillips, & Braun, 1977; 
Revusky & Bedarf, 1967; Siegel, 1974). This 
is comparable to the procedure in classical 
conditioning known as latent inhibition. 
There too a CS presented without a US, and 
before any presentation of a US, is more 
difficult to condition with that US (see Lu- 
bow, 1973, for a review of the literature). 
There has recently been a great deal of 
controversy in the taste aversion literature 
over whether noncontingent illness can by 
itself cause an aversion, or an increased aver- 
sion, to novel foods similar to the process of 
sensitization in classical conditioning (Bit- 
terman, 1976; Garcia, Hankins, & Rusiniak, 
1976; Mitchell, 1977; Mitchell et al., 1977; 
Revusky, 1977b). Appeals to sensitization 
effects have been used in attempts to explain 
away the unusual associative properties of 
tastes with illness. Current evidence indi- 
cates that there are two types of enhanced 
aversion to novel foods following illness 
(Domjan, 1977). The first is a short-lived 
aversion to all novel foods, as a direct effect 
of the poison-induced illness. This phenomenon
is observed after taste-contingent or
taste-noncontingent poisoning and thus is
comparable to sensitization. Second, after
taste-contingent poison presentation, a more
lasting aversion occurs to novel foods that
taste similar to the taste paired with illness.
The aversion to the taste paired with illness
generalizes to similar novel foods, an associative
process (i.e., not sensitization). Therefore
if subjects are completely recovered
from illness before any aversion testing begins,
it is extremely unlikely that any sensitization
effects will be observed. Most experimenters
have indeed waited until the
animals were no longer ill before beginning
testing (for similar opinions and supporting 
evidence, see Garcia & Hankins, 1977; Re- 
vusky, 1977c; Rozin, 1977; Testa & Ternes, 
1977; Wilcoxon, 1977; Zahorik, 1977). 

US. Parallel to the finding with CSs and 
contrary to what sensitization would predict, 
if a US is presented alone and not correlated 
with a CS, subsequent conditioning of a 
taste aversion with that US is retarded (Can- 
non, Berman, Baker, & Atkinson, 1975; Cap- 
pell, LeBlanc, & Herling, 1975; Elkins, 1974; 
Kiefer et al., 1977; Mikulka, Leard, & Klein, 
1977). This was demonstrated even when 
tolerance and addiction effects were controlled
for (Braveman, 1975a; Vogel, Note
2). The complementary US preexposure effect
has been noted in traditional conditioning
experiments (e.g., Ayres, Benedict, &
Witcher, 1975; Mis & Moore, 1973; Siegel 
& Domjan, 1971). 


Learned Irrelevance 


Learned irrelevance is the term that Mack- 
intosh (1973, 1974) used to describe an 
organism's learning that a given CS and US
are not correlated. This occurs with random
presentations of both the CS and the US
and is more deleterious to further conditioning
than simply presenting the CS alone
(latent inhibition). Mackintosh cited a great
deal of evidence to show that organisms can
learn a lack of correlation between the CS
and the US. So far this phenomenon has not
been explicitly investigated in taste aversion
learning. This is probably due to the problems
involved in randomly presenting CSs
and USs without having some association
made between them as a result of the
long delays of reinforcement possible with
taste aversions. That taste aversions do not
form with extremely long CS-US delays,
possibly because the organism learns that
the taste is not associated with poison, has
been described as demonstrating learned irrelevance
in a taste aversion paradigm
(Kalat, 1977).


Blocking


There are at least two basic procedures 
with which to study the blocking effect. 
Kamin originally reported blocking in 1969 
upon showing that prior conditioning with 
one member of a compound stimulus blocked 
later conditioning to the other member of 
the compound when the compound was paired 
with a US. Revusky (1971) obtained block- 
ing in a taste aversion experiment using this 
type of a blocking paradigm and taste stim- 
uli. Kalat and Rozin (1972), also employing 
taste stimuli, were unable to confirm this re- 
sult; the novel member of the compound 
stimulus paired with the US still became 
aversive. However, subsequent experiments 
have demonstrated that this type of block- 
ing does occur in taste aversion learning (Gil- 
lan & Domjan, 1977; Rudy, Iwens, & Best, 
1977). Rudy et al. first paired novel extero- 
ceptive cues with illness. Later, acquisition 
of an aversion to a taste was impeded when 
the compound of the taste and the familiar 
exteroceptive stimulation were paired with 
illness. Gillan and Domjan used only oral 
stimuli in their experiments. 
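
The logic of this first blocking design can be made concrete with a small simulation. The sketch below uses the Rescorla-Wagner learning rule, which is an assumption brought in here for illustration and is not the analysis used in the studies cited above; the learning rate, the asymptote, the trial counts, and the cue labels A and B are likewise hypothetical. Under this rule, each cue present on a trial changes in associative strength in proportion to the difference between the outcome and the summed strength of all cues present.

```python
def rescorla_wagner(trials, learning_rate=0.3, asymptote=1.0):
    """Return associative strengths after a sequence of conditioning trials.

    trials: list of (cues, reinforced) pairs, where cues is a tuple of labels.
    All parameter values are hypothetical and chosen only for illustration.
    """
    strengths = {}
    for cues, reinforced in trials:
        # Summed strength of every cue present, taken before any update.
        total = sum(strengths.get(cue, 0.0) for cue in cues)
        error = (asymptote if reinforced else 0.0) - total
        # Every cue present on the trial shares the same prediction error.
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + learning_rate * error
    return strengths


# Blocking group: cue A alone is paired with the US, then the A+B compound.
blocked = rescorla_wagner([(("A",), True)] * 20 + [(("A", "B"), True)] * 20)

# Control group: only the A+B compound is paired with the US.
control = rescorla_wagner([(("A", "B"), True)] * 20)
```

In this sketch cue B gains almost no strength when A has already been conditioned, whereas it gains substantial strength in the control schedule, which is the pattern the blocking procedure is designed to detect.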

The second type of procedure used in dem- 
onstrating blocking was employed by Wagner, 
Logan, Haberlandt, and Price (1968). In 
this method one compound containing two 
stimuli is paired with the US, and another 
compound containing one stimulus from the 
first compound plus a third stimulus is pre- 
sented but not paired with the US. A test for 
learning is then conducted with the stimulus 
common to both compounds. Subjects in 
Wagner et al.'s experiments demonstrated
blocking by responding less to the common 
stimulus under these reinforcement condi- 
tions than when each compound was followed 
by the US half the time. Luongo (1976) ob- 
tained similar results from a taste aversion 
experiment. 

There have been other attempts to manip- 
ulate the amount of information provided by 
a taste CS paired with an illness US (see 
Kalat & Rozin, 1972; Revusky, Parker, & 
Coombes, 1977), but since comparable tradi- 
tional designs have yet to be performed, it is 
difficult to interpret the taste aversion results.


Conditioned Inhibition


A conditioned inhibitor is a cue that guar- 
antees that reinforcement will not occur when 
it might otherwise be expected to occur; it 
has a negative correlation with reinforce- 
ment (Kimble, 1961; Mackintosh, 1974). 
Conditioned inhibition has been demonstrated 
in taste aversion. M. R. Best (1975) paired 
saccharin with illness and then saline with 
no illness. The subjects subsequently chose
between saline and some third liquid. Their 
preference for saline as compared with the 
other liquid available was greater than that 
of control animals. In Taukulis and Re- 
vusky's (1975) experiment saccharin was al- 
ways followed by illness, but saccharin ac- 
companied by an odor was never followed by 
illness. The odor then satisfied three of 
Rescorla’s (1969, 1971) conditions for clas- 
sification as a conditioned inhibitor: The 
presentation of the odor with a CS inhibited 
the conditioned response to the CS, aversion 
conditioning of the odor was harder than 
aversion conditioning of a neutral stimulus, 
and the odor enhanced conditioning to a 
neutral stimulus when odor and neutral stim- 
ulus presented together were followed by re- 
inforcement. 


Sensory Preconditioning 


The first step in attempting sensory 
preconditioning is to pair two stimuli. One 
of them is subsequently reinforced. If the 
other, nonreinforced stimulus also acquires 
motivational properties, as if it too had been 
paired with the reinforcer, sensory precon- 
ditioning has taken place. This phenomenon 
has frequently been demonstrated in tradi- 
tional laboratory experiments (see, e.g., Brog- 
den, 1939; Rizley & Rescorla, 1972). It has 
also been shown to occur in a taste aversion 
paradigm (Lavin, 1976). Lavin exposed rats 
to two tastes, one right after the other. He 
then followed a presentation of one of these 
tastes with illness. The taste that was not 
paired with illness became aversive. In addi- 
tion, this example of sensory preconditioning 
in a taste aversion paradigm occurred only 
if during the original pairing the two tastes
were separated by no more than a few seconds,
as in traditional paradigms.


Second-Order Conditioning 


Second-order conditioning is similar to 
sensory preconditioning except that the two 
stimuli are presented together after one of
them has been paired with illness instead of
before the pairing (see Rizley & Rescorla,
1972). Although it has been difficult to establish
the effect in traditional classical conditioning
experiments, Rescorla's (1977) recent
experiments have been successful. Second-order
conditioning has also been shown
for taste aversion learning. P. J. Best, Best, 
and Mickley (1973) paired a taste cue with 
a visual cue previously made aversive through 
contingent illness. The taste was subsequently 
avoided. Bond and Harland (1975) used a
similar procedure, but employed only tastes 
as stimuli; they also obtained second-order 
conditioning. 


Summary 


The ways in which information is obtained 
and processed could potentially have revealed
qualitative as well as quantitative differences
between taste aversion and traditional learning.
For example, had blocking, learned irrelevance,
and conditioned inhibition not
been found to occur within taste aversion
paradigms, a good case could have been made
for the necessity of using different basic principles
to describe the learning of taste aversions.
Nevertheless, the data appear to show
that in every respect the acquisition and
manipulation of information are quite similar
for taste aversion and traditional learning. 


Age Effects


Several experimenters who used a taste
aversion paradigm have reported learning in
rat pups (Ader & Peck, 1977; Galef & Sherry,
1973; Grote & Brown, 1971; Klein, Mikulka,
Domato, & Hallstead, 1977; Rudy & Cheatle,
1977). One trial is often all that is necessary
for learning to occur (Ader & Peck, 1977;
Galef & Sherry, 1973; Grote & Brown, 1971;
Rudy & Cheatle, 1977). This seems inconsistent
with results obtained in traditional
learning in which acquisition of passive
avoidance tasks, a paradigm similar in many 
ways to the taste aversion paradigm, is worse 
in weanling than in adult rats (Brunner, 
1969; Campbell & Coulter, 1976; Riccio, 
Rohrbaugh, & Hodges, 1968). However, it 
would be premature to conclude, based on 
this evidence alone, that weanling rats learn 
taste aversions as easily as adults, but learn 
passive avoidance tasks much worse than 
adults. 

Although young rats do learn taste aver- 
sions, it may not be as easy for them to ac- 
quire aversions as it is for adult rats (Klein, 
Domato, Hallstead, Stephens, & Mikulka, 
1975). On the traditional learning paradigm 
side of the question, Feigley and Spear
(1970) reported an experiment in which they
showed that weanling rats do as well as adults
at a passive avoidance task if the young
rats are trained in an apparatus proportional
to their body size. The question of the com- 
parative conditionability of weanling and 
adult rats in standard and taste aversion 
learning paradigms is far from settled. 
Similarly, it first appeared that rat pups, 
unlike adults, show little long-term reten- 
tion of all types of tasks (Campbell & Coul- 
ter, 1976; Feigley & Spear, 1970) except 
taste aversions (Ader & Peck, 1977; Klein
et al., 1977). However, a recent article by
Coulter, Collier, and Campbell (1976) has
shown that if a conditioned emotional response
paradigm is used in which the animal's
size is irrelevant to the performance of the
task, as may be the case with taste aversion
learning, long-term retention is found even
if the animals are trained at a much earlier
age than was previously thought possible
for traditional conditioning. The rats
in Coulter et al.'s studies were at least as
young as those used in the taste aversion
retention experiments.


Conclusions 


A detailed comparison of taste aversion data
with results from more standard paradigms
is significant not for the number of
differences revealed between the two research
areas, but for the number of similarities. In 
virtually all cases the same principles are 
sufficient for describing taste aversion and 
traditional learning data. In addition to the 
qualitative similarities, the two research 
areas are also often quantitatively similar. 
To take a simple example, taste aversions 
are extinguished by presenting the CS with- 
out the US, as in a standard classical condi- 
tioning experiment; similar principles apply 
for both paradigms. Further, several and 
sometimes many trials are needed for extinc- 
tion of a taste aversion to occur, numbers 
that apply as readily to tasks learned in tradi- 
tional paradigms. 

Some qualifications are in order, however. 
Two problems of comparing taste aversion 
with traditional learning were stated in the 
Introduction. These were, first, defining the 
laws of learning and, second, specifying what 
constituted a deviation from one of these 
laws. After reviewing the literature, a third 
difficulty became apparent. The research dis- 
cussed in the section entitled Traditional 
Learning is not homogeneous. Within that 
category are many examples of what Selig- 
man (1970; Seligman & Hager, 1972) would 
call prepared and contraprepared learning. 
In addition, much of traditional learning 
also involves feeding or interoceptive stimuli. 
The characteristics of taste aversion learning 
that are said to be unique appear frequently 
to be better described as characteristics of 
prepared or feeding behavior. Any compari- 
son of taste aversion and traditional learn- 
ing should take these factors into account. 
Nevertheless, this review has demonstrated 
that at the level of the whole organism there 
is little basis at this time for a claim that 
there are qualitative differences between taste 
aversion learning and traditional learning, as 
best demonstrated by the sections entitled 
Obtaining and Processing Information and 
Generalization of the CS. 

At several points in this article the data 
suggested that taste aversion learning shows 
certain quantitative differences from the 
learning of most other traditional tasks.
Included among these are differences concerning
rapidity of acquisition and long-delay
learning. Although one might say that these
are only parametric peculiarities, they are 
important determinants of behavior, which 
may be evolutionarily determined, and they 
should not be ignored. It was also noted that 
the fact that taste aversions are acquired in 
one trial with extremely long delays is such 
a large parametric difference that in this in- 
stance the qualitative-quantitative distinc- 
tion appears inappropriate. 

To be able to describe and predict what
occurs in taste aversion learning, it is neces-
sary to have a full understanding of each 
species’ feeding behavior under natural condi- 
tions. Otherwise, one would be unlikely to 
predict that, for example, rats will more 
easily associate tastes than exteroceptive 
stimuli with illness. The unusual quantitative 
properties of taste aversion learning men- 
tioned above are in accord with what is 
adaptive for the organism (Kalat, 1977). 
Credit must be given to those who are call- 
ing for more examination of laws of learn- 
ing and behavior with regard to these laws’ 
adaptiveness for a species’ ecological niche 
(e.g., Bolles, 1973; Garcia et al., 1977; Hinde, 
1973; Lockard, 1971; Rozin, 1976, 1977; 
Schwartz, 1974; Seligman & Hager, 1972). 

Many of these authors have in addition 
disparaged some of the more traditional learn- 
ing theorists, who they claim stated that any 
stimulus is equally associable with any response
and that laws of learning are the
same across all species. However, the tradi- 
tional theorists very often did realize that 
there are differences between species, but 
thought that the similarities between species 
are the more interesting and more important 
phenomena to study in developing a science 
of behavior (see, e.g., Skinner, 1959, pp.
374-375). Other psychologists unfortunately
too often interpreted this belief as a theoreti- 
cal assumption of more similarity than the 
data ever indicated. 

Recent emphasis on discoveries of diver- 
sity and complexity in the characteristics of 
learning does not mean that an attempt must 
now be made to describe new principles and 
laws. Instead, in at least some cases the
traditional ones should be made flexible enough
to deal successfully with the species- and
task-specific characteristics that are more
prevalent than previously recognized (see
Krane & Wagner, 1975; Rozin, 1977;
Seligman, 1973, for similar views). Without
a system of general laws we might find
ourselves in an overwhelming jungle of unique 
abilities and behaviors. In addition, a hy- 
pothesis of general laws of learning is not 
necessarily unreasonable from an evolutionary
standpoint. Since every species has been
shaped by evolution and since certain selection
pressures are common to us all, species-general
behaviors could result (Lockard,
1971; Rozin & Kalat, 1971). Further, phyletic
closeness (Lockard, 1971), economy of
neural wiring of the organism (Rozin &
Kalat, 1971), or the uniformity of rules of
future prediction (Revusky, 1977d) may
give rise to similar processes of learning in
different species and in different situations.

The effects that taste aversion experiments
have had on preexisting psychological theory
are consistent with Kuhn's (1962) description
of the progress of scientific revolutions.
Anomalous taste aversion data challenged
traditional theory (Rozin, 1977).
These data were different enough to impel
some psychologists (e.g., Seligman, 1970)
to suggest that many of the standard laws
would not apply. Others (e.g., Rozin & Kalat,
1971) advocated the replacement of the old
paradigm, general laws of learning, with a
new one, a principle of evolutionary adaptiveness.
A thorough comparison of the taste
aversion and traditional learning data now
reveals them not to be so discrepant after
all. Yet without the emphasis by various
learning theorists on the anomalous character
of the taste aversion results, it is doubtful
whether the somewhat unusual aspects of
these data would ever have been noticed
(see also Boring, 1929).

After the proposal of general laws of learning
and a recent swing to laws that are particular
and specialized, a less extreme swing, back to
the eclectic middle ground, is due. This view
recognizes the existence and utility of general
laws of learning, but it also recognizes
the necessity of acknowledging and investigating
the dissimilarities in the learning of
different species and in the learning of
different tasks. Otherwise we are likely to
assume generality where none exists. The
research on taste aversion has taught us at
least this much.


Reference Notes


Most previous accounts of the factor-score indeterminacy problem have failed
to give empirical meaning to alternative factor variables. An empirical interpre- 
tation of alternative factor variables is developed in terms of alternative tests of 
infinite length that include a given core set of variables as a subtest. We show 
that if the core set can be given the same factor loadings on a factor when 
analyzed alone or in the context of an infinite domain of variables, then there is
just one factor variable in the domain that is a possible factor variable of the
core set. If consistent factor loadings cannot be found, there is no factor variable
in the domain that is a possible factor of the core set. In the latter case,
alternative subdomains of variables may contain alternative possible factors of the
core set.


Psychologists who use factor analysis un- 
doubtedly know that the common factor 
model is considered to have special problems 
that arise from the indeterminacy of its 
factor scores. Nevertheless, the actual nature 
of these special problems is almost certainly 
not widely understood. 
Recent publications (Green, 1976; McDonald,
1974, 1977; Mulaik, 1972, 1976;
Mulaik & McDonald, 1978; Schönemann,
1971; Schönemann & Steiger, 1976; Schönemann
& Wang, 1972) have dealt with the
problem of factor-score indeterminacy in a
highly technical manner. Consequently, the
psychologist who uses factor analytic techniques
occasionally in his research may be
confused about the conflicting views that
have been expressed on this problem and its
bearing on his own research, and he may be
unaware of a developing consensus about the
problem and its implications.

The aims of this article are first to give
a relatively nonmathematical account of
factor-score indeterminacy, second to review
the major recent discussions of it, and third
to offer a resolution of the conflicting views 
that have arisen, while making clear the 
implications of factor-score indeterminacy for 
the definition of common factors and for the 
practice of factor analytic research. 

Factor-score indeterminacy refers to the 
fact that the common and unique factor scores 
in the common factor model are not uniquely 
determined by the observed variables whose 
correlations they explain, since in general the 
multiple correlation between a common or 
unique factor and the observed variables is 
less than unity. 

This fact has sometimes been taken to 
mean that one cannot obtain exact scores on a 
common factor and that one must therefore 
settle for regression estimates of them. Such 
a view is not strictly correct, as it is quite
possible to construct numbers that behave
precisely like scores on a given common factor. 
The difficulty is instead that infinitely many 
sets of such numbers can be constructed, each 
set in correspondence with a given set of 
observations. In other words, the factor scores 
do not have a unique mathematical definition. 
Given a factor analysis of the observed vari- 
ables, with factor pattern, factor structure, and 
factor correlations all determined to one’s
satisfaction (i.e., with rotational indeterminacy
eliminated), there are still infinitely many 
random variables that can satisfy the condi- 
tions for being a possible factor variable that 
corresponds to each column of factor loadings, 
and there need not be a high correlation be- 
tween two alternative possible factor variables. 

The observation that factor scores lack a 
unique mathematical definition has led some 
writers to conclude that there is something 
wrong with the common factor model and 
that we should abandon it in favor of some 
approximately equivalent model that is not 
also believed to have something wrong with it. 
The usual alternative is some version of 
component theory (e.g., Kaiser, 1970; Schönemann
& Steiger, 1976).

On the other hand, many psychologists who 
use factor analysis would not see any im- 
portance in the questions that have been 
raised about the indeterminacy of factor 
scores. Essentially, they know very well how 
seldom they require the assessment of factor 
scores, and they see no logical connection 
between questions about factor scores and 
the widely accepted aim of factor analysis, 
namely, the interpretation of a common factor 
in terms of the common attribute of the tests 
that have high factor loadings on it. (Indeed, 
so strong is the effect of this practice that some 
users forget basic theory and come to think 
of a factor as a profile of factor loadings of 
tests, not as a score that characterizes a 
subject.) 

It turns out, however, that in exploratory 
factor analysis as it is usually employed, the 
range of possible mathematical constructions 
of a possible factor variable corresponds to a 
range of possible ambiguity in the interpre- 
tation of a common factor. Moreover, this 
ambiguity cannot be avoided by approxi- 
mating the model with components. In its 
broader implications, factor-score indetermin- 
acy does not merely concern the problem of 
obtaining a score, or the problem of obtaining 
too many scores, that might be the score of 
an individual on a common factor. Rather, 
it concerns the inability of a finite set of 
observed variables in an exploratory factor
analysis to determine unambiguously what
attribute of the individuals the factor variable
represents. This is important, as one will see,
because factor analysis has commonly been
treated as a theory-generating device; that is,
it has been treated as a device for the post 
facto discovery of the psychological concepts 
that explain the correlations of the variables 
one has chosen to measure. 

In the following discussion, we simplify the 
issues by concentrating on the special case 
of common factor analysis in which there is 
just one general factor (the Spearman case).
In this special case, the mathematics can be
kept simple, and there is no possible confusion
between the indeterminacy of factor scores
and the indeterminacy of factor loadings due
to rotation. For the most part, the corresponding
results for multiple factors are
straightforward analogues. 

The Basic Results

We begin by considering a vector of $n$
random variables, $y' = (Y_1, \ldots, Y_n)$, each of
which has a mean of zero and a standard
deviation of unity. In most applications one
can think of $y$ as corresponding to the scores
on $n$ tests obtained by a person randomly
drawn from an infinite population $\Phi$.

A very simple and sufficient definition of the
general factor model is to postulate that there
exists a random variable $X$ (an additional
variable defined on the set of persons in $\Phi$)
such that the (linear) regression estimators
of $Y_1, \ldots, Y_n$ from $X$ have mutually
uncorrelated residuals. This defines the model
completely. We express this statement
symbolically by writing

$$\hat{y} = fX, \tag{1}$$

where $\hat{y}' = (\hat{Y}_1, \ldots, \hat{Y}_n)$, $f' = (f_1, \ldots, f_n)$,
and $\hat{Y}_j$ is the regression estimator of $Y_j$ from
$X$ with slope $f_j$. The vector $e$ of residuals is
then defined by

$$e = y - \hat{y}, \tag{2}$$

and by definition the residuals are uncorrelated
with $X$. It then follows that the
correlation matrix $R$ of $y$ takes the form

$$R = ff' + U, \tag{3}$$

where by the assumption of uncorrelated
residuals,

$$U = \operatorname{diag}(u_{11}, \ldots, u_{nn}). \tag{4}$$
COMMON FACTORS 


In conventional terminology, the regression
weights $f_j$ are factor loadings. These are also
the correlations of the tests with the general
factor. The residual variances $u_{jj} = 1 - f_j^2$
are the unique variances.
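
As a minimal sketch of Equations 3 and 4, the following builds the correlation matrix implied by one general factor. The function name `implied_correlations` and the loadings .8, .7, and .6 are illustrative assumptions, not values from the article.

```python
def implied_correlations(loadings):
    """Return R = f f' + U for one general factor (Equations 3 and 4)."""
    n = len(loadings)
    # Off-diagonal entries are products of loadings: r_jk = f_j * f_k.
    R = [[loadings[j] * loadings[k] for k in range(n)] for j in range(n)]
    for j in range(n):
        # Adding the unique variance u_jj = 1 - f_j^2 gives a unit diagonal.
        R[j][j] += 1.0 - loadings[j] ** 2
    return R


loadings = [0.8, 0.7, 0.6]  # hypothetical loadings, not taken from the article
R = implied_correlations(loadings)
for row in R:
    print([round(r, 2) for r in row])
```

Each off-diagonal correlation is simply the product of the two loadings, which is why a single column of loadings can account for every correlation among the tests.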

In applications of the model, one attempts 
to fit the hypothesis in Equation 3 to a sample 
correlation matrix. If the fit is acceptable, 
one interprets the general factor X as whatever 
attribute of the persons in $\Phi$ seems to be
common to the tests. 

It is reasonable to suppose that all unique 
variances are greater than zero, that is, no 
test is perfectly correlated with the factor, if 
only because of errors of measurement. If 
these assumptions are true and the number of 
tests is finite, the multiple correlation of the 
common factor X with the observed variables 
is strictly less than unity. The regression 
estimator of X from y is given by 


$$\hat{X} = f'R^{-1}y, \tag{5}$$

and the corresponding squared multiple
correlation $\rho^2$ of $X$ and $\hat{X}$ is given by

$$\rho^2 = f'R^{-1}f. \tag{6}$$

Because of Equation 4, Equation 5 can be
expressed as

$$\hat{X} = \frac{1}{1 + g} \sum_{j=1}^{n} \frac{f_j}{1 - f_j^2}\, Y_j, \tag{7}$$

and Equation 6 can be expressed as

$$\rho^2 = \frac{g}{g + 1}, \tag{8}$$

where

$$g = \sum_{j=1}^{n} \frac{f_j^2}{1 - f_j^2} \tag{9}$$

(see, e.g., McDonald & Burr, 1967). From the
form of Equation 8, if $g$ is finite, $\rho^2$ is strictly
less than unity (since $g$ is strictly less than
$g + 1$). If one can conceivably find infinitely
many tests with nonzero loadings on $X$,
then in the limit as $n$ approaches infinity,
$g$ approaches infinity and

$$\lim_{n \to \infty} \rho_n^2 = 1 \tag{10}$$

(Piaggio, 1931, 1933). One can think of
Equation 8 with Equation 9 as a generalization
of the Spearman-Brown formula for the
reliability of a test as a function of its length
(cf. McDonald & Burr, 1967, p. 392). In
effect, one sees that the general factor cannot
be determined with perfect reliability by a
test battery of finite length. This simple fact
is not, however, the problem of factor-score
indeterminacy. If it were, one would deal with
it in practice by simply admitting that one
expects all social science measurements to be
imperfect, and one would be satisfied with
values of $\rho^2$ in the general range of acceptable
reliability coefficients. These values are
ordinarily obtainable in factor analytic studies.
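
The agreement between Equation 6 and Equation 8, and the Spearman-Brown-like growth of $\rho^2$ with battery length, can be checked numerically. The sketch below is illustrative only: the helper names (`solve`, `rho2_direct`, `rho2_closed_form`) and the loadings are assumptions, and the small Gaussian-elimination routine stands in for any linear solver.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def rho2_direct(f):
    """Equation 6: rho^2 = f' R^{-1} f, with R = f f' + U."""
    n = len(f)
    R = [[f[j] * f[k] + ((1.0 - f[j] ** 2) if j == k else 0.0)
          for k in range(n)] for j in range(n)]
    w = solve(R, f)  # w = R^{-1} f
    return sum(fj * wj for fj, wj in zip(f, w))


def rho2_closed_form(f):
    """Equations 8 and 9: rho^2 = g / (g + 1)."""
    g = sum(a * a / (1.0 - a * a) for a in f)
    return g / (g + 1.0)


f = [0.8, 0.7, 0.6]  # hypothetical loadings
print(round(rho2_direct(f), 6), round(rho2_closed_form(f), 6))

# Lengthening a battery of tests that all load .6 on the factor.
for n in (3, 10, 30, 100):
    print(n, round(rho2_closed_form([0.6] * n), 3))
```

The second loop shows $\rho^2$ rising toward, but never reaching, unity as tests are added, which is the sense in which the result generalizes the Spearman-Brown formula.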
Instead the problem of factor-score
indeterminacy as it has been presented is as
follows. A necessary and sufficient condition
for a random variable $X^*$ (in standard score
form) defined over the population $\Phi$ to have
the properties of $X$ in Equations 1-4 is that
the correlations of $X^*$ with $Y_1, \ldots, Y_n$ be
given by

$$\rho(X^*, Y_j) = f_j, \quad \text{where } j = 1, \ldots, n. \tag{11}$$

If a standardized random variable $X^*$ has
the required correlation with $y$, one says that
it is a possible factor variable of $y$. There are
infinitely many such factor variables definable
over $\Phi$ (Guttman, 1955; McDonald, 1974;
Mulaik, 1976). (Note also that $\hat{X}$ is not itself
a possible factor variable of $y$.)

If $X_1$ and $X_2$ are two standardized random
variables that have the required correlations
with $y$, one can obtain bounds on the
correlation $\rho_{12}$ between them by showing that
the partial correlation of $X_1$ and $X_2$, with
$Y_1, \ldots, Y_n$ partialed out, is given by

$$\rho_{12 \cdot y} = \frac{\rho_{12} - \rho^2}{1 - \rho^2}. \tag{12}$$

Since this partial correlation must lie between
$-1$ and $+1$, it follows that

$$2\rho^2 - 1 \le \rho_{12} \le 1, \tag{13}$$

where $\rho^2$ is given by Equation 8 as before. This
result was first given by Guttman (1955) in
the more general case of multiple factors,
using a different line of argument.
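
A short sketch of Equation 12 and Inequality 13 may help fix the argument. The function names and the sample values of $\rho^2$ are assumptions chosen for illustration.

```python
def partial_correlation(rho12, rho2):
    """Equation 12: correlation of two possible factors, tests partialed out."""
    return (rho12 - rho2) / (1.0 - rho2)


def guttman_bounds(rho2):
    """Inequality 13: admissible range for the correlation rho12."""
    return 2.0 * rho2 - 1.0, 1.0


for rho2 in (0.5, 0.8, 0.95):  # hypothetical squared multiple correlations
    lower, upper = guttman_bounds(rho2)
    # At the lower bound the partial correlation reaches its minimum of -1.
    print(rho2, round(lower, 2), round(partial_correlation(lower, rho2), 2))
```

At the lower bound the two possible factor variables differ as much as they can, given that both must reproduce the same correlations with the tests.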

Another necessary and sufficient condition
for $X^*$ to have the properties of the general
factor $X$ in Equations 1-4 is that

$$X^* = \hat{X} + D^*, \tag{14}$$

where $\hat{X}$ is the regression estimator given by
Equation 7 and $D^*$ is any random variable
defined over the population that has the
properties of a residual of $X$ about $\hat{X}$, that is,
any random variable that has variance $1 - \rho^2$
and correlations

$$\rho(D^*, Y_j) = 0, \quad \text{where } j = 1, \ldots, n. \tag{15}$$

This result was first given by Spearman (1922).
It was also obtained by Heywood (1931)
rather more rigorously and, in the general
case of multiple factors, by Kestelman (1952)
and Guttman (1955). (The latter was the
first to establish that the condition is necessary
as well as sufficient.)
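
The construction in Equations 14 and 15 can be illustrated by simulation. This is a rough sketch under stated assumptions: normally distributed scores, hypothetical loadings of .8, .7, and .6, a sample of 20,000 simulated persons, and a fixed seed. Sample correlations will only approximate the population values.

```python
import math
import random


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


random.seed(1)                 # fixed seed so the sketch is repeatable
f = [0.8, 0.7, 0.6]            # hypothetical loadings
g = sum(a * a / (1 - a * a) for a in f)
rho2 = g / (1 + g)             # Equation 8
weights = [a / (1 - a * a) / (1 + g) for a in f]  # Equation 7

N = 20000                      # simulated persons
tests = [[] for _ in f]
x_star = []
for _ in range(N):
    x = random.gauss(0, 1)     # the person's factor score
    y = [a * x + math.sqrt(1 - a * a) * random.gauss(0, 1) for a in f]
    x_hat = sum(w * yj for w, yj in zip(weights, y))
    d_star = math.sqrt(1 - rho2) * random.gauss(0, 1)  # arbitrary residual
    x_star.append(x_hat + d_star)                      # Equation 14
    for j, yj in enumerate(y):
        tests[j].append(yj)

# The constructed variable correlates with each test about as Equation 11 requires.
for j, a in enumerate(f):
    print(a, round(corr(x_star, tests[j]), 2))
```

The constructed variable never uses the simulated factor score `x` itself, yet it reproduces the factor's correlations with the tests. That is the sense in which it is a possible factor variable.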

That there is a problem of factor-score
indeterminacy, beyond the mere fact of
imprecision of the estimates given by $\hat{X}$, is first
noticed when one observes that since the
range of possible values of $\rho^2$ is from zero to
unity, the range of possible minimum values
of $\rho_{12}$ is from minus unity to unity; and if $\rho$ is
less than $1/\sqrt{2}$, then $\rho_{12}$ is possibly zero. That
is, if the correlation between the estimator
and a possible factor variable is less than .707,
the correlation between two such possible
factor variables can be as low as zero. It is
not immediately obvious what empirical
interpretation to place on this fact, however.
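
To give the .707 threshold a concrete scale, one can ask how many tests with a common loading are needed before the lower bound $2\rho^2 - 1$ becomes positive. The sketch below assumes equal loadings strictly between 0 and 1. The function names are illustrative.

```python
def rho_squared(loading, n_tests):
    """Equations 8 and 9 for n_tests tests sharing one loading."""
    g = n_tests * loading ** 2 / (1.0 - loading ** 2)
    return g / (1.0 + g)


def min_tests_for_positive_bound(loading):
    """Smallest battery length for which 2 * rho^2 - 1 exceeds zero."""
    n = 1
    while 2.0 * rho_squared(loading, n) - 1.0 <= 0.0:
        n += 1
    return n


for loading in (0.3, 0.5, 0.8):  # hypothetical common loadings
    print(loading, min_tests_for_positive_bound(loading))
```

With a common loading of .5, three tests give $g = 1$ and $\rho^2 = .5$, so the bound is exactly zero and a fourth test is needed to make it positive.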


Interpretations of 
Factor-Score Indeterminacy 


Guttman (1955), having introduced the
bounds (Inequality 13) and having noted in
particular that one requires $\rho > .71$ if the
lower bound on $\rho_{12}$ is to be positive, set out
its implications for the scientific meaning of
factor analysis in the following remarks:


It appears from the relation of [$2\rho^2 - 1$] to [$\rho^2$] (for
each intended factor) that the predictability of factors
from [$y$] is not merely a practical problem. If [$2\rho^2 - 1$]
is low, it raises the question of what it is that is being
estimated in the first place; instead of only one ‘primary’
trait there are many widely different variables associated
with a given profile of loadings. . . . If more
direct observations on factor scores cannot be made
than statistical analysis of $R$ and [$y$], the Spearman-Thurstone
approach may have to be discarded for lack
of determinacy of its factor scores. (p. 79)

Similarly one finds:


The widespread practice of trying to name or attach 
meaning to factors merely by studying factor loadings 
is clearly suspect if the same loadings can be derived 
equally well from radically different sets of factor 
scores. (Guttman, 1957, p. 149). 


Schönemann and Wang (1972) pointed out
that in estimating the factor model on the
Lawley-Rao basis (which is usually employed
in fitting the unrestricted model by maximum
likelihood), instead of retaining factors
corresponding to eigenvalues greater than unity,
one would have to retain only factors with
eigenvalues greater than two


if one were to insist on factors which are better determined
than a set of standardized random numbers are
(in the sense that the scores on factor [$X_1$] can be
predicted better from scores on its equivalent twin
[$X_2$] than from a set of random numbers). (Schönemann
& Wang, 1972, p. 69)


They then proceeded to show that in empirical
studies, some factors that are needed to
explain the correlations do not have acceptable
(apparently meaning positive) lower
bounds to the correlation between possible
alternative factors, and they claimed that
this appears to create a dilemma for the user
of the model.

We notice a difficulty in coming to grips
with these interpretations of factor-score
indeterminacy. As stated above, given at least
the existence of errors of measurement, no
test of finite length will be perfectly correlated
with the factor, hence no test of finite length
will have the properties of a common factor
variable. It is not enough to be told that
attaching meanings to factors on the basis of
factor loadings is "clearly" rendered suspect
by the existence of alternative possible factor
variables. One needs also to be told how to
find at least two distinct possible factor
variables and on what grounds one might
place distinct interpretations on them.
Instead, Guttman (1955) showed only that one
can construct alternative factor variables in
infinitely many ways by adding to a regression
estimate $\hat{X}$ a value of a random variable $D^*$,
as in Equations 14 and 15, defined "by throwing
dice or turning a roulette wheel."
More precisely, to construct a possible factor
score for any person randomly drawn from
the defined population $\Phi$ one makes one
throw of the dice, or turn of the roulette wheel
(or generates one number by means of a
random number generator), and arbitrarily
associates the additional random number $D^*$
so obtained with the scores of the randomly
drawn person. Schönemann and Wang (1972)
suggested similar procedures for constructing
possible factor scores and remarked that since
infinitely many different sets of such numbers
can be computed, they need not be estimated.
However, although such computational 
procedures do indeed yield mathematically 
admissible alternative factor variables, the 
solutions so obtained can hardly be regarded 
as measurements of empirical properties of 
persons in the population $\Phi$. Measurement
constitutes the assignment of numbers to 
objects in such a way as to represent empirical 
relationships by numerical relationships. It is 
hard to see how assigning artificially generated 
random numbers to persons in the population 
can represent empirical relationships (of order, 
say) among these persons. Thus, these alternative
factor variables lack empirical significance
and as such cannot be regarded as
lending themselves to distinct interpretations
of the factor. Hence the argument that factor
indeterminacy renders the interpretation of
factor loadings suspect is incomplete. There- 
fore, it is understandable if the typical user 
of factor analysis chooses to disregard this 
argument. 
McDonald (1974) sought to offer an escape
from the mathematical dilemma pointed out
by Schönemann and Wang (1972). Based on a
general mathematical theorem about the
construction of and relationship between all
possible factor variables, McDonald's discussion
consisted essentially of two main
arguments.

The first of these was a demonstration that
the correlation between two distinct simulations
of a factor variable, using independent
residual variables $D^*$ in Equation 14 (generated
by means of the artificial devices suggested by
Guttman, 1955, and by Schönemann and
Wang, 1972), is $\rho^2$. Two implications of this
are (a) that $\rho^2$ is the correlation between any
possible factor of $y$, defined (without simulation)
on $\Phi$, and any simulation of it created
by the devices recommended and (b) that $\rho^2$
is the correlation between two independent
simulations of a possible factor of $y$ made by
independent investigators.
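
The claim that independent simulations correlate $\rho^2$ follows because the two simulated variables share only the regression estimate, whose variance is $\rho^2$. The sketch below checks this by simulation. The loadings, sample size, seed, and variable names are assumptions, and the agreement is approximate because of sampling error.

```python
import math
import random


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


random.seed(2)                 # fixed seed so the sketch is repeatable
f = [0.8, 0.7, 0.6]            # hypothetical loadings
g = sum(a * a / (1 - a * a) for a in f)
rho2 = g / (1 + g)
weights = [a / (1 - a * a) / (1 + g) for a in f]  # Equation 7
sd_resid = math.sqrt(1 - rho2)

factor, sim_a, sim_b = [], [], []
for _ in range(20000):
    x = random.gauss(0, 1)
    y = [a * x + math.sqrt(1 - a * a) * random.gauss(0, 1) for a in f]
    x_hat = sum(w * yj for w, yj in zip(weights, y))
    factor.append(x)
    # Two investigators add independent random residuals to the same estimate.
    sim_a.append(x_hat + sd_resid * random.gauss(0, 1))
    sim_b.append(x_hat + sd_resid * random.gauss(0, 1))

print(round(rho2, 3))
print(round(corr(sim_a, sim_b), 3))   # implication (b)
print(round(corr(factor, sim_a), 3))  # implication (a)
```

Both sample correlations should fall near $\rho^2$ and well above the Guttman lower bound $2\rho^2 - 1$, which is reached only when the two residuals are perfectly negatively correlated.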

The second argument amounted to a declara- 
tion that for consistency one must adopt the 
convention that the common factor be thought 
of as uniquely defined, though unobservable; 
hence the (minimum) correlation between 
alternative possible factor variables cannot 
serve as a measure of the indeterminacy of 
the common factor, since the distinct values 
of these alternatives for a given person in $\Phi$
cannot both be the unique but unknown value 
of his factor. 

The first of McDonald's arguments is 
correct, but is not, as it turns out, particularly 
useful. The second, it can now be shown, is 
ambiguous and incomplete and seemingly has 
proved unconvincing. 

Mulaik (1976) rejected McDonald's second 
argument, claiming that Guttman's lower 
bound represents a measure of the extent of 
possible disagreement between two investi- 
gators about the nature of a factor. Mulaik's 
argument assumes that two investigators might 
actually find two distinct empirical measures, 
either of which has the properties of the 
common factor variable. Further, the empirical 
measures might be imperfectly correlated, and 
their test contents might give conflicting 
interpretations of the factor (presumably 
whether or not these were consistent with the 
contents of the remaining $n$ tests). Mulaik
further argued that in typical applications
factor analysts will have at most only a few
alternative empirical interpretations of a
factor and that the expected correlation
between pairs of alternative factor variables
(across a universe of such pairs, randomly
selected) will be greater than or equal to $\rho^2$ in
Equation 6.

Green (1976) offered a generally conciliatory 
conclusion on what he described as the “factor 
indeterminacy controversy,” suggesting that 
all possible factor variables are equally true 
and that both $\rho^2$ and $2\rho^2 - 1$ serve as measures
of factor-score indeterminacy. 

With the exception of remarks in Mulaik 
(1976), what all of these discussions have in 
common is a failure to relate the problem 
described to any discernible realities of factor 
analytic practice. 

It turns out that although it is not true that
different investigators might construct different
empirical measures of finite length that 
satisfy Inequality 13 and hence are possible 
factor variables, there are conditions under 
which they might construct the beginnings 
of sequences of variables (each of which 
contains at least some error of measurement) 
that would in the limit yield tests of infinite 
length that satisfy Inequality 13. This fact 
and its consequences form the topic of the 
next section. 


Determinacy of Common Factors in a 
Behavior Domain 


One way to give empirical meaning to 
alternative possible common factor variables 
has been shown by Mulaik and McDonald 
(1978). They considered the possible factors 
of two sets of variables obtained by augmenting 
a core set of n variables (marker variables for 
a factor, one might say) with different sets of 
additional variables, subject only to the 
condition that each augmented set continue 
to satisfy a common factor model consistent 
with the model for the core set. They dis- 
cussed what one might call alternative tests 
of infinite length, developed from the given 
core set and each uniquely determining a factor 
variable. They gave conditions under which 
these tests of infinite length will determine 
the same factor variable. If these conditions 
are not satisfied, the tests may determine 
distinct factor variables whose correlations in 
the limit are subject only to the Guttman 
bounds (Inequality 13). See also McDonald 
(1977). 

Here we give a simpler treatment of the 
problem, directly based on behavior domain 
theory. Following Guttman (1957), we suppose 
that the $n$ variables $Y_1, \ldots, Y_n$ of the section
entitled The Basic Results are drawn from a
set of $n + m$ empirical variables $Y_1, \ldots, Y_n$
and $Y_{n+1}, \ldots, Y_{n+m}$, a universe of content or
behavior domain that is defined in advance of 
any statistical analysis. 

An ambiguity concerning the possible alter-
native factor variables of $Y_1, \ldots, Y_n$ is
removed by supposing that $f_1, \ldots, f_n$ are fixed,
known values. In practice, this means that
one must have at least three variables in the
core, whose correlations then determine the
factor loadings uniquely. (In the multiple-
factor counterpart of these arguments, we
would require that the factor loadings be fixed
against rotation and known.)
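
The claim that three core variables fix the loadings can be illustrated with a minimal sketch. It assumes a single common factor, standardized variables, and positive correlations, so that each correlation is the product of two loadings, $r_{ij} = f_i f_j$. The function name and the numerical correlations are hypothetical and serve only as an illustration.

```python
import math


def one_factor_loadings(r12, r13, r23):
    """Loadings of three variables on a single common factor.

    Under a one-factor model r_ij = f_i * f_j, hence
    f_1^2 = r12 * r13 / r23, and similarly for f_2 and f_3.
    Assumes all three correlations are positive.
    """
    f1 = math.sqrt(r12 * r13 / r23)
    f2 = math.sqrt(r12 * r23 / r13)
    f3 = math.sqrt(r13 * r23 / r12)
    return f1, f2, f3


# Hypothetical correlations generated by loadings (0.8, 0.7, 0.6).
loadings = one_factor_loadings(0.56, 0.48, 0.42)
print([round(f, 3) for f in loadings])
```

With only two variables the single correlation $r_{12} = f_1 f_2$ leaves infinitely many pairs of loadings admissible, which is why at least three variables are needed in the core.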


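Write

$$y_1' = (Y_1, \ldots, Y_n), \qquad y_2' = (Y_{n+1}, \ldots, Y_{n+m})$$

and

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} \tag{16}$$

for the joint correlation matrix of $y_1$ and $y_2$
partitioned conformably.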

One cannot seek to establish any relation-
ship between the possible factor variables of
$Y_1, \ldots, Y_n$ and those of the behavior domain,
except under the condition that one is able
to obtain a joint factor analysis of the n + m
variables in which the factor loadings of the
first n are the same as when the first n are
factored without the further m. If this condi-
tion is satisfied, we say that the core and the
domain have consistent loadings. If the core
and the domain do not have consistent loadings,
there is no more reason to seek a relationship
between their possible factors than to seek a
relationship between two of the multiple
factors in an orthogonal multiple factor
analysis. Although the effect of adding more
variables may be to add further common
factors, so that, say, there are r common
factors altogether, the requirement of con-
sistent loadings is that one must be able to
fit the model and choose a rotation in such a
way that the factor loadings of the n core
variables are unaltered in the context of the
additional m variables. It can hence be seen
that the core and the domain are consistent
in this sense if and only if the factor solution
for the domain is of the form


$$\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} = \begin{bmatrix} f_1 & G_1 \\ f_2 & G_2 \end{bmatrix} \begin{bmatrix} f_1' & f_2' \\ G_1' & G_2' \end{bmatrix} + \begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix}$$

$$= \begin{bmatrix} f_1 f_1' + G_1 G_1' + U_1 & f_1 f_2' + G_1 G_2' \\ f_2 f_1' + G_2 G_1' & f_2 f_2' + G_2 G_2' + U_2 \end{bmatrix}, \tag{17}$$

where

$$G_1 G_1' = D, \tag{18}$$

a diagonal matrix with nonnegative diagonal
terms, and

$$U_1 = U - D, \tag{19}$$

which also contains nonnegative diagonal terms
and where $U$ is the original unique variance
matrix given by Equation 4. What this means
is that the effect of adding variables may be
to redefine some of the unique variance of one
or more components of $y_1$ as common variance,
that is, variance in common with that of
components of $y_2$. The loadings in $G_1$, with
the property defined in Equation 18, take care
of this possibility, but do not account for any
correlation between components of $y_1$. The
loadings in $G_2$ account for some of the corre-
lation between components of $y_2$ and, together
with $G_1$, for any correlation between com-
ponents of $y_1$ and components of $y_2$ that is not
accounted for by the original general factor.

One now recognizes a simple result of
fundamental importance: A possible common
factor variable of the domain is a possible
general factor of the core factored separately,
if and only if the domain and the core have
consistent loadings. To see this, note that if
one has the structure of Equation 17 with
Equations 18 and 19, one can write the corre-
sponding factor model in the form

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} f_1 & G_1 \\ f_2 & G_2 \end{bmatrix} \begin{bmatrix} X \\ z \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}, \tag{20}$$

where $z$ is a vector of r − 1 additional common
factor variables and the residuals $e_1$ and $e_2$
have covariances given by $U_1$ and $U_2$. Then
if $y_2$ is deleted, Equation 20 becomes

$$y_1 = f_1 X + G_1 z + e_1 = f_1 X + e, \tag{21}$$

where $e$ is given by Equation 2, since $G_1 z + e_1$
has the covariance matrix $U = U_1 + D$ and
behaves just like the original residual. Con-
versely, if there is a common factor of the
domain set that is a possible general factor of
the core set (possibly with additional common
factors), then it satisfies Equation 11, and so
it should be possible to express the joint common
factor pattern in the required form. This
completes the proof of the stated result.
Further, by an obvious extension of the
known result (Equation 10) given by
Piaggio (1931, 1933), it follows that if m, the
number of additional variables in the domain,
is infinitely large, then the possible first factor
is uniquely determined by the n + m variables
that constitute the domain, since its multiple
correlation with them is unity. (This assumes
that a nonzero proportion of the components
of $f_2$ is strictly greater than zero.)

Thus one finds that if the domain factor 
pattern is not consistent with that of the core, 
that is, if it cannot take the form of Equation 
17 with Equations 18 and 19 and the conditions 
on them, then there is no possible common 
factor variable of the domain that is also a 
possible general factor variable of the core 
set. If the joint factor pattern is consistent 
with the core set, so that possible factor 
variables of the domain and of the core set 
can be defined on the same basis, and if the 
additional variables of the domain have non- 
zero loadings on the factor that is marked by 
the core, there is one and only one factor 
variable in the domain that is a possible factor 
of the core set. In this case, the factor loadings 
of the core set mark a unique factor variable 
in the domain, which is determined as pre- 
cisely as one pleases by the addition of further 
variables with nonzero loadings. This confirms 
the conclusion given by McDonald (1974) 
on purely formal grounds. In other words, 
there are not infinitely many possible alter- 
native factor variables in a behavior domain 
that correspond to the profile of loadings of a
given core set of variables from the domain. 
Either there is just one factor variable in the 
domain consistent with the core or there is no 
such factor variable. 

If there is a factor variable of the domain
that is consistent with the core, it can be
shown that the squared correlation between
it and the estimator given by Equation 5 is $\rho^2$
(given by Equation 6), as one would expect.
McDonald (1974) has shown that the index $\rho^2$
as a measure of the determinacy of factor
scores does not yield any of the dilemmas
associated with the lower bound, $2\rho^2 - 1$.
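
A small numerical sketch may help fix the relation between $\rho^2$ and the bound $2\rho^2 - 1$. It assumes a single common factor, standardized variables with unique variances $u_i = 1 - f_i^2 > 0$, and that $\rho^2$ is the squared multiple correlation $f'R^{-1}f$ of the factor with its indicators (taken here to be what Equation 6 defines). For $R = ff' + U$ this reduces to $s/(1+s)$ with $s = \sum_i f_i^2/u_i$. The function names and the loading value .5 are hypothetical.

```python
def determinacy(loadings):
    """Squared multiple correlation rho^2 of one common factor with its indicators.

    Assumes standardized variables and unique variances u_i = 1 - f_i^2 > 0.
    For R = f f' + U, f' R^{-1} f = s / (1 + s), where s = sum(f_i^2 / u_i).
    """
    s = sum(f * f / (1.0 - f * f) for f in loadings)
    return s / (1.0 + s)


def min_correlation(rho_sq):
    """Guttman lower bound on the correlation between alternative factors."""
    return 2.0 * rho_sq - 1.0


# Hypothetical batteries of n variables, each loading .5 on the factor.
for n in (3, 10, 50, 500):
    rho_sq = determinacy([0.5] * n)
    print(n, round(rho_sq, 4), round(min_correlation(rho_sq), 4))
```

With three such variables $\rho^2 = .5$ (so $\rho \approx .707$) and the bound is zero; as further variables with nonzero loadings are added, $\rho^2$ and the bound both approach unity.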

However, before one concludes that the 
problem of factor-score indeterminacy has just 
been eliminated, one must examine the 
consistency conditions more closely. Essen- 
tially, the requirement is that if, in traditional 
terminology, one “extracts” the first factor, 
then the first residual matrix,

$$\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} - \begin{bmatrix} f_1 f_1' & f_1 f_2' \\ f_2 f_1' & f_2 f_2' \end{bmatrix}, \tag{22}$$

must be at least nonnegative definite, that is,
a possible covariance matrix of real variables, 
so that it can be factored in its turn in terms 
of real factor loadings and nonnegative unique 
variances. It is important to recognize that 
there is no reason whatsoever to suppose that 
this condition will always, or even usually, be 
fulfilled in practice. Actually, it is a generalized 
counterpart of the condition that there be no 
Heywood variables (i.e., variables with nega- 
tive unique variance). One knows that the 
condition has been violated if one attempts to 
add variables to a core, prescribing a consistent 
analysis, and one obtains a Heywood case. But 
generally one will not know whether the 
domain contains variables that, if one were 
to add them to the core, would violate the 
required condition. 
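
The simplest instance of the violated condition, a Heywood case in a three-variable core, can be checked directly. The sketch below is only an illustration of that special case, not of the general nonnegative-definiteness condition; it assumes one common factor, standardized variables, and positive correlations, and the function names and correlation values are hypothetical.

```python
def triad_uniquenesses(r12, r13, r23):
    """Unique variances implied by a one-factor model for three variables.

    The squared loadings are f_1^2 = r12 * r13 / r23 (and so on), and the
    unique variance of each standardized variable is 1 - f_i^2.
    """
    squared_loadings = (
        r12 * r13 / r23,
        r12 * r23 / r13,
        r13 * r23 / r12,
    )
    return tuple(1.0 - sq for sq in squared_loadings)


def is_heywood(uniquenesses, tol=1e-12):
    """True if any implied unique variance is negative."""
    return any(u < -tol for u in uniquenesses)


# Admissible case: correlations generated by loadings (0.8, 0.7, 0.6).
print(is_heywood(triad_uniquenesses(0.56, 0.48, 0.42)))

# Inadmissible case: the first variable would need a squared loading of 1.62.
print(is_heywood(triad_uniquenesses(0.90, 0.90, 0.50)))
```

In the second case no real single-factor solution with nonnegative unique variances exists, which is the kind of outcome that signals the condition has been violated.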
Mulaik and McDonald (1978) have shown 
that if a behavior domain does not satisfy 
conditions similar to Equations 17-19, then 
such a domain can possibly be divided into 
overlapping subdomains, all containing the 
core set as a subset and possessing disjoint 
sets of factor variables with possible corre- 
lations that are only bounded as in Inequality 
13. That is, if there is no factor variable in the 
domain that is a possible factor variable of 
the core set, there may be factor variables of 
subdomains that are distinct possible factors 
of the core set, with correlations anywhere 
between $2\rho^2 - 1$ and 1. As a consequence, if
two investigators take a common core of 
variables as markers for a factor and in- 
dependently draw variables from such a 
domain, not by a sampling rule but by merely 
augmenting the core set subject to the satis- 
faction of the general factor model, they can 
define two distinct tests of infinite length, 
containing distinct variables that lend them- 
selves to distinct interpretations. Only when 
the two sets of augmenting variables are 
merged will one discover that the domain 
does not have a basis consistent with the 
core and hence has no factor variable that is a 
possible factor of the core. 
This implies also that one certainly cannot
use the factor model to define a behavior
domain; that is, a behavior domain is not
uniquely defined by choosing a core set of
variables that fits the common factor model 
and then saying that the domain consists of 
all variables that, jointly with the core set, 
fit the common factor model with the same 
number of factors. 

Thus, we have arrived at an empirical 
treatment of the significance of factor in- 
determinacy. It is a variant of results given 
by Mulaik & McDonald (1978) and McDonald 
(1977). If a behavior domain can be factored 
so that one of its common factors has the 
same loadings on a core set of variables as 
when the latter are factored separately, it is 
then meaningful to consider the relationship 
between the possible factors of the core and 
the possible factors of the behavior domain. 
In such a case, the loadings of the core set 
uniquely mark a factor variable in the domain, 
and the addition of further variables with 
nonzero loadings on the factor will determine 
it as precisely as one pleases, ultimately 
yielding an infinite sequence of variables that 
determines the factor exactly. If the behavior 
domain cannot be factored consistently with 
the core, there is no factor variable in the 
domain that is a general factor of the core
set, but there may be many factor variables 
of subdomains that are distinct alternative 
general factor variables of the core set. In- 
compatible sets of variables added to the core 
set may yield distinct tests of infinite length, 
defining distinct factor variables and sus-
ceptible to distinct interpretations on the
basis of their profiles of factor loadings on the
distinct tests. 


These observations eliminate the paradoxes
to be found in the previous purely mathe-
matical treatments of the topic, since the
condition under which two possible tests of
infinite length, obtained by augmenting a core
set, can be correlated zero when each is
estimated with $\rho = .707$ is the condition that a
corresponding factor in the domain taken as
a whole does not exist. If a factor of the
domain is defined by the loadings of the core
tests, it is defined uniquely.
However, these results do not define away
the problem of factor indeterminacy. Rather,
they point to its significance for the user of
the model. Most users of factor analysis
employ the model in an exploratory fashion.
In doing so, they do not begin with a set of
relatively clear psychological concepts and 
then select variables for their study that on 
the basis of theory should contain or be 
determined by these concepts. Nor, on the 
other hand, do they begin with a clearly 
defined behavior domain (from which all 
investigators would recognize equally how to 
develop representative measures) and then 
select from it a reasonable number of variables 
to represent the domain in a balanced fashion. 
Instead, they begin with a number of variables 
in which they are interested, which might or 
might not correspond to a general preconcep- 
tion of a behavior domain of tests that “re-
semble” the tests chosen. Their hope is that
when these variables are subjected to a factor 
analysis, the psychological attributes that 
determine the correlations among the variables 
will reveal themselves. Thus, exploratory factor 
analysis is typically employed to generate a 
theory rather than to test a theory. It is 
noticeable that in many such applications the
user appears quite satisfied with the results.
The problem remains, however, of a range
of ambiguity with respect to the ways in
which the results may be extended by the
use of further variables to measure the dis-
covered attribute more precisely.
If the variables have been selected on the
basis of certain common attributes defined in
advance, there is no uncertainty as to the
interpretation of a factor found common to
those variables and no uncertainty as to the


further variables one would expect to be
associated with the same factor. But in the
absence of an agreed domain of variables
with prespecified common attributes, two
investigators may seize upon distinct sets of
common attributes and proceed to build
distinct extended batteries of variables on
the basis of the same core set, in terms of these
distinct attributes. Ultimately, in principle,
they can create two distinct test batteries of
infinite length, built on the same original tests
and corresponding to two distinct attributes
and to distinct random variables, either
of which might have been the common factor
of the original variables. It is factor in-
determinacy that supplies a mathematical
latitude for such sets of alternative common
attributes to be found. This does not mean
that they will inevitably be found in practice. 
It means only that such possibilities exist. It 
remains for appropriate empirical work, using 
merged variables, augmented sets, and exten- 
sion procedures (see McDonald, 1978), to 
establish how serious these problems are in 
practice. 

What this analysis reveals, essentially, is 
that it may be naive to use exploratory factor 
analysis to generate theory out of an ill- 
defined domain of variables. The danger is 
that the factor structure of the entire domain 
may not be remotely consistent with the factor 
structure of its subdomains. 

However, it must be pointed out that these 
problems of the common factor model do not, 
as is sometimes supposed, constitute grounds 
for using alternative modes of analysis such 
as component theory. McDonald (1975) has 
discussed some aspects of this choice. Here 
one recognizes that if one worries about the 
relationship of factor-score estimates to the 
variables being estimated, it is because one is 
able, under certain conditions, to define the 
variables being estimated on the basis of a 
behavior domain that fits the common factor 
model consistently with the given variables. To 
show that component scores do not have an in- 
determinacy problem, as studied here, one 
must show that the component score of a 
core set of variables uniquely determines a 
corresponding component score in an infinite 
domain from which the core set has been 
drawn. But one knows immediately that this 
cannot be true. Even if one could find a way 
to define consistent bases for the core and the 
domain, and this seems unlikely, the domain 
component scores would contain error parts 
and specific parts that would be uncorrelated 
with variables in the core. Thus, it seems that 
mathematical arguments (e.g., Schönemann
and Steiger, 1976) that favorably contrast 
component theories, whether principal com- 
ponents, components of images, or other 
components, with the common factor model 
cannot be accepted until they are reformulated 
in terms of behavior domains and in terms of 
the relation of core components to domain 
components. It seems unlikely that core 
components can be shown to yield better 
representations of components of interest in a 




behavior domain than we have in common 
factor estimates. (See McDonald, 1977, for a 
more extended account of this question.) 

The implications of factor indeterminacy 
also render suspect the frequently used tech- 
nique of including sets of marker variables in 
distinct sets of additional variables, in the 
hope of determining whether the same factors 
are to be found in these distinct sets. It is 
supposed that if the marker variables have 
the same loadings on a common factor in each 
of these sets, this is evidence that the same 
factor variable has turned up in each of these 
sets. But this need not be so. The marker 
variables may be associated with different 
factor variables when embedded within the 
contexts of different sets of additional vari- 
ables. The only way to be sure that the same 
factors are common across the sets is to 
analyze the marker variables jointly with all 
the other variables to see if one can obtain 
consistent loadings. But if this is done, then 
there is no need for marker variables in the 
first place. 

Because of the close parallels between
common factor theory and classical true score 
theory, factor indeterminacy has important 
implications for test theory and for test 
construction in practice. For example, in the 
construction of homogeneous tests, one might 
begin with a core set of items that are thought 
to define a dimension of interest and then add 
to that core additional items that are factorially 
consistent with the original core to increase 
the total-score reliability of the test. If the 
developer of the tests does this without taking 
account of the empirical content of the addi- 
tional items, which in some instances may 
differ from that of the core variables, he may 
end up with a test whose total score measures 
an attribute that is somewhat different from 
that measured by the original core items. This 
may happen if the core set of items is selected 
from one domain with which the core is 
factorially consistent while the additional 
items are selected from another domain with 
which the core items are also factorially 
consistent. Jointly the union of the two 
domains of items may not be factorially 
consistent with the core items, although the 
domains are factorially consistent separately. 
In-Group Bias in the Minimal Intergroup Situation:
A Cognitive-Motivational Analysis


Marilynn B. Brewer 
University of California, Santa Barbara 


Experimental research on intergroup discrimination in favor of one's own group 
is reviewed in terms of the basis of differentiation between in-group and out- 
group and in terms of the response measure on which in-group bias is assessed. 
Results of the research reviewed suggest that (a) factors such as intergroup 
competition, similarity, and status differentials affect in-group bias indirectly by 
influencing the salience of distinctions between in-group and out-group, (b) the 
degree of intergroup differentiation on a particular response dimension is a joint 
function of the relevance of intergroup distinctions and the favorableness of the 


in-group’s position on that dimension, and (c) the enhancement of in-group bias 
is more related to increased favoritism toward in-group members than to in- 
creased hostility toward out-group members. The implications of these results 
for positive applications of group identification are discussed. 


In 1906, sociologist William Sumner articu-
lated a functionalist approach to the nature
of intergroup attitudes in his exposition of the
concept of ethnocentrism. The differentiation
of peoples into distinct ethnic groups origi-
nates, according to Sumner, in context of the
“conditions of the struggle for existence.” At
the individual level, the psychological conse-
quences of this differentiation both reflect and
sustain the basic state of conflict between the
in-group (or “we-group”) and out-groups (or
“others-groups”):


The insiders in a we-group are in a relation of
peace, order, law, government, and industry, to each
other. Their relation to all outsiders, or others-
groups, is one of war and plunder. . . . Sentiments
are produced to correspond. Loyalty to the group,
sacrifice for it, hatred and contempt for outsiders,
brotherhood within, warlikeness without—all grow
together, common products of the same situation.
(Sumner, 1906, p. 12)


From this perspective, then, attitudinal and
behavioral biases in favor of members of one's
own group over members of other groups are
a product of intergroup competition, serving
the dual functions of preserving in-group
solidarity and justifying exploitation of out- 
groups. Presumably also, the greater the in- 
tensity of competitive interdependence be- 
tween groups, the more attraction within the 
in-group and corresponding hostility toward 
the other group, whereas low levels of inter- 
dependence between groups should be associ- 
ated with relatively little contrast in attitudes 
toward members of the in-group and out- 
group (LeVine & Campbell, 1972). 

The functionalist concept of intergroup 
relations is epitomized in the ambitious field 
experiment undertaken by Sherif and his col- 
leagues in the context of a boys’ summer 
camp (Sherif, Harvey, White, Hood, & Sherif, 
1961). In the fully implemented version of the 
study, conducted in 1954, two groups of 11- 
year-old boys were formed in isolation from 
each other for a period of 8 days before being 
brought into contact under conditions de- 
signed to maximize competition and mutual 
frustration. The resulting intergroup hostility
was documented with anecdotal evidence 
based on observation of overt behavior, sup- 
plemented by controlled measures of socio- 
metric preferences, evaluative trait ratings, 
and estimates of performance by group mem- 
bers during a competitive task. On each of 
these indicators, campers revealed consistent
biases favoring members of their own group 
over members of the competing group. Reduc- 
tions in bias were not achieved until the 
nature of the functional relationship between 
the groups was altered by systematic intro- 
duction of "superordinate goals" requiring 
cooperative interaction. 

The Sherif et al. field study is essentially a 
demonstration rather than a test of the func- 
tionalist view of intergroup relations, since its 
design took for granted that interaction un- 
der competitive conditions was prerequisite to 
the initial development of in-group bias and 
intergroup hostility. No systematic assessment 
of attitudes toward in-group and out-group 
members was made before the intergroup 
competition phase of the experiment (al- 
though changes were documented after com- 
petitive pressures had been removed). How- 
ever, some anecdotal evidence from the 1954 
study was provided that indicates that nega- 
tive reactions to the out-group were present 
prior to the introduction of structured compe- 
tition. At the close of the first phase of the
experiment, the two groups were first made 
aware of each other's existence, and at that 
time the mere knowledge of the presence of 
the other group was sufficient to generate 
name-calling and other derogatory commen- 
tary from each group directed toward the 
other (Sherif et al., 1961, p. 95). 

The significance of these initial-contact 
effects has been realized only recently as the 
phenomena associated with intergroup percep- 
tion have been reexamined in light of more 
general cognitive processes by which human 
beings structure, simplify, and give meaning 
to their physical and social environment 
(Hamilton, 1976; Hensley & Duval, 1976; 
Tajfel, 1969, 1970). From this perspective, 
any categorization rule that provides a basis 
for classifying an individual as belonging to 
one social grouping as distinct from another 
can be sufficient to produce differentiation of 
attitudes toward the two groups, in the ab- 
sence of any initial competitive interdepen- 
dence. The present review focuses on research 
directed toward identifying the minimal con- 
ditions necessary to generate in-group-out- 

group discrimination. 
Defining the Minimal Intergroup Situation


A number of laboratory studies have at-
tempted to demonstrate the presence of in-
group favoritism under conditions in which
the independence of outcomes for in-group
and out-group is explicitly controlled. Among
the earliest of such demonstrations was a
study reported by Ferguson and Kelley
(1964) in which two groups of three to six
members each worked independently on three
tasks. Following the interaction, group mem-
bers were asked to rate the quality of the
products of both groups separately on a 9-
point scale. Mean ratings obtained were sig-
nificantly biased in the direction of more posi-
tive evaluation of subjects’ own group's
product than of the other group's product,
irrespective of any objective differences in
output between the two groups.

Subjects in the Ferguson and Kelley experi-
ment had an extensive period of familiariza-
tion and personal investment in the outcome,
which could have influenced their preference
for own-group products. A clearer demonstra-
tion of in-group bias is obtained when sub-
jects are asked to evaluate qualities associ-
ated with their own and other groups in the
absence of any interaction or personal influ-
ence on the qualities being rated, as was the
case in an experiment by Doise et al. (1972).
Subjects in that experiment were divided into
“X-type” and “Y-type” groups and were told
that the division was based on photograph
preferences (although group assignment was
actually determined randomly). In the con-
trol condition of the experiment, subjects
were led to anticipate no further interaction
with members of either group, but were asked
to describe the other members of their own
group and the members of the other group on
a series of 19 evaluative trait ratings. Despite
the minimal basis for distinction between
the two groups, a significant difference in
mean favorableness of ratings was obtained
in the direction of more positive ratings of
members of the subjects’ own group. However,
in an earlier experiment, Rabbie and Horwitz
(1969) found that in a control condition in
which subjects were arbitrarily divided into
groups labeled blue or green (with no ra-
tionale or further interaction), there were no
significant differences in evaluative trait rat-
ings of individuals in the subjects’ own group
as opposed to individuals in the other group.

It appears, then, that there are lower lim- 
its to the effects of grouping on interpersonal 
perception but that in-group bias does occur 
in the absence of explicit competitive inter- 
dependence between groups. The absence of 
implicit competitive orientation in most of 
these studies, however, is difficult to estab-
lish. Indeed, Rabbie and Wilkens (1971) re-
ported that their attempt to create coacting
groups under no-competition instructional
conditions resulted in ratings of perceived
competitiveness that were equal to those ob-
tained under explicit competition instructions.
As Turner (1975) has suggested, the effect of
categorization into groups may be mediated
by an inherent competition for “positive so-
cial identity.” Relative to the earlier view of
the role of competition in intergroup atti-
tudes, however, this hypothesis reverses the
causal ordering in that competition is gen- 
erated by the differentiation between groups 
rather than vice versa. 

The generation of competitive orientation
as a function of in-group-out-group distinc-
tions, in the absence of any functional conflict
of interests, is perhaps best illustrated with
the paradigm originated by Tajfel and his
colleagues for studying intergroup behavior
(Tajfel, 1970; Tajfel, Billig, Bundy, & Fla-
ment, 1971). The research setting was de-
signed to meet the following criteria for “min-
imal differentiation” (Tajfel et al., 1971, pp.
153-154): (a) no face-to-face interaction
among subjects, within or between groups,
(b) anonymity of group membership, (c)
absence of any instrumental link between the
basis for intergroup categorization and the
response measure, and (d) a response measure
that involves real and significant choices but of
no direct utilitarian value to the subject. Fol-
lowing these criteria, subjects in the Tajfel
experiments are divided into two groups based
ostensibly on their responses to an irrele-
vant judgmental or preference test. After
subjects are informed of their own group
membership (but in the absence of any con-
tact with or knowledge of other group mem-
bers), they are given a choice task that
involves allocating money between two other
subjects in the same experiment. The identity
of the other subjects is indicated only by an 
arbitrary identification number and a label 
specifying group membership, which can be 
varied to be the same as that of the subject 
or to indicate a member of the other group. 

The types of choice matrices provided in 
the Tajfel experiments are illustrated in Table 
1, for the case in which one target person is a 
member of the subject's own group and the 
other a member of the out-group. Within each 
matrix, each column represents an alternative 
allocation of points (worth some specified 
fractional amount of money) distributed be- 
tween the two target persons, and the sub- 
ject is to choose one of the alternatives as 
the distribution to be made. Matrices are 
constructed to represent a number of different 
possible distribution rules that could be ap- 
plied, including equality (choosing the alter- 
native that comes closest to giving each person 
the same number of points), maximizing 
joint outcome (choosing the alternative for 
which the total number of points is highest), 
or in-group favoritism (choosing the alterna- 
tive that affords the in-group member more 
than the out-group member). For instance, 
Matrix A in Table 1 pits equality (choices at 
the middle of the series) against favoritism 
(choices toward the extreme right), whereas 
Matrix B varies favoritism (choices at the 
left), equality (midpoint choices), and joint 
outcomes (choices at the right). 

Across a series of studies that used this 
allocation task (Billig & Tajfel, 1973; Tajfel 
& Billig, 1974; Tajfel et al., 1971), Tajfel
and his colleagues have found that competi- 
tive choices favoring the in-group member 
tend to dominate over alternative available 
choice strategies. However, the matrices used 
in these studies have not been systematically 
varied to compare favoritism with all possible 
choice combinations. In particular, choice 
alternatives that maximize relative gain (i.e., 
the choice that maximizes the difference be- 
tween in-group and out-group points in favor 
of the in-group member) have usually been 
confounded with alternatives that maximize 
absolute gain (i.e., the same choice maximizes 
the number of points that can be provided to 
the in-group member alone; cf. Matrices A 
and B in Table 1). Thus, the task structure 
Table 1
Multiple-Choice Allocation Matrices*

Matrix        Payoffs for members of in-group and out-group
A
  In-group     1  2  3  4  5  6  7  8  9 10 11 12 13 14
  Out-group   14 13 12 11 10  9  8  7  6  5  4  3  2  1
B
  In-group    19 18 17 16 15 14 13 12 11 10  9  8  7
  Out-group    1  3  5  7  9 11 13 15 17 19 21 23 25
C
  In-group     7  8  9 10 11 12 13 14 15 16 17 18 19
  Out-group    1  3  5  7  9 11 13 15 17 19 21 23 25


* Adapted from Tajfel, Billig, Bundy, and Flament (1971).


itself may have dictated a competitive strat- 
egy, in that gain for the in-group could be 
achieved only at cost to the out-group mem- 
ber. Only one matrix format (Matrix C in 
Table 1) has been used in which the alterna- 
tive that maximizes the in-group member's 
outcome is different from the relative gain 
choice, and in this case the former is con- 
founded with maximizing joint gain and 
maximizing the difference in favor of the out- 
group member. 
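
The confounding described above can be checked mechanically. The Python sketch below is illustrative only. It assumes that the payoff columns of Matrices A, B, and C follow the arithmetic progressions of the Tajfel et al. (1971) matrices in Table 1, and the names MATRICES, RULES, best_columns, and summarize are hypothetical helpers, not part of any study's procedure. For each matrix it lists the column positions (counted from 0 at the left) that maximize each distribution rule.

```python
from typing import Callable, Dict, List, Tuple

Column = Tuple[int, int]  # (in-group points, out-group points)

# Assumed payoff columns, written as arithmetic progressions.
MATRICES: Dict[str, List[Column]] = {
    "A": list(zip(range(1, 15), range(14, 0, -1))),   # 1..14 against 14..1
    "B": list(zip(range(19, 6, -1), range(1, 26, 2))),  # 19..7 against 1,3,..,25
    "C": list(zip(range(7, 20), range(1, 26, 2))),      # 7..19 against 1,3,..,25
}

# Each rule scores a column; the column with the highest score is the rule's choice.
RULES: Dict[str, Callable[[int, int], int]] = {
    "absolute_gain": lambda i, o: i,          # most points for the in-group member
    "relative_gain": lambda i, o: i - o,      # largest difference favoring the in-group
    "joint_gain": lambda i, o: i + o,         # largest combined total
    "equality": lambda i, o: -abs(i - o),     # smallest absolute difference
}


def best_columns(matrix: List[Column], rule: Callable[[int, int], int]) -> List[int]:
    """Return every column index that attains the rule's maximum score."""
    scores = [rule(i, o) for i, o in matrix]
    top = max(scores)
    return [idx for idx, score in enumerate(scores) if score == top]


def summarize(name: str) -> Dict[str, List[int]]:
    """Map each distribution rule to its best column(s) in the named matrix."""
    return {rule_name: best_columns(MATRICES[name], rule)
            for rule_name, rule in RULES.items()}


if __name__ == "__main__":
    for name in MATRICES:
        print(name, summarize(name))
```

Under these assumed values, absolute gain and relative gain select the same column in Matrices A and B, whereas in Matrix C they select opposite ends of the series and absolute gain coincides with joint gain.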

To test the generality of preference for out- 
comes that maximize the competitive advan- 
tage to the in-group under forced-choice con- 
ditions, Brewer and Silver (1978) constructed 
a series of two-choice matrices to represent all 
possible pairings of the four alternative dis- 
tribution rules of interest—equality, joint 
gain, relative gain, and absolute (in-group) 
gain (cf. MacCrimmon & Messick, 1976; Mc- 
Clintock, Messick, Kuhlman, & Campos, 
1973). The matrices they used are reproduced 
in Table 2. For each pair of two-choice 
matrices, the distribution rules that are con- 
founded in the first matrix of the pair are 
opposed in the second matrix. Thus, assum- 
ing consistency of choice preferences across 
matrices, the pattern of choices for the two 
matrices in each pair combined discriminates 
perfectly among the four distribution rules, 
as indicated by the scoring key associated 
with each matrix pair in Table 2. Using this 
forced-choice format, Brewer and Silver 
found that a majority of subjects who had 
been divided into groups following an “aesthetic
preference” test selected point distributions
that maximized relative gain in favor of
the in-group over choices that maximized 
absolute in-group gain or other alternatives. 
These results confirm that subjects treat in-group-out-group
outcomes as competitively
interdependent even when such an orientation 
is not required by the nature of available 
alternatives. 
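
The logic of the paired two-choice matrices can be sketched in a few lines of Python. This is a hedged illustration, not the authors' scoring procedure: the payoffs are those legible for matrix pairs B, C, and D in Table 2 (the first in-group payoff of D1 is read here as 7, an assumption), and PAIRS, RULES, preferred, scoring_key, and classify are hypothetical names. For each distribution rule the sketch finds the choice (0 or 1) that the rule prefers in each matrix of a pair, so that every rule yields a distinct two-choice pattern.

```python
from typing import Callable, Dict, Tuple

Choice = Tuple[int, int]        # (in-group points, out-group points)
Matrix = Tuple[Choice, Choice]  # (choice 0, choice 1)

# Payoffs as read from Table 2; pair A is omitted because some entries are illegible.
PAIRS: Dict[str, Tuple[Matrix, Matrix]] = {
    "B": (((6, 8), (7, 3)), ((6, 8), (5, 4))),
    "C": (((6, 4), (7, 10)), ((6, 4), (7, 1))),
    "D": (((7, 5), (9, 12)), ((7, 5), (6, 7))),
}

RULES: Dict[str, Callable[[int, int], int]] = {
    "joint gain": lambda i, o: i + o,
    "equality": lambda i, o: -abs(i - o),
    "in-group gain": lambda i, o: i,
    "relative gain": lambda i, o: i - o,
}


def preferred(matrix: Matrix, rule: Callable[[int, int], int]) -> int:
    """Return 0 or 1, the choice that scores higher under the rule."""
    s0, s1 = (rule(i, o) for i, o in matrix)
    if s0 == s1:
        raise ValueError("rule does not discriminate between the two choices")
    return 0 if s0 > s1 else 1


def scoring_key(pair: Tuple[Matrix, Matrix]) -> Dict[Tuple[int, int], str]:
    """Map each (first-matrix choice, second-matrix choice) pattern to a rule."""
    first, second = pair
    return {(preferred(first, rule), preferred(second, rule)): name
            for name, rule in RULES.items()}


def classify(pair_name: str, choices: Tuple[int, int]) -> str:
    """Name the distribution rule implied by a subject's two choices."""
    return scoring_key(PAIRS[pair_name])[choices]


if __name__ == "__main__":
    for name, pair in PAIRS.items():
        print(name, scoring_key(pair))
```

Because the rules confounded in the first matrix of a pair are opposed in the second, each derived key contains four distinct patterns, one per rule, which is the sense in which the combined choices discriminate among the rules.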


Sources of Variation in Bias 


The research paradigms provided by Sherif
et al. (1961) and by Tajfel et al. (1971) represent
two extremes of the conditions of intergroup
differentiation under which the occurrence
of in-group bias may be studied. The
Sherif field studies created a high degree of
interaction and cooperative interdependence
within groups combined with explicit competitive
interdependence between groups, whereas
the Tajfel laboratory studies involved minimal
intragroup relationships and no predetermined
functional interdependence between groups.
Both types of research yield evidence of the
presence of in-group favoritism, but results
are not directly comparable for purposes of
assessing variations in extent or intensity of
such bias. Most of the experimental studies
undertaken in this area since 1960 can be
viewed as attempts to determine the contribution
to in-group bias of various settings
between the extremes represented by the
Sherif and Tajfel paradigms.

1 In the task booklets actually used in the Brewer
and Silver (1978) study, the matrices from the
pairs were randomly intermixed in order of presentation.

Table 3 provides a two-way classification 
of experimental studies published since 1960, 
in which at least one aspect of the conditions 
of intergroup differentiation has been system- 
atically varied. Along with a classification of 
the major independent variables that have 
been manipulated, the studies listed in Table 
3 are categorized according to which depen- 
dent variables were assessed of the three most 
widely used types of measures of in-group 
bias: (a) subjective ratings of individual
group members or of the group membership
as a whole on a series of evaluative trait
scales, (b) ratings of the quality of group
process (e.g., cohesiveness and cooperative
atmosphere) or product, and (c) behavioral
measures involving resource distribution decisions
(e.g., Prisoner's Dilemma Game choices
or the Tajfel allocation task). Studies that
manipulated more than one independent
variable or that included more than one
dependent measure are multiply listed in
Table 3.


Actual or Anticipated Competition 


A number of the studies already reviewed
indicate that a competitive reward structure
is not a necessary precondition for obtaining
significant in-group bias, but whether bias is
enhanced by the presence of explicit conflict
of interest between groups remains in question.
Some early laboratory studies of in-group
bias, conducted in the context of management
training groups (Bass & Dunteman,
1963; Blake & Mouton, 1961), compared
ratings of the in-group obtained before
and after introduction of a problem-solving
competition against one or more other groups
and found significant increases in positive
in-group evaluations during the intergroup competition.
In a more systematically controlled study
of the effects of anticipated and actual competition,
Rabbie and Wilkens (1971) divided
subjects arbitrarily into pairs of three-person
groups and then led both groups either to
expect no further interaction or to expect to
engage in an interactive task either in competition
with the other group or independent of
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Table 2 
Forced-Choice Allocation Matrices*

                               Payoff
Matrix pair               0            1            Scoring key

A1
  In-group member         7       [illegible]       0, 0: Equality
  Out-group member        9       [illegible]       0, 1: Joint gain
A2                                                  1, 0: Relative gain
  In-group member         7       [illegible]       1, 1: In-group gain
  Out-group member        9           12

B1
  In-group member         6            7            0, 0: Joint gain
  Out-group member        8            3            0, 1: Equality
B2                                                  1, 0: In-group gain
  In-group member         6            5            1, 1: Relative gain
  Out-group member        8            4

C1
  In-group member         6            7            0, 0: Equality
  Out-group member        4           10            0, 1: Relative gain
C2                                                  1, 0: Joint gain
  In-group member         6            7            1, 1: In-group gain
  Out-group member        4            1

D1
  In-group member         7            9            0, 0: Relative gain
  Out-group member        5           12            0, 1: Equality
D2                                                  1, 0: In-group gain
  In-group member         7            6            1, 1: Joint gain
  Out-group member        5            7


* Adapted from Brewer and Silver (1978).


the other group (no competition). (Subjects 
who initially anticipated no interaction were 
later placed in the competition or no-compe- 
tition conditions.) Ratings of own- and other- 
group members on six evaluative traits were 
then obtained from each subject prior to the 
interaction phase of the experiment and were 
again obtained from subjects in the competi- 
tion and no-competition conditions after the 
interactive task. Before interaction, subjects 
who anticipated the task gave ratings of in- 
group members that were significantly higher 
than those obtained from subjects not ex- 
pecting interaction, but subjects in all condi- 
tions showed equally significant bias in the 
difference between in-group and out-group 
ratings. Following interaction, the degree of 
bias in favor of own-group members increased 
significantly for subjects in both the competi- 
tion and no-competition settings (although
ratings of group products showed no significant
in-group bias). Thus, the effect of actual
intragroup interaction was to enhance favoritism
toward in-group members, but equally
so for competitive and independent groups.
Similarly, Janssens and Nuttin (1976) found
that members of interacting groups overestimated
group successes more than did members
of noninteracting groups but that groups who
engaged in intergroup competition did not
overestimate significantly more than groups
who coacted independently. However, as was
mentioned previously, the success of initial
instructions for creating a noncompetitive
task structure may be questionable, since a
manipulation check in the Rabbie and Wilkens
(1971) experiment revealed that felt
competitiveness was equally high for members
of the competition and no-competition
groups.

A clearer manipulation of the structure of
interdependence between groups is attained
when conditions promoting intergroup competition
are compared with conditions requiring
cooperation between groups. One method for
varying this feature of intergroup relations is
through the use of instructional sets designed
to induce competitive or cooperative orientation
on the part of members of one group
toward those of another group. Such an instructional
manipulation was used in an experiment
by Rabbie, Benoist, Oosterbaan, and
Visser (1974) in which three-person groups
were instructed to role play a team of union
negotiators preparing for a meeting with a
management team. After a 10-minute discussion
period within the union group, subjects
were asked to make ratings of the atmosphere
in their own group and of their expectations
regarding interactions with the management
group. No significant differences in ratings of
group cohesion or satisfaction were obtained
between subjects in the competitive
and cooperative orientation conditions, but
members of competitive groups did report
anticipating greater hostility toward the out-group
than did members of groups in the
cooperative condition.

Other methods of varying cooperation-competition
involve direct manipulation of
the structure of the intergroup task. One experiment
reported by Kahn and Ryen (1972)
used a simulated game setting in which three-person
teams anticipated either cooperative
Before any actual interaction, subjects made 
ratings of their own team members and of 
out-group members on 11 evaluative semantic 
differential scales. A significant difference in 
ratings in favor of in-group members was 
obtained from subjects in the cooperative 
condition, but the size of this difference was 
significantly greater for subjects in the com- 
petitive condition. Such enhancement of in- 
group bias as a function of intergroup compe- 
tition has not, however, proved reliable across 
research studies. Doise et al. (1972) divided 
subjects into two groups based, supposedly, 
on preference for photographs, and then led 
subjects in the experimental groups to antici- 
pate a Tajfel-type money allocation task in- 
volving members of both groups. Instructions 
for allocation were varied to emphasize com- 
petitive own-gain maximization (outcomes to 
be distributed differentially between the 
groups) or joint-gain maximization (total 
outcomes to be divided equally between the 
two groups). Before the allocation task began,
subjects evaluated in-group and out-group
members on 19 trait scales. Mean ratings
from subjects in both the cooperative and
competitive conditions showed an in-group
bias significantly greater than that obtained
from control groups (who anticipated no
future task), but the bias for competitive
teams was not significantly different from
that for cooperative teams (even though behavior
afterwards, in the allocation task itself,
did differ in a direction consistent with
instructions).

One possible explanation of the difference in 
findings obtained by Kahn and Ryen and by 
Doise et al. is that the salience of the coop- 
eration-competition manipulation may be 
highly variable when its impact is assessed 
prior to actual intergroup interaction. Brewer 
and Silver (1978) obtained trait ratings of 
in-group and out-group members from some 
subjects before they completed an allocation 
task and from other subjects after the allo- 
cations had been completed. Instructions for 
the allocation task were varied to generate a 
cooperative intergroup reward structure 
(achievement determined by adding each in- 
group member’s points to an out-group member's
points), a competitive reward structure
(achievement determined by the difference 
between points allocated to the in-group 
member and those allocated to the out-group 
member), or a reward structure based on total 
points allocated to the in-group member, re- 
gardless of out-group gains. As in the Doise et 
al. study, performance on the allocation task 
was significantly affected by these different 
instruction conditions. Subjects in both the 
independent and the competitive intergroup 
conditions predominantly made choices that 
maximized relative gain in favor of the in- 
group member, but subjects working under 
cooperative instructions made fewer relative- 
gain choices and more choices that maximized 
joint gain or equality between in-group and 
out-group member outcomes. However, trait 
ratings were significantly biased in favor of 
own-group members by subjects in all condi- 
tions, regardless of intergroup reward struc- 
ture or of whether ratings were obtained be- 
fore or after the behavioral measure. 
Contrary to the Brewer and Silver (1978) 
findings, Worchel, Andreoli, and Folger 
(1977) found that intergroup competition 
significantly increased differential attraction 
between in-group and out-group in compari- 
son with cooperative or independent inter- 
group settings. In the first phase of the 
Worchel et al. experiment, subjects were di- 
vided arbitrarily into two groups of four to 
six persons each. Members of each group
were to work together on a joint product that 
would later be evaluated either in competi- 
tion with, independently of, or in combina- 
tion with the product of the other group. 
After an initial period of interaction, subjects 
were asked to rate their liking for each of the 
members of their own and the other group. 
Only small differences between conditions 
were obtained for mean attraction ratings of 
in-group members, but liking for out-group 
members was significantly higher in the coop- 
erative setting than in the independent set- 
ting, and out-group attraction in competitive 
groups was significantly lower than for either 
cooperation or independence. On the other 
hand, Ryen and Kahn (1975) found that 
competitive interaction with an out-group 
increased in-group bias, over that obtained in 
cooperative conditions, by increasing evaluative
ratings of in-group members but having
no significant effect on out-group ratings.
Since competition is sometimes found to
enhance in-group bias effects and sometimes
found to have no additional impact,2 it may
be that intergroup competition does not affect
intergroup attitudes directly, but only when
confounded with other aspects of group differentiation.
In other words, the presence of
explicit competition may serve to clarify the
distinction between in-group and out-group
under conditions in which the differentiation
would otherwise be ambiguous. The role of
intergroup competition in clarifying in-group
boundaries can be illustrated with an experiment
by Goldman, Stockbauer, and McAuliffe
(1977) in which effects of cooperation and
competition were compared for both intergroup
and intragroup reward structure. In
their experiment, two-person teams interacted
on a joint task in which achievement outcomes
within teams were either cooperatively
or competitively interdependent, while performance
between groups was assessed either
jointly (cooperatively) or competitively.
Evaluative ratings of own-team members were
significantly higher under conditions of intragroup
cooperation than under intragroup
competition, regardless of intergroup reward
structure. However, the effects of intragroup
competition on task performance were significantly
less in the presence of intergroup
competition than in the presence of intergroup
cooperation. It is very likely that in
the latter condition there was no perceived
differentiation between the second member
of one's own team and the members of the
other team. Only in the presence of a negative
correlation between a subject's own final
outcomes and those of the other team could
such a differentiation be made, which in turn
moderated the effects of the competitive task
structure within teams. However, Rabbie and
Wilkens (1971) reported that intragroup
status differentiation among members of three-person
groups increased under conditions of
intergroup competition, as compared with
cooperative intergroup settings.

2 It should be noted that for those studies reporting
no significant differences in bias between
competitive and cooperative conditions, it is not a
matter of results falling just short of statistical significance
but rather that the in-group bias obtained
is virtually the same under the two conditions.


Group Outcomes: Success-Failure


One factor that is inherently confounded 
with the presence of explicit competitive interdependence
between groups is that of differential
shared fate; that is, under conditions
of a competitive reward structure, members
of a group share (or anticipate sharing)
a common outcome that is distinct from the
outcome shared by members of the other
group. Such a co-occurrence of group boundaries
and common fate is one of the criteria
for perceived "entitivity" of social groupings
discussed by Campbell (1958). The importance
of shared outcomes as a determinant of
in-group bias was empirically verified by 
Rabbie and Horwitz (1969). In that study, 
arbitrary classification of subjects into two 
groups labeled blue and green alone produced
no significant in-group bias. However, when
the experimenters introduced a chance allocation
rule whereby one group won a prize and
the other did not, subjects showed a significant
bias in evaluative trait ratings (made
after the outcome had been announced) in favor
of their own group, regardless of whether their
group had won or lost or of what allocation
rule had been applied.

Several studies of shared fate as a
determinant of in-group bias have focused on
the effect of group achievement (success versus
failure) in a competitive setting. The
studies by Blake and Mouton (1961)
and by Bass and Dunteman (1963)
included ratings taken at the end of
a competition phase, after the
winning and losing teams had been announced.
In both cases, the self-ratings of
winning groups remained significantly inflated
(as they were during competition),
but the losing groups' self-ratings dropped.
Similarly, Wilson and Miller (1961)
found that when win-loss outcomes
were experimentally manipulated, evaluative
trait ratings of teammates and out-group
members were affected. In comparison with
ratings made prior to competition, subjects
on winning teams showed a significant in- 
crease in bias in favor of their own team 
members, whereas subjects on losing teams 
showed a smaller difference in ratings of in- 
group and out-group members in favor of 
the winning out-group. Ryen and Kahn 
(1975) also found that winning under com- 
petitive conditions significantly enhanced in- 
group bias in evaluative trait ratings but that 
feedback indicating one's own group had 
lost reduced perceived in-group-out-group 
differences to nonsignificance. 

One experiment reported by Kahn and 
Ryen (1972) extended the range of win-loss 
outcomes studied by having groups of sub- 
jects engage in a series of simulated football 
plays and then giving each group feedback 
information that they had won either 100%, 
50%, or 0% of the plays in comparison with 
another team. After this feedback, subjects 
made ratings of in-group and out-group team 
members on 11 evaluative scales. Mean in- 
group ratings increased (and, to a lesser ex- 
tent, out-group ratings decreased) as a func- 
tion of the percentage of in-group success. 
The resulting differences between mean rat- 
ings of in-group and out-group were not sig- 
nificantly different from zero for groups with 
no wins (mean in-group-out-group difference
= -1.5), were significantly biased in favor
of one's own group for those with 50% wins
(mean difference = 8.3), and were significantly
more biased for those with 100% success
(mean difference = 14.5).

In the preceding experiment, in-group bias 
occurred even when outcomes for the in-group
were the same as those obtained by the
out-group (50% wins), as long as the in-group
attained some degree of success. In a
second experiment, Kahn and Ryen (1972)
tested whether differentiation between in- 
group and out-group outcomes was an impor- 
tant factor in in-group bias when competitive 
interdependence between groups was removed. 
Groups of three subjects worked indepen- 
dently on selected IQ test items, and then 
each group was given feedback indicating 
whether their performance resulted in a high 
proportion of successes or a low proportion 
of successes (high failure) and also whether
the other group's performance was high or
low, with in-group and out-group results ma- 
nipulated independently. After this feedback 
was provided, evaluative ratings were ob- 
tained of in-group and out-group members. 
Under these noncompetitive conditions, sub- 
jects in all conditions showed a bias in favor 
of their own group, but the degree of bias 
was significantly enhanced only when in- 
group success was combined with out-group 
failure. 

Across these studies involving group per- 
formance outcomes there appears to be a 
consistent tendency for subjects to exaggerate 
the difference between in-group and out- 
group qualities when the in-group does well 
in comparison with the other group but to 
reduce the perceived difference when in-group 
and out-group performed the same or when 
the in-group does more poorly. Such a pat- 
tern serves to maximize favorable compari- 
sons and to minimize unfavorable ones and 
may be typical of responses to single, or one- 
time, intergroup comparisons. Responses to 
success and failure may change, however, if
interactions are extended in time and further 
comparisons between in-group and out-group 
are anticipated in the future. Worchel, Lind, 
and Kaufman (1975) found that anticipa- 
tion of further competition interacted with 
outcome feedback in determining relative 
evaluation of in-group and out-group prod- 
ucts. Members of winning groups overevalu- 
ated their group product less when they ex- 
pected competition between the groups to 
continue than when they expected it to dis- 
continue, whereas members of losing groups 
devaluated their group product more under 
discontinuing than under continuing conditions.
Worchel et al. interpreted these results
as attempts, in the context of ongoing competition,
to avoid “complacency” on the part of winning
groups and to avoid “giving up” on the
part of losing groups.

Continued failure or deprivation of the in- 
group relative to a particular out-group across 
a long period of time may lead to compensa- 
tory overevaluation in favor of the in-group 
wherever possible (LeVine & Campbell, 
1972). Branthwaite and Jones (1975) looked
at the effect of long-standing status differentials
between ethnic groups on allocations in
the Tajfel choice task. When subjects were
divided into groups according to ethnic identity
(Welsh-English), members of the minority
group made more choices that maximized
the difference between in-group and out-group
member outcomes than did members
of the majority group (who tended to make
more choices dictated by equality or joint
gain maximization). Similar findings were
obtained by Gerard and Hoyt (1974) with
experimentally created groups and a different
measure of bias. Subjects in their study were
classified as members of a group of 2, 5, or 8
subjects, out of a total of 10 subjects participating
in a session. Each subject was
asked to make evaluative ratings of essays
supposedly written by two other participants
in the experiment, one identified by an identification
number of someone in the subject's
own group and one identified by an identification
number from the out-group. Ratings of
the content of the essays produced no
in-group-out-group differentiation, but evaluations
of the writers resulted in some differences.
Subjects classified into groups of five
and eight showed no significant bias (in
fact, there was some tendency in favor of
the out-group member), whereas subjects in
the minority group of two showed a significant
positive bias in favor of their in-group
member.

Results from both of these studies suggest 
that minority group status makes in-group
membership more salient than does membership
in a majority group. A similar heightening
of awareness of group identity may
occur for groups exposed to repeated failure
or loss, particularly when membership in
such a group is unalterable. Whether repeated
failure ultimately generates greater in-group-out-group
differentiation than does repeated
success has yet to be experimentally demonstrated.
However, in a survey of intergroup
perceptions among ethnic groups in East
Africa, Brewer and Campbell (1976) found
those groups rated lowest on the socioeconomic
status index to be higher in ethnocentric
self-regard than those groups with the
highest socioeconomic status ratings. This effect
may also be related to repeated findings
of “reverse discrimination” on the part of
members of high-status majority groups in 
dealings with individual members of minor- 
ity groups (Dutton, 1976). 


Intergroup Similarity 


Although it has been established that evalu- 
ative bias occurs only in the presence of some 
meaningful distinction between groups (Rab- 
bie & Horwitz, 1969), the minimal differen- 
tiation required allows room for considerable 
variation in implied or explicit similarity 
between members of the in-group and out- 
group. A number of studies have examined 
the effect on in-group bias of variations in 
degree of similarity among in-group mem- 
bers, or of dissimilarity between in-group 
and out-group, on such dimensions as cultural,
personality, or attitudinal characteristics.

Wilson and Kayatani (1968) divided subjects
into two-person teams, with each team
composed of members of the same racial
group (Japanese or Caucasian). Each team
then played a modified Prisoner’s Dilemma
Game with another team of the same or different
race. The game choice results were
uniform across both types of out-groups:
choices made in the intergroup setting averaged
only 43% cooperative, whereas choices
made within each group averaged 84% cooperative.
Similarly, postgame evaluative trait
ratings showed a significant in-group bias regardless
of whether the out-group was of the
same or different race.

A more recent study by Dion (1973) also
looked at the effect of similarity on intergroup
versus intragroup Prisoner's Dilemma
Game behavior. The experimental manipulation
in this study, however, varied intragroup
rather than intergroup similarity. One half
of the dyads in the experiment were told
that the members had closely matched personality
profiles, whereas the remaining pairs
were told they had discrepant profiles. All
dyads then played a Prisoner’s Dilemma
Game (with two experimental confederates
acting as the out-group team) and also
rated both in-group and out-group members
on 16 evaluative traits. Both high- and low-
similar dyads exhibited the same intergroup 
game behavior (averaging 30% cooperative 
choices), but the high-similar pairs exhibited 
significantly more in-group cooperation (59%) 
than did the low-similar pairs (36%). The 
same pattern of results was obtained for the 
evaluative ratings: Out-group ratings were 
essentially the same for all groups, while in- 
group ratings were significantly higher for 
members of the high-similar dyads. 

Billig and Tajfel (1973) compared in- 
tergroup differentiation based on explicit 
similarity with categorization based on no 
similarity principle. In their experiment, in- 
tergroup similarity and categorization were 
manipulated independently. Similarity was 
varied by dividing half the groups according 
to supposed preferences in painting styles 
(Klee vs. Kandinsky) and by dividing the 
remaining groups randomly into groups la- 
beled X or W. Categorization was varied by 
including group label as part of subject iden- 
tification during the allocation task for some 
subjects and omitting group labels for others. 
Results from the allocation task showed sig- 
nificant in-group favoritism in the categoriza- 
tion conditions and no significant favoritism 
in the noncategorization conditions, regard- 
less of similarity. Brewer and Silver (1978) 
also found significant in-group bias on both 
allocation-task decisions and evaluative rat- 
ings regardless of whether groups had been 
formed on the basis of distinct preferences 
or had been formed on the basis of a random 
split after being explicitly told that all sub- 
jects were similar on the preference task. 

The similarity manipulation in the Billig 
and Tajfel and the Brewer and Silver studies 
involved both intragroup similarity and in- 
tergroup dissimilarity. In an experiment by 
Allen and Wilder (1975) these two facets 
were varied independently in a 2 × 2 design.
With painting style preference as the
ostensible basis of categorization into groups, 
subjects were provided with further informa- 
tion indicating the percentage (high or low) 
of responses to an attitude questionnaire 
that were similar to their own responses for 
other members of their own group and for 
members of the other group. Subjects then
made choices on the Tajfel allocation task
on behalf of an in-group member and an 
out-group member. Subjects in all experi- 
mental conditions showed some degree of 
in-group favoritism in allocation decisions. 
High in-group similarity produced signifi- 
cantly more bias than did in-group dissimi- 
larity, but similarity-dissimilarity of the out- 
group had no effect on degree of in-group bias. 

Results from all of these studies are con- 
sistent in indicating that explicit dissimilar- 
ity within the in-group reduces in-group bias 
but that information on similarity between 
the subject and out-group members makes no 
difference. However, it may be that perceived 
similarity within the in-group and perceived 
dissimilarity from the out-group are highly 
interdependent, as suggested by the results 
of an experiment by Hensley and Duval 
(1976). In this study, information on the
opinions held by 10 subjects in a discussion
group was presented graphically in such a
way that each subject's own opinion was
depicted within a cluster of seven other subjects’
opinions, with the distance between
the subject’s opinion and these seven held
constant. The positioning of the opinions associated
with the remaining 2 subjects was
varied across five levels of distance from
this majority cluster. Following this presentation,
each subject made ratings of the other
9 subjects in the session on perceived similarity
to self and on liking. The results for
perceived similarity ratings revealed an assimilation-contrast
effect: The greater the
distance between the minority (out-group)
and the majority (in-group), the greater the
perceived similarity within the subject’s own
group. A parallel effect was obtained on the
ratings of liking for members of the majority
and minority groups.


Salience of Categorization 


Since the grouping of subjects into major- 
ity and minority clusters in the Hensley and 
Duval study was not explicitly labeled, it is 
likely that the effect of increasing the visual 
distance between the two clusters was to
increase the probability that the subject
would perceive a boundary between the two
groups. In fact, in the three conditions in
which the distance between clusters was great
enough to ensure the perception of distinct
groupings, the ratings of perceived similarity
and of liking for in-group as opposed to out-group
members were essentially the same,
the only significant differences occurring between
these three conditions and the two
conditions involving lesser distances. Thus,
the effect attributed to out-group dissimilarity
may have been due to the differential
salience of the in-group-out-group distinction.
Other research in which the salience of
categorization has been manipulated either
directly (e.g., Billig & Tajfel, 1973) or indirectly
(e.g., Gerard & Hoyt, 1974) confirms
the importance of this factor in eliciting
in-group bias.

Results of several studies indicate that 
the same differences among individuals may 
or may not lead to bias depending on whether 
a basis for grouping has been made salient. 
For instance, in a study by Stephenson, 
Skinner, and Brotherton (1976), secondary 
school students were assessed on their attitudes
toward raising the age for compulsory
schooling. Experimental sessions were composed
of four students in favor of and four
against raising the age. At the beginning of
the session subjects were given information
about the distribution of attitudes among
the eight participants and then were asked
to rate each of the participants on five evaluative
trait scales. Ratings were made both
before and after subjects were divided into
four-person groups (based on initial attitudes)
for participation in an intergroup
negotiation task. Prior to the division into
labeled groups, ratings showed no in-group
bias, but following the group task, ratings
changed significantly in favor of the in-group.


Turner (1975) also reported a complex in- 
teraction between participation in an inter- 
group task and in-group favoritism. Subjects 
in his study were divided into two groups 
and then made two sets of choices on a Tajfel 
allocation task—once making choices on be- 
half of two other subjects (one of whom 
was an in-group member and one an out-group 
member) and once making choices on behalf
of self and one other subject (who was either
an in-group member or an out-group member).
For subjects who made self-other choices
first, favoritism toward self was moderately 
high, regardless of whether the other was 
an in-group member or an out-group mem- 
ber. However, for subjects who made in- 
group-out-group member choices before mak- 
ing self-other choices, self-favoritism was 
significantly higher when the other was an 
out-group member than when the other was 
an in-group member. Thus, prior participa- 
tion in a task that made the intergroup dis- 
tinction salient enhanced the differentiation 
between self and out-group member, but re- 
duced differentiation between self and in- 
group member. 

The mere presence of more than one mem- 
ber of a distinct social group apparently in- 
creases the salience of grouping and associ- 
ated biases. Dustin and Davis (1970) ob- 
served the effects of competition between 
two groups of three subjects when the com- 
petitive interaction took place on an indi- 
vidual (1:1) basis or on a group (3:3) basis. 
Following group competition, product rat- 
ings were significantly biased in favor of 
subjects’ in-group output, but no own-group 
bias was obtained for product ratings from 
subjects whose groups interacted on an in- 
dividual competition basis. Similar effects 
have been obtained for biases associated 
with nonexperimental social groups. Doise 
and Sinclair (1973) studied the effect of ref- 
erence group salience on accentuation of 
stereotypes associated with collegians (male 
secondary school students) and apprentis 
(vocational trainees). Members of both 
groups were brought together in either a 1:1 
or a 2:2 encounter and, following a short 
discussion period, were asked to make trait 
ratings of the respective groups. In the 2:2 
condition collegians gave ratings significantly 
more biased in favor of their own group than 
they did in the 1:1 condition, whereas apprentis
showed less derogation of their own
group relative to the higher status out-group 
in the 2:2 than in the 1:1 condition. Similar 
effects of the presence of multiple members 
of both groups have been obtained for accentuation
of stereotypes based on sex (McKillip,
Dimiceli, & Luebke, 1977) and on
ethnic identity (Dion & Earn, 1975).


Summary 


The interpretation of results from all of 
the experimental studies reviewed in this 
section (entitled Sources of Variation in 
Bias) has been consistent with the following 
general conclusion: Any of the situational 
factors found to be associated with enhance- 
ment of in-group bias can be subsumed under 
the effect of the salience of the distinction 
between in-group and out-group. Factors such 
as interdependence, intergroup similarity, and 
shared fate all affect the probability that a 
respondent will be aware of a relevant basis 
for categorization into groups, which in turn 
determines the amount of in-group bias that 
is evidenced. Once a particular categorization 
has become salient, however, the degree of 
bias obtained is fairly constant despite fur- 
ther variations in out-group similarity (e.g., 
Allen & Wilder, 1975) or in opportunity for 
cooperative interaction (e.g., Brewer & Silver, 
1978; Worchel et al., 1977). 


Locus of Bias: Which Dimensions? 


Though the argument has been made that 
in-group bias is related in an all-or-nothing 
manner to category salience, the bias associ- 
ated with any particular basis for categoriza- 
tion into in-group and out-group may not 
be constant across all response dimensions. 
There are a number of sources of evidence 
for specificity of effects, or "selective bias" 
(Wilson, Chun, & Kayatani, 1965). A series 
of studies by Wilson and his associates (e.g., 
Wilson et al., 1965; Wilson & Kayatani,
1968) indicate that following intergroup
competition within a Prisoner's Dilemma
Game format, evaluative bias is most pronounced
on game-relevant motive traits (e.g.,
cooperative, fair, and kind), whereas in-group
bias is less pronounced on sociometric
(e.g., likeable) or ability traits (e.g., capable
and intelligent) and least evident on general
personality dispositions (e.g., neurotic and
anxious). Dion (1973) also found that in-group
bias after participation in an intergroup
Prisoner’s Dilemma Game was greatest
on the dimension of trust, and Brewer and
Silver (1978) obtained the most bias on
ratings of trustworthiness, friendliness, and
cooperativeness, even after respondents had
engaged in a cooperative intergroup allocation
task. The latter study also found a nonsignificant
correlation (r = .14) between in-group
favoritism on the allocation task and
in-group bias on evaluative trait ratings.
Similarly, Ryen and Kahn (1975) obtained 
no significant correlation between evaluative 
in-group bias and intergroup distancing, as 
evidenced in seating behavior, and Worchel et 
al. (1975) reported a low correlation between 
liking for the in-group and relative evalua- 
tion of in-group versus out-group products. 
Going outside of the laboratory, Brewer and 
Campbell (1976) found in a large-scale sur- 
vey of intergroup attitudes that the psycho- 
logical distance reported between a respon- 
dent’s in-group and a particular out-group 
varied depending on whether the response 
measure dealt with affective relations (e.g., 
social distance), evaluation, or respect. 

The finding that bias depends on some in- 
teraction between the categorization variable 
and the response dimension on which bias is 
assessed is consistent with a cognitive inter- 
pretation of intergroup bias (Tajfel, 1959; 
1969). The comparison with general theories 
of cognitive processing is best illustrated by 
Tversky’s (1977) feature-matching model of 
similarity judgments. In Tversky’s model,
the perceived similarity between two objects 
is a function of some linear combination of 
their common and distinctive features; but 
the weight assigned to any particular feature 
or set of features may vary depending on the 
context or the nature of the judgment task. 
As a result, the same two objects may be 
judged to be highly similar within one frame 
of reference and highly distinct within an- 
other. The determinants suggested by Tversky
for this lability of perceived similarity
are relevant to the judgment of objects external
to the respondent. It may be that
when identification with oneself is a salient
feature of one of the objects to be judged,
motivational factors enter into the selection
of features to be attended to (cf. Christian,
Gadfield, Giles, & Taylor, 1976).
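
For reference, the "linear combination of common and distinctive features" can be written out in the form in which Tversky's contrast model is usually stated. The symbols below follow that standard statement and are not notation used elsewhere in this review:

$$S(a, b) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A), \qquad \theta, \alpha, \beta \geq 0$$

Here A and B are the feature sets of objects a and b, and f measures the salience of a set of features. Context and task can change both f and the weights, which is how the same pair of objects can be judged similar in one frame of reference and distinct in another.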

The interdependence of perceptual and 
motivational factors is highlighted by some 
interesting parallels between the research lit- 
erature on perceptual accentuation in social 
judgment (Eiser & Stroebe, 1972) and on 
in-group bias effects (Turner, 1975). In the 
judgment literature, enhancement of con- 
trast occurs when the judged distance be- 
tween stimuli that are members of different 
classes is exaggerated. The occurrence of 
this accentuation effect depends on the pres- 
ence of at least some minimal correlation be- 
tween the classification variable and variation 
among the stimuli on the property being 
judged (Campbell, 1956; Tajfel, 1959; Taj- 
fel & Wilkes, 1963). However, when the 
stimuli are social objects, toward which the 
judge has differential orientations, a second 
condition has been found to be necessary for 
enhancement of contrast, namely, that the 
judge's own position be located on what the 
rater perceives to be the positive side of the 
dimension of judgment (Eiser, 1975; Eiser & 
Mower-White, 1974). 

Analogous to the conditions associated with 
the enhancement of contrast effect, Turner's 
(1975) social comparison theory of in-group 
favoritism specifies two preconditions: (a) 
salience of some basis for distinction between 
in-group and out-group and (b) availability 
of “differentially valued actions relevant to 
the categorization" (p. 12). The presence of 
these conditions generates intergroup social 
competition, the aim of which is to take advantage
of opportunities to maximize the relative
advantage of the in-group over the out-group.
In effect, then, in-group bias results
from a motivated search to represent the differences
between groups along dimensions
that favor the in-group. If outcomes favoring
the in-group are not available, the distinction
between the groups will be minimized rather
than accentuated.

The presence of motivational influences 
can lead to important asymmetries in the 
ways in which the members of two social 
groups perceive the differences between them, 
such as those obtained for groups differing 
in socioeconomic status (e.g., Branthwaite &
Jones, 1975; Brewer & Campbell, 1976).
Members of Group A may perceive a major 
difference between themselves and Group B 
along dimension X, whereas members of 
Group B may focus on the common features 
of A and B relevant to X and emphasize the 
distinctive features relevant to dimension Y. 
(Imagine, e.g., members of a winning team 
following a football contest and highlighting 
the differences between the teams in agility 
and skill, while members of the losing team 
regard ability differences as marginal, but 
emphasize differences between the teams in 
“how the game was played" with respect to 
fair play and sportsmanship.) Even differ- 
ences on a single dimension can be repre- 
sented in alternative ways that favor one 
group or the other (Campbell, 1967; Pea- 
body, 1967; Vassiliou, Triandis, Vassiliou, & 
McGuire, 1972). For example, an objective 
difference between the customs of two groups 
with regard to the sharing of household 
items may be represented by one group as a 
distinction between generosity and selfishness, 
but may be defined by the other group in 
terms of responsibility-irresponsibility. Such
differences can introduce considerable variation
in assessments of in-group bias.

Although not all differences can be repre- 
sented in a manner congruent with positive 
self-image, some characteristics of groups 
lend themselves to universal bias. Across ex- 
perimental and field studies of the content of 
intergroup perceptions, the dimensions on 
which evaluative bias in favor of in-groups 
occurs most reliably are those associated
with trustworthiness, honesty, or loyalty. All 
these are traits related to normative expecta- 
tions that apply to intragroup—as opposed 
to intergroup—behavior. To the extent that 
norms prescribing preferential treatment for 
members of one's own group are characteristic 
of in-group formation, they generate a set of 
reciprocal stereotypes (Campbell, 1967) that 
any two groups might have of each other and 
with which each could legitimately place the 
in-group on the positive side of the scale 
(e.g., “we are loyal; they are clannish”; “we 
are honest and peaceful among ourselves; 
they are hostile and treacherous toward outsiders”).
This reciprocal contrast is basic to
the “mirror-image” phenomenon in interna- 
tional perception, as portrayed by Bronfen- 
brenner (1961). 


Locus of Bias: In-Group or Out-Group? 


The extensive literature on group cohesive- 
ness indicates that factors such as similarity 
among group members (e.g., Anderson, 1975) 
and shared success (e.g., Blanchard, Adel- 
man, & Cook, 1975) enhance attraction to- 
ward one’s own group in the absence of com- 
parison with any other groups. Since in-group 
bias research focuses on favoritism toward 
the in-group relative to an out-group, it is 
often ambiguous whether the comparison 
rests on enhancement of the in-group, de- 
valuation of the out-group, or both. In many 
studies, particularly those dealing with eval- 
uative biases, results were reported only in 
the form of net ratings or difference scores 
(e.g., Doise & Sinclair, 1973; Dustin &
Davis, 1970; Ferguson & Kelley, 1964; 
Gerard & Hoyt, 1974; McKillip et al., 1977), 
thereby losing information as to whether 
variations in bias were a function of increases 
in in-group ratings or decreases in out-group 
ratings. 

Among those studies that did report both 
in-group and out-group ratings separately, 
results are mixed as to the location of bias. 
Some studies that compared intergroup co- 
operation and competition reported no change 
in in-group attraction, but reported a de- 
crease in out-group ratings under competition 
conditions (Rabbie et al., 1974; Worchel et 
al., 1977). Other research indicates that varia- 
tions in degree of bias are a function of 
both increased in-group and decreased out- 
group ratings (Hensley & Duval, 1976; Kahn 
& Ryen, 1972; Wilson et al., 1965). The 
majority of studies, however, indicate that 
increases in bias are associated with en- 
hanced in-group evaluation, whereas out- 
group ratings remain relatively constant 
(Dion, 1973; Rabbie & Horwitz, 1969; Rab- 
bie & Wilkens, 1971; Ryen & Kahn, 1975; 
Stephenson et al., 1976; Wilson & Miller, 
1961; Worchel et al., 1975). The results in 
general, then, are consistent with the conclusion
that in-group bias rests on the percep-
tion that one's own group is better, although 
the out-group is not necessarily depreciated. 

The above conclusion suggests that the ef- 
fect of in-group-out-group categorization is 
one of differentiating the in-group from the 
out-group rather than of differentiating the 
out-group from the in-group, as the process 
is usually conceived. This means that the 
baseline should be conceptualized as a state 
in which the self is perceived as distinct from 
an undifferentiated group of others. The in- 
troduction of an in-group-out-group boundary 
is then associated with a realignment of per- 
ceptions wherein members of the in-group are 
perceived to be less differentiated from the 
self, while the distance between the self and 
out-group members remains unchanged. This 
conceptualization of the differentiation pro- 
cess is borne out by studies that modified 
the Prisoner's Dilemma Game for group play 
(e.g., Dion, 1973; Wilson & Kayatani, 1968). 
In terms of the high percentage of competi- 
tive choices, intergroup behavior in these 
games parallels closely the game behavior of 
individual players. It is the increased pro- 
portion of cooperative choices exhibited in 
intragroup decisions that deviates from typi- 
cal interindividual play. 

Reconceptualizing the process of inter- 
group differentiation tends to shift the focus 
of attention from the negative implications of 
out-group perceptions to the positive conse- 
quences of in-group formation. The critical 
role of in-group identity in the extension of 
interpersonal trust has already been alluded 
to. Another consequence of the reduced so- 
cial distance between self and others that ac- 
companies in-group formation is that outcomes
to other group members, or to the
group as a whole, come to be perceived as one's
own. Indeed, there is evidence that feedback
regarding total group outcomes can have
more impact on the individual than feedback
on his or her own performance (e.g., Zander
& Armstrong, 1972) and that expected and
perceived success is higher at the group level
than at the individual level (Janssens &
Nuttin, 1976). Satisfaction and identification
with group success tend to be high even when
the individual's contribution to that success
has been minimal (e.g., Kahn & Ryen, 1972)
or nil (Cialdini et al., 1976).

The capacity of in-group identification to 
amplify feedback has important implications 
for the solution of that class of social prob- 
lems characterized as commons dilemmas 
(Dawes, McTavish, & Shaklee, 1977; Hardin, 
1968) or social traps (Platt, 1973). The es- 
sence of these problems is a “divergence be- 
tween what people are individually motivated 
to do and what they might accomplish to- 
gether" (Schelling, 1971, p. 68). The most 
critical social dilemmas derive from behav- 
iors for which rewards outweigh small costs 
at the individual level (e.g., taking an ex- 
tended shower) but that result in cumula- 
tive high costs at the group level (e.g., deple- 
tion of water supplies). The solution to such 
dilemmas requires that the collective out- 
come be real enough to the actor to overcome 
individualistic motivational dynamics (Mes- 
sick, 1973; 1974). The reduced differentia- 
tion between one's own and other outcomes 
associated with in-group formation provides 
one mechanism for increasing the weight 
given to collective outcomes in individual 
decision making. 


The idea of capitalizing on the social benefits
of group identification raises concern
about whether the positive consequences of 
in-group formation depend on the presence 
of a distinct out-group. To those who hold 
that the effects of social categorization are 
the result of intergroup social comparison 
(Turner, 1975; Tajfel, in press), the existence
of an identifiable out-group is essential.
Although groups may function in the
absence of any other groups, the mere pres- 
ence of an out-group is sufficient to signifi- 
cantly alter in-group processes (Billig, 1976). 
On the other hand, if one associates group 
identification with more general concepts of 
unit formation (Campbell, 1958; Heider, 
1958), awareness of differentiated social 
groupings may be only one potential mech- 
anism—however important—by which the 
self is included in a bounded social unit. Per- 
haps the salience of interdependence or com- 
mon fate can be enhanced among any given 
set of individuals without reference to other 
subsets. If so, the focus of research on in- 
to intragroup contexts.


A Simplex Process Model for Describing Differences
Between Cross-Lagged Correlations


A model that is based on the use of the diagonal method of factoring to describe
a simplex process and that explains differences between cross-lagged correlations 
is presented. When the model is applied to the data of Atkin et al., which
showed that Listening tapped most directly the causes of later intellectual de- 
velopment, quite good fits are obtained for both 2- and 4-year lags. The best 
fit, however, is based on an estimate of the effects of a 3-year lag between 
changes in the rank order of individual differences on the Listening test and the 
same changes on the intellectual composite. It is shown that accurate knowledge 
of reliabilities and specificities of the measures is necessary for the interpretation
of cross-lagged differences but that stationarity per se is not an essential
assumption.


The first reaction of most behavioral scien- 
tists to a discussion of cross-lagged panel 
correlation methodology is to question the in- 
ference of causality from a difference in the 
size of correlations. It does not matter how 
large or how statistically significant the difference
may be. Discussions of this methodology,
such as those by Campbell and
Stanley (1963) and by Kenny (1975), have
not provided a rationale that is sufficient to
satisfy these skeptics. The purpose of this 
article is to describe a factor model that pro- 
vides a new and plausible basis for a pos- 
sible causal inference and to apply it to some 
published data. 


Simplex Correlation Matrices 


Since the cross-lagged methodology is used 
in the expectation that there will be change 
in the rank order of individual differences in 
the functions measured with the passage of 
time, one should also expect that the intercorrelations
of each of the two or more mea-
sures over multiple occasions will show the 
simplex pattern (Guttman, 1955; Humphreys, 
1960; Rozelle & Campbell, 1969). Guttman 
used the simplex matrix originally to describe 
the intercorrelations of several measures on 
a single measurement occasion when the sev- 
eral measures seemingly varied only in their 
degree of complexity. Humphreys extended 
the applicability of the simplex pattern to 
learning and maturational data in which one 
measure was obtained on repeated occasions. 
Rozelle and Campbell discussed the simplex 
matrix with respect to time series generally. 
Both Guttman and Humphreys argued against 
the use of multiple or common factor analysis 
as appropriate for such data. 

In place of common factor analysis, Gutt- 
man recommended the use of diagonal factor- 
ing, which produces the same number of 
factors as there are variables in the correla- 
tion matrix. If the measures are ordered in 
increasing order of complexity, the first factor 
is based on the correlations that involve the
simplest measure, and the nth is based on
the residual correlations that involve the most
complex measure. For learning data, if one assumes
accretion of responses, the first factor
is based on the correlations that involve the
first trial; if learning seemingly involves the
dropping out of unnecessary responses, the
first factor is based on the correlations of
the nth trial with all of the others. In both
cases factors are extracted from successive
tables of residuals in the serial order of the
learning trials, either forward or backward in
time. Similar reasoning holds for developmental
data.


Table 1
A Simplex Matrix of Intercorrelations and
Its Diagonal Factoring

                   Occasion
Occasion       1       2       3       4
1          1.000    .960    .922    .885
2           .960   1.000    .960    .922
3           .922    .960   1.000    .960
4           .885    .922    .960   1.000

                    Factor
Occasion       1       2       3       4
1          1.000    .000    .000    .000
2           .960    .280    .000    .000
3           .922    .268    .280    .000
4           .885    .259    .268    .280


An Arbitrary Example 


The applicability of diagonal factors to 
changes over time can be illustrated by an 
arbitrary example. Table 1 presents a 4 × 4
matrix of intercorrelations of the hypotheti- 
cal variable X, which is measured with per- 
fect reliability. Also appearing in this table 
are the diagonal factors. The first occasion is 
used to define Factor 1, followed in order by 
the second, third, and fourth occasions de- 
fining Factors 2, 3, and 4. 
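
The factoring in Table 1 can be reproduced numerically. The sketch below rests on two assumptions that the text does not state outright: that the Table 1 correlations follow the rule r ** |i - j| with r = .96, and that the diagonal method, applied to the occasions in order, is the same computation as a Cholesky factorization. The function names are illustrative only.

```python
import math

def simplex_matrix(r, n):
    """Correlation matrix in which occasions i and j correlate r ** |i - j|."""
    return [[r ** abs(i - j) for j in range(n)] for i in range(n)]

def diagonal_factoring(R):
    """Lower-triangular F with F F' = R, extracting one factor per occasion."""
    n = len(R)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Residual left after removing the factors already extracted.
            resid = R[i][j] - sum(F[i][k] * F[j][k] for k in range(j))
            if i == j:
                F[i][j] = math.sqrt(resid)
            else:
                F[i][j] = resid / F[j][j]
    return F

R = simplex_matrix(0.96, 4)   # assumed generating rule for Table 1
F = diagonal_factoring(R)
for row in F:
    print("  ".join(f"{value:.3f}" for value in row))
```

Under these assumptions the printed loadings agree with the lower panel of Table 1 to within about .001; the small discrepancies arise because the table appears to have been computed from correlations already rounded to three places.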

Although the diagonal method can be ap- 
plied to any matrix of intercorrelations, it 
has special properties when used for the sim- 
plex. For one thing, the ordering of the vari- 
ables determines the order in which factors 
must be extracted if one wishes to portray 
the special characteristics of the simplex ma- 
trix. Secondly, whether factoring starts with 
the first or the nth variable, the factor matrix
has one triangle of zero loadings and a second
triangle of positive nonzero loadings. The factor
matrix directly reflects causes of change.
Causes operative before the first measurement
occasion, extending backward in time
over the history of the organism, are aggre- 
gated in the first factor, with the major effect 
on the first occasion but with effects on all 
subsequent occasions as well. As new causes 
of change enter the picture between measure- 
ment occasions, the effects occur only on 
subsequent occasions. New causes appear be- 
tween each pair of occasions. 

The intercorrelations over multiple occa- 
sions of each of the two or more measures 
being studied can be factored by the same 
methodology, but it is not essential to the 
accurate description of the intercorrelations 
that the first factor be defined by the first 
occasion for each measure. Several different
diagonal factorings of the intercorrelations
presented in Table 1 are shown in Table 2
for the hypothetical variable Y, which is a
variable possibly influenced by X. Although
an infinite number of rotations are
possible mathematically, the four illustrated
in Tables 1 and 2 are the appropriate ones 
for present purposes. 

Note that the factoring for variable X 
could also be extended backward in time.
Occasions -1, -2, and -3 would produce columns
of factor loadings parallel to those of
Factors 3, 2, and 1, respectively, in the Lag 
3 factoring of variable Y. Diagonal factoring 
of X and Y can go on indefinitely both for- 
ward and backward in time. For purposes of 
describing the intercorrelations of each in- 
dependently of the other, the choice of the 
occasion on which one decides to start fac- 
toring is highly arbitrary. 

For purposes of describing a cross-lagged 
difference, however, choice of occasion is not 
arbitrary. Factor 1 in X, extracted at Time 
1, is identified with a factor in Y, depending 
on the lag hypothesized, at Time 2, 3, or 4. 
Depending on this decision, the factor matrix 
in Table 2 that represents the appropriate 
lag is selected to represent variable Y. This
matrix is then modified by deleting one, two,
or three columns from the left-hand side, depending
of course on the time lag hypothesized,
and adding the same number of columns
of zeros on the right. The factor matrix of X
is now postmultiplied by the transpose of the
modified Y factor matrix to produce the
cross-correlations between the two variables.
The discard of factors in Y that preceded the
factor identified with Factor 1 in X prior to
the matrix multiplication is justified on the
basis that they were determined by factors in
X that preceded the present Factor 1.


Table 2
Three Alternative Factorings of the Intercorrelations of Table 1

              Factors for Lag 1           Factors for Lag 2           Factors for Lag 3
Occasion     1     2     3     4         1     2     3     4         1     2     3     4
1          .280  .960  .000  .000      .280  .268  .922  .000      .280  .268  .259  .885
2          .000 1.000  .000  .000      .000  .280  .960  .000      .000  .280  .268  .922
3          .000  .960  .280  .000      .000  .000 1.000  .000      .000  .000  .280  .960
4          .000  .922  .268  .280      .000  .000  .960  .280      .000  .000  .000 1.000
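
The procedure just described (factor Y starting from the lagged occasion, drop the earlier factors, pad with zeros, and multiply) can be sketched as follows. This is an illustration under assumptions, not the author's program: the correlations are taken to follow r ** |i - j| with r = .96, diagonal factoring is treated as a Cholesky factorization, and the helper names `lagged_y_factors` and `cross_correlations` are hypothetical.

```python
import math

def cholesky(R):
    """Lower-triangular L with L L' = R (diagonal factoring in the given order)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            resid = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(resid) if i == j else resid / L[j][j]
    return L

def lagged_y_factors(R, lag):
    """Modified Y factor matrix for a given lag (occasions are 0-based here).

    The first factor is defined by occasion `lag`, later occasions follow in
    order, and the factors belonging to earlier occasions are extracted last
    so that they can be dropped and replaced by columns of zeros.
    """
    n = len(R)
    order = list(range(lag, n)) + list(range(lag - 1, -1, -1))
    permuted = [[R[a][b] for b in order] for a in order]
    L = cholesky(permuted)
    keep = n - lag                      # factors retained after the discard
    F = [[0.0] * n for _ in range(n)]   # untouched columns stay zero (padding)
    for position, occasion in enumerate(order):
        for col in range(keep):
            F[occasion][col] = L[position][col]
    return F

def cross_correlations(Fx, Fy_mod):
    """R_xy = Fx Fy_mod', with rows indexed by X occasion, columns by Y occasion."""
    n = len(Fx)
    return [[sum(Fx[i][k] * Fy_mod[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

R = [[0.96 ** abs(i - j) for j in range(4)] for i in range(4)]
Fx = cholesky(R)
Rxy_by_lag = {lag: cross_correlations(Fx, lagged_y_factors(R, lag))
              for lag in (1, 2, 3)}
```

Extracting the discarded factors last, rather than first as in the column order of Table 2, is a convenience of the sketch; for a simplex the retained columns come out the same either way, because occasions on opposite sides of the pivot are uncorrelated once the pivot factor is removed.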

This model can readily be related to the 
common interpretations of cross-lagged dif- 
ferences. These interpretations have been 
arranged in order from the one that is closest 
to the observations and therefore basic to all 
of the others to the one that clearly requires 
additional research to establish. The first 
interpretation, also, serves as the basis for
equating an earlier factor in X with a later
factor in Y. The interpretations are as follows:

1. Individual differences in X anticipate in
time individual differences in Y.

2. Variable X at Time 1 taps the causes
for changes in Y that appear subsequent to
Time 1.

3. The predominant causal sequence is from
X to Y rather than from Y to X.

4. Changes in X cause changes in Y. 

A more formal treatment of the argument
up to this point follows. Let X be the hypothesized
antecedent variable and Y the consequent
variable. $F_x$ is a diagonal factor
matrix determined from $R_x$, and $F_y$ is a diagonal
factor matrix determined from $R_y$.

$$R_x = F_x F_x' \tag{1a}$$
$$R_y = F_y F_y' \tag{1b}$$

Now let $F_{y_1}, F_{y_2}, \ldots, F_{y_n}$ represent factor
matrices in which the first column represents
the first diagonal factor extracted whether that
factor represents Time 1, 2, or n. The $F_{y_1}$
matrix contains no column of zeros on the
right, but all others contain one or more
columns of zeros up to n − 1 such columns for
$F_{y_n}$.

$$R_{xy_1} = F_x F_{y_1}' \tag{2a}$$
with time lag of zero between X and Y (no cross-lagged difference observed).

$$R_{xy_2} = F_x F_{y_2}' \tag{2b}$$
with time lag of one between X and Y.

$$R_{xy_n} = F_x F_{y_n}' \tag{2c}$$
with time lag of n − 1 between X and Y.

Three different sets of possible cross-correlations
are shown in Table 3, based upon lags
of one, two, and three occasions, respectively.
When the lag is one interval, it is seen that
the maximum cross-lagged difference occurs at
that interval; but there is little drop in size at
intervals of two or three. When the lag is
more than one interval, the maximum cross-lagged
difference occurs at the appropriate
interval. With zero lag, there are no cross-lagged
differences.


Table 3
Cross-Correlations Between X and Y for Three Lags

            Y for Lag 1                 Y for Lag 2                 Y for Lag 3
X        1     2     3     4         1     2     3     4         1     2     3     4
1      .960 1.000  .960  .922      .922  .960 1.000  .960      .885  .922  .960 1.000
2      .922  .960 1.000  .960      .885  .922  .960 1.000      .850  .885  .922  .960
3      .885  .922  .960 1.000      .850  .885  .922  .960      .816  .850  .885  .922
4      .850  .885  .922  .960      .816  .850  .885  .922      .783  .816  .850  .885
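
As a check on Table 3, a single entry can be worked by hand from the loadings in Tables 1 and 2. For a lag of one, the correlation between X at Occasion 2 and Y at Occasion 4 is the product of row 2 of the X factor matrix, (.960, .280, 0, 0), with row 4 of the modified Lag 1 matrix for Y, (.922, .268, .280, 0):

$$r_{X_2 Y_4} = (.960)(.922) + (.280)(.268) \approx .885 + .075 = .960$$

The correlation in the opposite direction, Y at Occasion 2 with X at Occasion 4, uses row 4 of X, (.885, .259, .268, .280), and row 2 of the modified Y matrix, (1.000, 0, 0, 0):

$$r_{X_4 Y_2} = (.885)(1.000) = .885$$

Both values agree with the Lag 1 panel of Table 3, and their difference, .960 − .885 = .075, is the cross-lagged difference for this pair of occasions.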


Required Additions to the Model 


The preceding discussion contains the heart 
of the model, but it is deficient in three par- 
ticulars. The matrix multiplication required 
for Table 3 assumes a one-to-one correspond- 
ence between factors in X and Y when the 
appropriate time lag is selected. This is un- 
realistic. The most obvious lack is failure to 
allow for measurement error. When the appli- 
cation is to fallible data, one must substitute 
reliabilities for the unities in the principal 
diagonal of the correlation matrix before fac- 
toring. Diagonal factoring with reliabilities 
in the principal diagonal of the R matrix can 
produce negative residuals that in turn pro- 
duce negative factor loadings. If the R matrix 
can be considered to represent a simplex pro- 
cess with little error, residuals from diagonal 
factoring that would be zero if unities had 
been used will be very small positive or nega- 
tive values that can be disregarded. 

This means that the matrix multiplications 
in Equations 1a and 1b still hold, with the 
exception that the R matrix is defined with 
reliability estimates rather than with unities 
in the principal diagonal. This also means that 
knowledge of reliabilities is an essential re- 
quirement in analyzing cross-lagged differ- 
ences by means of the present model. In this 
respect the model does not differ from other 
methods of analysis (see especially, Kenny, 
1975). 

A second problem is that practically all 
measures used in the behavioral sciences con- 
tain unique nonerror (specific) variance in 
addition to common factor and error variance. 
It is logically impossible for the specific vari- 
ance in X to be related in any way, causally 
or otherwise, to individual differences in Y. 
One way or another it is necessary to obtain 
an estimate of the specificity of both X and Y 
relative to each other. When circumstances
are appropriate, specifics can be estimated by
common factor analysis or by multiple regres-
sion analysis. At times psychometric analysis
suffices. Again, as with measurement error, 
knowledge of specifics is an essential require- 
ment in analyzing cross-lagged differences. 
Either reliability differences or specificity dif- 
ferences from one occasion to another can 
produce completely spurious cross-lagged 
differences. In Kenny's discussion of a correc- 
tion for reliability-specificity differences, the 
two are legitimately merged in the concept of 
uniqueness. 

The presence of specificity does lead to a
change in Equations 2a, 2b, and 2c. Two new
matrices are required. The first, $H_x$, is a di-
agonal matrix consisting of $h_1, h_2, \ldots, h_n$.
These values are determined from the propor-
tions of common ($h^2$) and specific ($s^2$) vari-
ance in the true score variance of X at each
time period. The matrix $H_y$ is defined in
parallel fashion. In contrast to $F_y$, which is
modified in accordance with the hypothesized
lag, $H_y$ is independent of lag; that is, $H_y$
characterizes the measures. If $F_x$ and $F_y$ now
represent the factors extracted from fallible
measures, Equations 2a, 2b, and 2c are re-
written as follows:


$R_x = (H_x F_x)(F_x' H_x)$. (3a)
$R_y = (H_y F_y)(F_y' H_y)$. (3b)
$R_{xy} = (H_x F_x)(F_y' H_y)$. (3c)


The third lack in the general model is only 
a little less ubiquitous than the presence of 
measurement error and specificity, but is not 
as important in its contribution to variance 
or to model fitting. This source of variance is 
correlated error that inflates the correlations 
between variables that are measured on a 
single occasion. This third component, by 
definition, can affect only the synchronous 
correlations among measures and cannot pro-
duce spurious cross-lagged differences. Knowl-
edge of this source of variance is not, there-
fore, essential. Absence of this information,
however, will prevent one from obtaining a
good fit of the synchronous correlations to
empirical data.


1 These relationships are, of course, the effects of
the assumption of a correlation of .96 between true
scores on adjacent measurement occasions for both
X and Y. This level of correlation was selected as a
representative figure for ability measures. Its con-
stant size over both variables and occasions repre-
sents the assumption of stationarity that is later
discarded. A lower correlation or one that varied
from one occasion to another or from one variable
to another would have rather different effects.


Table 4
Intercorrelations of Listening and the Composite in Four Grades for 1,430 White Boys and Girls

                       Listening                      Composite
Test and grade     5     7     9    11          5     7     9    11
Listening
  5                    .744  .679  .630       .782  .770  .764  .746
  7                          .751  .683       .760  .830  .820  .802
  9                                .690       .658  .734  .806  .762
  11                                          .637  .685  .730  .785
Composite
  5                                                 .928  .888  .862
  7                                                       .938  .912
  9                                                             .930
  11


Aural Comprehension 
and Intellectual Development 


Atkin et al. (1977b) obtained cross-lagged
differences that involved a measure of aural
comprehension and an intellectual composite
of 15 separate cognitive tests. The direction
of the differences was that the Listening test
predicted the composite more highly than
the composite predicted Listening. The dif-
ferences were also highly significant both
statistically and psychologically. Four groups
defined by race (black and white) and sex
were studied on four occasions, at Grades 5,
7, 9, and 11. Results were highly consistent
for the four groups and for the six pairs of
occasions, with only one small exception out
of 24 comparisons. The size of the difference
between the cross-lagged correlations for the
four groups in most of the comparisons was
between .10 and .20.

In this research the composite that was
compared with the Listening test was formed 
from the following tests: the two tests of 
School and College Aptitude, the five tests 
other than Listening from the Sequential 
Tests of Educational Progress, and the eight 
narrow information tests that cover hetero- 
geneous areas from the Test of General In-
formation.² Similar comparisons for each of
the 15 tests in this particular composite with 
a rotating composite formed in each case from 
the remaining members of the original set of 
16 were also made. Optimum weights obtained 
from multiple regression and canonical analy- 
sis were used to form each composite, but 
with large Ns there is little capitalization on 
chance in the obtained values. No other mea- 
sure in the set of 16 showed differences that 
even approached in size or consistency those 
shown by the Listening test. 

To apply the model to these data we de- 
cided to combine all correlations that in- 
volved white males and females. These two 
groups contained the larger number of cases 
(668 males and 762 females). It is also known 
that the common factor patterns of the 16 
tests at each grade level are highly similar 
for the two white groups (Atkin et al., 
1977a). Blacks are not known to differ, but 
there is more sampling error "noise" in their
data. The mean correlations, based on equal 
weights for the two sexes, are presented in 
Table 4. When one compares this table with 
the cross-correlations in Table 3, the asym- 
metry characteristic of a lag between X and 
Y is readily apparent.

At present, fitting of the model to the data 


2 These tests are published by the Educational
Testing Service (ETS). The data analyzed by Atkin
et al. (1977b) were made available by Thomas Hil-
ton and ETS. We owe them thanks for their gracious
cooperation.


Table 5
Reliabilities and Specifics for Listening and the Composite in Grades 5, 7, 9, and 11

                           Listening                     Composite
Statistic              5     7     9    11          5     7     9    11
Reliability          .743  .817  .753  .693       .937  .981  .957  .947
Specific variance    .106  .108  .084  .066       .000  .000  .000  .000


proceeds one step at a time. The first step 
requires estimates of reliabilities. In the 
absence of an experimental design that would 
make possible test-retest or parallel-forms 
estimates, it is necessary to assume that the 
intercorrelations of true scores over the four 
time periods form simplex matrices. The 
model developed by Jöreskog (1970) can
then be used to estimate reliabilities. To ob-
tain four separate reliabilities from six corre-
lations, it is also necessary to assume that
the betas that represent the regression of the
true score at Time t on the true score at
Time t − 1 remain constant. This coefficient
was estimated to be .955 for Listening and
.968 for the intellectual composite. The esti-
mated reliabilities appear in Table 5. (Also
shown are the specific variances that are
required later.) With these reliabilities the
fit of the simplex to the observed intercorre-
lations of Listening and the composite, re-
spectively, is excellent. The largest residual
for the former is .008 and for the latter .001,
and χ²(1) = .80 and .14, respectively. The
reliabilities needed for the diagonal factoring 
appear to be highly dependable. 

Since Listening has the possible causal role 
in this analysis, the diagonal factoring for this 
measure starts with Grade 5 and proceeds 
through Grade 11. Since the lag time between 


Table 6 


Diagonal Factors in Listening and the Composite 


changes in the rank order of persons on Lis- 
tening and on the composite is unknown, and 
in the absence of a theory that could guide 
the choice, the diagonal factoring for the 
composite uses both Lag 1 and Lag 2. These 
factors are shown in Table 6. 

Although in one sense a digression, a brief 
discussion of the nature of diagonal factors 
and factor loadings when reliabilities are
inserted in the principal diagonal of the R
matrix may be in order. The first factor ex-
tracted represents the true score variance of
the variable measured on that particular oc- 
casion. The factor loadings represent correla- 
tions of the fallible measures with this factor. 
The second factor extracted represents the 
residual true score variance of the variable 
measured on the occasion once removed from 
the one that defined the first factor. The sec- 
ond factor loadings are correlations between 
residual fallible scores and the residual true 
score. The factor matrices are not identical 
with those that would be obtained from R 
matrices corrected for attenuation with uni- 
ties in the diagonal. Factor loadings of falli- 
ble measures are used to estimate the fallible 
cross-correlations between Listening and the 
intellectual composite. 
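
The diagonal factoring just described can be illustrated with the Listening data. This is a sketch under a stated assumption: diagonal factoring taken in occasion order is treated here as the Cholesky (square-root) decomposition of the R matrix, with the Table 5 reliabilities placed in the principal diagonal of the Listening intercorrelations from Table 4. The function name is hypothetical. For Grades 5, 7, and 9 the loadings agree with the Listening columns of Table 6 to about .001; the Grade 11 row is sensitive to the third decimal of the input correlations and need not match the printed values.

```python
import numpy as np

# Listening intercorrelations as read from Table 4, with the Table 5
# reliabilities substituted for unities in the principal diagonal
# (Grades 5, 7, 9, 11).
R_listening = np.array([
    [0.743, 0.744, 0.679, 0.630],
    [0.744, 0.817, 0.751, 0.683],
    [0.679, 0.751, 0.753, 0.690],
    [0.630, 0.683, 0.690, 0.693],
])

def diagonal_factors(R):
    """Lower-triangular loadings F such that F @ F.T reproduces R.

    Assumption: diagonal factoring in occasion order is equivalent to the
    Cholesky decomposition, which requires R to be positive definite.
    """
    return np.linalg.cholesky(R)

F_listening = diagonal_factors(R_listening)
print(np.round(F_listening, 3))

# Each row's sum of squared loadings returns the reliability in the diagonal.
print(np.round((F_listening ** 2).sum(axis=1), 3))
```

The first column holds the correlations of the fallible measures with the Grade 5 true score; for Grade 5 itself this loading is the square root of the reliability.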

Before proceeding with the matrix multi- 
plication it is necessary to estimate the non- 


                 Listening                  Composite, Lag 1             Composite, Lag 2
Grade      1     2     3     4          1     2     3     4          1     2     3     4
5        .862  .000  .000  .000       .243  .937  .000  .000       .243  .231  .908  .000
7        .863  .269  .000  .000       .000  .990  .000  .000       .000  .247  .959  .000
9        .788  .264  .249  .000       .000  .947  .245  .000       .000  .000  .978  .000
11       .731  .217  .228  .245       .000  .921  .237  .207       .000  .000  .951  .207


Table 7
Predicted Cross-Correlations Between Listening and the Intellectual Composite in
Grades 5, 7, 9, and 11

                                              Composite
                   2-year lag                   4-year lag                   3-year lag
Listening      5     7     9    11          5     7     9    11          5     7     9    11
5            .764  .806  .771  .750       .740  .782  .797  .775       .752  .794  .784  .763
7            .765  .807  .834  .812       .741  .782  .798  .828       .753  .795  .830  .808
9            .706  .746  .776  .804       .685  .723  .738  .769       .696  .735  .770  .747
11           .662  .700  .723  .747       .642  .678  .691  .718       .653  .689  .719  .742

Note. For the 2-year lag, Σd = −.211 and Σd² = .006793; for the 4-year lag, Σd = −.041 and Σd² =
.006089; for the 3-year lag, Σd = −.103 and Σd² = .003513.


error specifics in the Listening test and in the 
composite. Specific variances for each of the 
16 tests were estimated by subtracting the 
squared multiple correlation between an indi- 
vidual test and the other 15 from the esti- 
mated reliability of the test. This was, of 
course, done at each grade level. The four 
values of $s^2$ for Listening, which are pre-
sented in Table 5, determined their obverses,
the four entries in the H matrix, by means
of the relationship $h = (1 - s^2)^{1/2}$. Specific
variances of the composite at the four grade 
levels were set at zero on the basis of the 
size of the specifics of the individual tests in 
the composite relative to their reliabilities. As 
communalities and reliabilities are increased 
to those for a test 15 times as long as any 
one component, the communality of the com- 
posite approaches very closely its reliability, 
which in turn approaches unity. Because the 
intercorrelations are high, specifics tend to 
be much smaller than measurement error. The
result is that specific variance in the com- 
posite is trivial in size. It is seen in Table 5 
that with an increase in grade and age there 
is a decrease in the estimated specificity of 
Listening. 
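
As a worked illustration of this step, the sketch below converts the specific variances in Table 5 into the diagonal entries of the H matrices and writes out the matrix product of Equation 3c as it is read here, that is, R_xy = (H_x F_x)(F_y' H_y). The function names are hypothetical, and the factor matrices are left as arguments because the lag-adjusted composite factors are not rebuilt in this sketch.

```python
import numpy as np

# Specific variances at Grades 5, 7, 9, 11 (Table 5).
s2_listening = np.array([0.106, 0.108, 0.084, 0.066])
s2_composite = np.zeros(4)          # set to zero in the text

def h_matrix(specific_variance):
    """Diagonal H matrix with entries h = (1 - s^2) ** 0.5."""
    return np.diag(np.sqrt(1.0 - specific_variance))

def predicted_cross_correlations(H_x, F_x, F_y, H_y):
    """Equation 3c as read here: R_xy = (H_x F_x)(F_y' H_y)."""
    return (H_x @ F_x) @ (F_y.T @ H_y)

H_x = h_matrix(s2_listening)
H_y = h_matrix(s2_composite)        # identity matrix, since the specifics are zero
print(np.round(np.diag(H_x), 3))
```

Because the composite specifics are zero, H_y leaves the composite factors unchanged, and only the Listening rows of the predicted cross-correlations are scaled down.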

The results of these multiplications are 
presented in Table 7 for Lag 1 (2 years), 
Lag 2 (4 years), and an approximation³ to
what a 3-year lag would have produced if 
the data had been available. In the note un- 
derneath the table are descriptive statistics of 
goodness of fit, namely, the algebraic sum 
of deviations and the sum of squared devia- 


tions between obtained and estimated values 


for all of the cross-lagged correlations. The 
discrepancies between obtained and predicted 
synchronous correlations in the diagonal are 
ignored in the summations because the model 
requires a correction for the correlated mea- 
surement error that inflates these correlations. 
These components cannot be estimated inde- 
pendently, at least for the present. 
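
The two descriptive statistics can be recomputed from the tables. The sketch below uses the observed Listening-by-composite correlations as read from Table 4 and the predicted values for the 2-year and 3-year lags as read from Table 7, and it leaves the synchronous (diagonal) entries out of both sums, as the text specifies. The variable and function names are hypothetical.

```python
import numpy as np

# Observed correlations, Listening (rows) by Composite (columns), Table 4.
observed = np.array([
    [0.782, 0.770, 0.764, 0.746],
    [0.760, 0.830, 0.820, 0.802],
    [0.658, 0.734, 0.806, 0.762],
    [0.637, 0.685, 0.730, 0.785],
])

# Predicted correlations as read from Table 7.
predicted = {
    "2-year lag": np.array([
        [0.764, 0.806, 0.771, 0.750],
        [0.765, 0.807, 0.834, 0.812],
        [0.706, 0.746, 0.776, 0.804],
        [0.662, 0.700, 0.723, 0.747],
    ]),
    "3-year lag": np.array([
        [0.752, 0.794, 0.784, 0.763],
        [0.753, 0.795, 0.830, 0.808],
        [0.696, 0.735, 0.770, 0.747],
        [0.653, 0.689, 0.719, 0.742],
    ]),
}

def fit_statistics(obs, pred):
    """Algebraic sum and sum of squares of (observed - predicted),
    ignoring the synchronous correlations in the diagonal."""
    off_diagonal = ~np.eye(obs.shape[0], dtype=bool)
    d = (obs - pred)[off_diagonal]
    return d.sum(), (d ** 2).sum()

for label, pred in predicted.items():
    sum_d, sum_d2 = fit_statistics(observed, pred)
    print(f"{label}: sum d = {sum_d:.3f}, sum d^2 = {sum_d2:.6f}")
```

A negative sum of deviations means that the predicted values are, on balance, larger than the observed ones.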

In the absence of a statistical test of good- 
ness of fit one can only conclude that none 
of the three lags provides a really poor fit and 
that the approximation to a 3-year lag is 
clearly the most accurate of the three in terms 
of the size of the squared deviations. The 4- 
year lag, on the other hand, is the most ac- 
curate in terms of the size of the constant 
error. The negative sign of this quantity indi-
cates that the model overestimated the ob-
served correlations. If greater weight is
placed on the squared deviations as a mea- 
sure of goodness of fit, and, as a result, the 
3-year lag is selected as the most likely one, 
there are two possible explanations for the 
constant error. Specific variances may have 
been slightly underestimated in Listening or 
in the intellectual composite. Values for the
latter, it will be remembered, were set equal
to zero. There is, however, a clear-cut choice
between a zero lag and one of from 2 to 4
years. For the former, Σd = −.24036 and
Σd² = .015351. In all probability, then, the
lag between changes in individual differences
in Listening and in the intellectual composite
is between 2 and 4 years during the develop-
mental period represented in these data.


3 The equivalent of the first factor that might
have been extracted at a 3-year lag if the tests had
been administered was obtained by taking the square
root of the mean squared factor loadings on the
initial factor extracted for the 2- and 4-year lags.
From these all other factor loadings can be deter-
mined. The resultant is a factor matrix of five col-
umns from which the first two are dropped and a
column of zeros added for purposes of the final
matrix multiplication.

We have no theory to support a possible 
lag of 3 years between individual differences 
in aural comprehension and in an intellectual 
composite that both requires reading and 
includes widely assorted information. The 
reading disability literature, briefly reviewed 
by Atkin et al. (1977b), is perhaps relevant. 
Also, if one accepts extrapolation as a low- 
level form of theorizing, it seems reasonable 
to assume that the 4- or 5-year lag between 
aural and visual comprehension of language 
between the ages of 2 and 6-7 becomes 
smaller with increasing age and education. It
is surprising, as a matter of fact, to find that 
the lag remains as great as it seems to be in 
these data, which cover the period between 
age 11 and age 17. 

It is more difficult to be specific with re- 
spect to possible causal inferences that might 
be drawn from the cross-lagged difference and 
the model that describes it. It is possible that 
long-term, continuing experimental manipu-
lation of attentive behavior would affect
scores on the Listening test with zero lag and 
have an effect on scores on the intellectual 
composite about 3 years later. It is also pos- 
sible, however, that the causes of change in 
Listening are beyond any experimental ma- 
nipulation in any one generation and that 
these more basic causes affect test scores that 
require visual comprehension about 3 years 
after the effect on Listening. It does seem 
safe to conclude that Listening taps the causes 
of intellectual development about 3 years 
earlier than the usual printed intellectual tests 
between Grade 5 and Grade 11 whether 
changes in Listening are directly causal or 
merely anticipate other effects. 

The discrepancies in the diagonal would be 
adequately described by correlated error fac- 
tor loadings of about .20. These loadings do 
not seem large relative to the dozen or so 
possible sources that can be named offhand.
These sources are especially potent when
testing is done in groups widely scattered 
geographically with several different test ad- 
ministrators and at somewhat different dates. 
The test administrators, the examinees, and 
the measurement settings represent classes of 
possible sources. Unfortunately, no indepen- 
dent estimate of their impact is available. 
For the present their assessment is com- 
pletely circular; that is, they are assessed 
through discrepancies between estimates and 
observations. 


Inferences From the Model 


The simplex process model was introduced 
by means of an arbitrary example in which 
stationarity was observed for the two varia- 
bles and for the four occasions by fixing the 
correlation between all true scores for adja- 
cent occasions at .96. When the model was 
applied to data that involved the Listening 
test and an intellectual composite, the esti- 
mate of this correlation for Listening was 
.955 and for the composite was .968, which 
represented only a small departure from sta- 
tionarity. Also, to obtain unique estimates of 
the reliabilities of the measures on each oc- 
casion, it was necessary to assume stationarity 
of true score regressions from occasion to 
occasion; that is, stationarity was forced on 
us by the limitations of the data. 

The model is, however, more general than 
the above discussion indicates. Five occasions 
would allow estimation of two regressions of 
one occasion on another. Extrapolation would 
be required for the other two, but this would 
be less arbitrary than equating all at the 
same level. Also, the near equality of the 
true score regressions for the two variables 
was not required. The model allows for dif- 
ferences. For example, if correlations of .96 
and .94 for X and Y, respectively, had been 
assumed for the original illustration, a zero 
time lag would not have produced a sym-
metrical matrix of cross-correlations. The
necessary intercorrelations, factors, and cross-
correlations are shown in Table 8: $R_y$ is on
the left, $F_y$ is in the center, and $F_x F_y'$ is on
the right. Although there are numerical dif-
ferences between the cross-lagged correla-
tions, these are spurious, being produced by
the lack of stationarity from X to Y. The di-
agonal factoring describes accurately the
amount of change from one occasion to an-
other for each variable independently of the
other and therefore takes into account the
departures from stationarity.


Table 8
Effects of Reducing the True Score Correlation Between Occasions of Variable Y

                                                                      Cross-correlation
                Intercorrelation             Diagonal factor              (zero lag)
Occasion     1     2     3     4          1     2     3     4          1     2     3     4
1          1.000  .940  .884  .831      1.000  .000  .000  .000      1.000  .940  .884  .831
2           .940 1.000  .940  .884       .940  .341  .000  .000       .960  .998  .938  .882
3           .884  .940 1.000  .940       .884  .320  .341  .000       .922  .958  .996  .937
4           .831  .884  .940 1.000       .831  .302  .320  .341       .885  .920  .957  .995
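
The example behind Table 8 can be recomputed. The sketch below assumes, as in the earlier illustration, error-free simplex matrices with adjacent-occasion correlations of .96 for X and .94 for Y, and it treats diagonal factoring as the Cholesky decomposition; both are assumptions of this sketch, and the names are hypothetical. The results agree with the printed table to about .001.

```python
import numpy as np

def simplex(r, n):
    """Stationary simplex correlation matrix: corr(i, j) = r ** |i - j|."""
    idx = np.arange(n)
    return r ** np.abs(idx[:, None] - idx[None, :])

R_y = simplex(0.94, 4)                          # intercorrelations of Y
F_x = np.linalg.cholesky(simplex(0.96, 4))      # diagonal factors of X
F_y = np.linalg.cholesky(R_y)                   # diagonal factors of Y

cross_zero_lag = F_x @ F_y.T                    # rows are X, columns are Y
asymmetry = cross_zero_lag - cross_zero_lag.T   # spurious cross-lagged differences

print(np.round(F_y, 3))
print(np.round(cross_zero_lag, 3))
print(np.round(asymmetry, 3))
```

The nonzero entries of the last matrix arise even though the lag is zero, which is the sense in which the text calls these differences spurious.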

Although stationarity is not required by the 
model, one must be able to estimate reliabili- 
ties and specificities. To fit both intercorrela- 
tions and cross-correlations, separate esti- 
mates are required. If one is interested only in 
the cross-correlations, only uniqueness esti- 
mates are necessary. With four occasions, 
reliabilities are estimated from intercorrela- 
tions with only one degree of freedom. With 
less than four occasions reliabilities must be 
estimated independently of the intercorrela- 
tions. Since a simplex process is presumably 
involved, correlations between any two oc- 
casions when corrected for attenuation must 
be less than unity. Estimation of specificity 
is a more difficult problem. If there are only 
two variables, estimates will likely be quite 
inaccurate. The two synchronous correlations 
furnish the only information concerning speci- 
ficity and its obverse, communality. 

Stationarity from occasion to occasion may 
be a reasonable assumption for a great many 
data, but it is probably unreasonable during 
periods of rapid development of the organism. 
Over a long enough time span off-diagonal 
correlations will probably show a pattern of 
increasing size. Stationarity from one mea-
sure to another is likely to be less tenable,


since growth even in related functions can be 
at different rates. Kenny’s (1975) discussion 
of the problem, as viewed from the present 
model, misplaces the emphasis. Knowledge of 
reliability and specificity is primary; sta-
tionarity is required for a particular method
of correcting for changes in uniqueness. The
correction for uniqueness he describes is en- 
tirely adequate if the simplex process is 
stationary for both variables and occasions. 
Lacking stationarity, however, residual dif- 
ferences will be in part spurious. 

The present model also provides insight 
into the selection of variables to be used in 
a cross-lagged analysis. The measures must 
not only be reliable but they must have some- 
thing in common. One is entitled to be un- 
easy about the cross-lagged comparisons in 
Eron, Huesmann, Lefkowitz, and Walder 
(1972), even though the two cross-correla- 
tions differ significantly from each other. In 
their data the synchronous correlations are 
too close to zero for any feeling of comfort, 
and the larger of the two cross-lagged cor- 
relations is still quite small. Their measures 
appear to be highly unreliable, highly spe- 
cific, or both. There is, however, a mitigating 
circumstance. Only two time periods were 
represented, and these were 10 years apart. 
One additional intermediate occasion would 
be required as a minimum to resolve these 
doubts. Although one can say with confidence 
that no correlation in a simplex matrix 
should ever be zero, over extended periods of 
time some of the correlations among measures 
of moderately stable traits will closely ap-
proach zero.


The Continuing Misinterpretation of the
Standard Error of Measurement


Frank J. Dudek


Monographs, texts, and guides designed to inform readers about the meanings and
interpretations of test scores frequently misinform instead, because the standard
error of measurement is misapplied. The standard error of measurement,
$\sigma_1(1 - r_{1I})^{1/2}$, is an estimate of the variability (i.e., the standard deviation)
expected for observed scores when the true score is held constant. To set con-
fidence intervals for true scores given an observed score, the appropriate stan-
dard error is that for true scores when observed scores are held constant and
estimated by $\sigma_1[r_{1I}(1 - r_{1I})]^{1/2}$; and the interval is around the estimated true
score rather than around the observed score. Except in the case of perfect
reliability, the estimated true score is not the observed score, but is a value
regressed toward the mean.


The standard error of estimate, termed the
standard error of measurement (i.e., $\sigma_{1\cdot\infty}$, where
the subscript 1 indexes an observed test score
and $\infty$ indexes a presumed true score and the
order of the subscripts implies that $X_1$ is
predicted from $X_\infty$), seems frequently to be
misapplied because many sources tend to
promote the erroneous notion that an interval
$X_1 \pm \sigma_{1\cdot\infty}$ includes the true scores of approxi-
mately two thirds of those obtaining scores of
$X_1$ (Educational Testing Service, 1977, p. 16;
Lemke & Wiersma, 1976, p. 79; McLaughlin,
1964, p. 18).

A common model (Guilford, 1936, p. 413)
for representing the variables involved when
observed scores, true scores, and reliability are
of concern can be represented by $X_1 = X_\infty + e$
and $X_I = X_\infty + E$, where $X_1$ and $X_I$ indicate
observed scores (e.g., on alternate forms of a
test), $X_\infty$ indicates a true score value under-
lying the observed scores, and e and E repre-
sent errors of measurement. It is assumed that
(a) true and error components are independent
so that $r_{e\infty} = r_{E\infty} = r_{eE} = 0$, (b) the expected
value of error components is zero, and (c)
$\sigma_e = \sigma_E$. As $\sigma_e^2$ and $\sigma_E^2$ are measures of error
variance, the square root of this variance is the
standard error of measurement, $\sigma_{1\cdot\infty}$, defined
above. Under this model it follows that
$\sigma_1^2 = \sigma_I^2 = \sigma_\infty^2 + \sigma_{1\cdot\infty}^2$; that is, the observed
variance of a set of test scores is made up of
true variance and error variance. Reliability
is given by $r_{1I} = \sigma_\infty^2/\sigma_1^2$ and indicates the
proportion of observed score variance that is
true variance.


Requests for reprints should be sent to Frank J.
Dudek, Department of Psychology, 229 Burnett Hall,
University of Nebraska, Lincoln, Nebraska 68588.

The point that needs to be made about
observed scores, true scores, and reliability is
that one must distinguish between three
different standard errors of estimate, each
associated with one of three prediction situa-
tions that might be considered. Using deviation
scores (i.e., $x = X - M_X$) for simplicity,
using $\hat{x}$ to indicate a predicted value, and
noting that $r_{1I} = r_{1\infty}^2$, the three standard errors
of estimate and the associated prediction to
which each applies are


$\sigma_{1\cdot\infty} = \sigma_1(1 - r_{1\infty}^2)^{1/2} = \sigma_1(1 - r_{1I})^{1/2}$; $\hat{x}_1 = x_\infty$. (1)

$\sigma_{\infty\cdot 1} = \sigma_\infty(1 - r_{1\infty}^2)^{1/2} = \sigma_1[r_{1I}(1 - r_{1I})]^{1/2}$; $\hat{x}_\infty = r_{1I}x_1$. (2)

$\sigma_{1\cdot I} = \sigma_1(1 - r_{1I}^2)^{1/2}$; $\hat{x}_1 = r_{1I}x_I$. (3)


The interpretations of all standard errors
are analogous; thus, assuming homoscedasti-
city, (a) $\sigma_{1\cdot\infty}$ is the standard deviation of
observed scores if the true score is held con-
stant, (b) $\sigma_{\infty\cdot 1}$ is the standard deviation of
true scores if the observed score is held
constant, and (c) $\sigma_{1\cdot I}$ is the standard deviation
of $X_1$ scores if $X_I$ scores are held constant. (It
can also be noted that $\sigma_{1\cdot I}^2 = \sigma_{1\cdot\infty}^2 + \sigma_{\infty\cdot 1}^2$.)
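
The additive relation among the three squared standard errors follows directly from Formulas 1 to 3. A short check, writing $r_{1I}$ for the reliability and $\sigma_1$ for the observed-score standard deviation:

```latex
\begin{aligned}
\sigma_{1\cdot\infty}^{2} + \sigma_{\infty\cdot 1}^{2}
  &= \sigma_1^{2}(1 - r_{1I}) + \sigma_1^{2}\, r_{1I}(1 - r_{1I}) \\
  &= \sigma_1^{2}(1 - r_{1I})(1 + r_{1I}) \\
  &= \sigma_1^{2}(1 - r_{1I}^{2}) = \sigma_{1\cdot I}^{2}.
\end{aligned}
```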

The standard error of measurement is
commonly reported along with estimates of
$r_{1I}$, and it is important to do so because $\sigma_{1\cdot\infty}^2$
is a measure of the amount of error variance
that obtains in a set of observed scores.
Furthermore, $\sigma_{1\cdot\infty}$ as an indicator of measure-
ment error tends to stay constant across
populations, whereas $r_{1I}$ varies in magnitude
depending on the heterogeneity (i.e., range
of talent) represented in the group. Both the
reliability coefficient and the standard error 
of measurement provide useful, descriptive 
information. 

But the interpretation that an interval that 
extends one standard error of measurement 
above and one standard error of measurement 
below an obtained score will include the true 
scores of approximately two thirds of the 
individuals who received this obtained score 
(as implied in the references cited earlier) is 
in error. The reason is that the prediction 
implied here is that of a true score given an 
obtained score, and as noted by the prediction 
in Formula 2, the predicted value of the true 
score is not the observed score itself, but is an 
estimate regressed toward the mean. The 
appropriate standard error of estimate in this 
situation would be $\sigma_{\infty\cdot 1}$, not $\sigma_{1\cdot\infty}$.

To illustrate for a set of values in which
$M_1 = M_I = 500$, $\sigma_1 = \sigma_I = 100$, and $r_{1I} = .90$,
one finds $\sigma_{1\cdot\infty} = 31.623$, $\sigma_{\infty\cdot 1} = 30.00$, and
$\sigma_{1\cdot I} = 43.589$. For all persons who scored 700
on this test, we would infer that the average
(i.e., predicted) true score for these persons
is 680 and that about two thirds of their true
scores lie in the interval 680 ± 30, or between
650 and 710. (On a retest of these individuals
one would expect to find two thirds of their
retest scores in the interval 680 ± 43.6.)
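
The arithmetic of this illustration can be reproduced in a few lines. The sketch below simply evaluates Formulas 1 to 3 and the regressed true score estimate for a mean of 500, a standard deviation of 100, and a reliability of .90; the function names are hypothetical.

```python
import math

def standard_errors(sd_observed, reliability):
    """Return the three standard errors of Formulas 1, 2, and 3."""
    # Formula 1: observed scores with the true score held constant.
    se_measurement = sd_observed * math.sqrt(1.0 - reliability)
    # Formula 2: true scores with the observed score held constant.
    se_true_score = sd_observed * math.sqrt(reliability * (1.0 - reliability))
    # Formula 3: retest (alternate form) scores with the observed score held constant.
    se_retest = sd_observed * math.sqrt(1.0 - reliability ** 2)
    return se_measurement, se_true_score, se_retest

def estimated_true_score(observed, mean, reliability):
    """Predicted true score: the observed score regressed toward the mean."""
    return mean + reliability * (observed - mean)

sem, se_true, se_retest = standard_errors(100.0, 0.90)
center = estimated_true_score(700.0, 500.0, 0.90)

print(f"standard errors: {sem:.3f}, {se_true:.2f}, {se_retest:.3f}")
print(f"true score interval: {center - se_true:.0f} to {center + se_true:.0f}")
```

The interval is centered on the regressed estimate of 680 rather than on the obtained score of 700, which is the distinction the text stresses.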

Standard reference texts (e.g., Guilford,
1936, 1954; Lord & Novick, 1968; Nunnally,
1978) either make or imply the distinctions,
of course; but their cautions, caveats, and
admonishments are often unheeded, ignored,
or misinterpreted in the applied literature.
Guilford (1936) included a footnote:


Too often one finds the interpretation of a $\sigma_{1\cdot\infty}$ mis-
stated. For a given score of 50 when $\sigma_{1\cdot\infty}$ is 4, one is
likely to read the interpretation that “the probability
is two-thirds that the true score lies between 46 and
54.” The latter statement implies the prediction of $X_\infty$
from $X_1$. (p. 414)


On the next page he suggested, “It is correct
practice to speak of $\sigma_{1\cdot\infty}$ as the standard error
of a raw score and of $\sigma_{\infty\cdot 1}$ as the standard error
of a true score” (p. 415).

Lord and Novick (1968, pp. 67-68) provided 
formulations equivalent to our three formulae 
for the three standard errors of measurement 
and suggested naming the various standard 
errors as (a) the standard error of measure- 
ment, (b) the standard error of estimation, 
and (c) the standard error of prediction. 

Nunnally (1978), in discussing the standard 
error of measurement, emphasized that 


One can use it to set confidence zones for obtained 
scores, but in so doing one must understand that such 
confidence zones are not symmetrical about the obtained 
score. Thus, although it usually is done in practice, it is 
incorrect to set the 95 percent confidence zone as 
equaling two standard errors of measurement below 
and two above the obtained score. (p. 218)


The first reference to scores in this quotation 
should have been to true scores rather than 
to obtained scores. Using the standard error of 
measurement in the situation in which true 
score intervals are inferred from obtained scores 
will not lead to serious error (providing, of 
course, that regression is taken into account), 
inasmuch as the standard error in Formula 2 
will be less than the standard error in Formula 
1 although their values are reasonably close to 
one another when reliability is high, as seen 
in the illustration above. Using the standard 
error in Formula 1, then, even though the
standard error in Formula 2 is appropriate,
will lead to a somewhat liberal interval. But if
one desires to set confidence intervals for
obtained scores (say on a retest), then the
appropriate standard error is $\sigma_{1\cdot I}$, and using the
standard error of measurement in such a
situation could lead to a serious underestima-
tion of the interval.


In summary, the standard error of measure-
ment is an estimate of the variability (i.e.,
standard deviation) of observed scores given
a true score and is clearly inappropriate for
the situation in which one sets confidence
limits for true scores given a fallible, obtained
score. For the latter situation one requires
the standard deviation of true scores when
the observed score is held constant. This
standard error is indicated by Formula 2.
Equally important is to recognize that the
estimated true score, given an observed score,
is a value regressed toward the mean, and any
confidence interval for true scores will be
symmetrical around this regressed value for
the true score, not around the observed value.


References 


Educational Testing Service. 1977-78 Guide to the Use 
of the Graduate Record Examinations. Princeton, N.J.: 
Author, 1977. 

Guilford, J. P. Psychometric methods. New York: 
McGraw-Hill, 1936. 

Guilford, J. P. Psychometric methods. New York: 
McGraw-Hill, 1954. 

Lemke, E., & Wiersma, W. Principles of psychological 
measurement. Chicago: Rand McNally, 1976. 

Lord, F. M., & Novick, M. R. Statistical theories of 
mental test scores. Reading, Mass.: Addison-Wesley,
1968.

McLaughlin, K. F. Interpretation of test results. Washing-
ton, D.C.: U.S. Government Printing Office, 1964.

Nunnally, J. C. Psychometric theory. New York:
McGraw-Hill, 1978.


Received December 2, 1977


Psychological Bulletin 
1979, Vol. 86, No. 2, 338-348


Detecting Cyclicity in Social Interaction 


John M. Gottman 
University of Illinois at Urbana-Champaign 


This article reviews spectral and cross-spectral analytic methods for detecting 
cyclicity, cross-cyclicity, and lead-lag relationships in continuous data derived 
from the observation of dyadic interaction. It is found that lead-lag relationships 
can be assessed using the phase spectrum. Spectral analytic methods are then 
generalized to categorical observational data, and it is shown that by these 
methods one can derive the classical information theory definition of social
communication and its distribution statistics.


Researchers who study social behavior are 
discovering that there are occasions when 
cyclical patterns characterize dyadic inter- 
action, and thus they are searching for sta- 
tistical techniques that can detect these cycles. 
The spectral analysis of time-series records was 
briefly suggested by Luce (1970) as a useful 
technique for the study of biological rhythms
such as heart rate, respiration, REM sleep,
and other cyclic biochemical and physiological
processes. However, spectral analysis is not
widely known to behavioral scientists, and 
it has yet to be used in the study of social 
interaction. A recent exception is the work of 
Hayes and Cobb (Note 1), who observed 
couples living in a laboratory setting, analyzed 
cycles of talk and silence using spectral analysis 
of time-series records, and related an observed 
cycle to human circadian rhythms. 

Researchers who study dyadic social inter- 
action are also interested in the bivariate case 
in which two time-series records are obtained, 
one from each of the two interacting orga- 
nisms; the research question often involves 
the search for cycles in cross-correlations be- 
tween the two series. For example, Kendon 
(1967) reported that when two people converse,
the cycles of gaze and gaze aversion
interlace, much as do sine and cosine waves.
People are out of phase in eye-to-eye contact 
as a function of who is speaking; in particular, 
when a person begins speaking he or she looks 
away from the listener and begins increasing 
eye-to-eye contact time toward the end of the 
speech, which acts as an implicit signal for 
the listener to begin looking away and speaking. 

Another example of cross-cyclicity is the 
work of Brazelton and his associates (e.g., 
Brazelton, Koslowski, & Main, 1974). Tronick, 
Als, and Brazelton (1977) studied mother- 
infant interaction and reported that the 
infants looked away following periods of 
maximum involvement with the mother and 
after a rest period became engaged again. 
Tronick et al. calculated synchrony and
dissynchrony as running correlations between
scaled scores of involvement, from maximum
positive involvement to maximum negative
involvement, but did not employ cross-spectral
time-series methods. Cross-spectral
analysis would have been an appropriate 
technique for studying both synchrony and 
lead-lag relationships between two time series 
in the Tronick et al. study. 

Cross-spectral analysis may have consider- 
able promise for studying interacting physio- 
logical systems within an organism. For 
example, Porges and his associates (Porges, 
Bohrer, Keren, Cheung, & Franks, Note 2)
are using cross-spectral methods to study the 
linkage between respiration and heart rate. 
A function called coherence obtained from 
cross-spectral analysis is the equivalent of the
square of the correlation between the two
physiological systems as a function of their
relative lag. Porges et al. found that the 
coherence between respiration and heart rate 
is related to cognitive attentional processes. 
Hyperactive children had low coherence be- 
tween respiration and heart rate; low doses of 
methylphenidate had positive influence on 
cognitive performance and social behavior, 
whereas higher doses often resulted in lethargy. 
Porges and his associates are testing the model 
that deficits in linkage between the respiratory
and the cardiovascular systems are
related to the attentional problems of hyper-
active children and that low doses of methyl-
phenidate act to increase the coherence
between systems, thereby affecting cognitive
functioning.

Because time-series techniques are not 
widely known to psychologists, this article 
reviews the spectral and cross-spectral analysis
of continuous data. The present research also 
derives the new result that the slope of the 
phase spectrum of any two stationary processes 
can be used to detect lead-lag relationships. 
Lead-lag relationships are useful in making 
inferences about which series is, in some sense, 
driving the other. One application of lead-lag
relationships is a redefinition of the concept
of dominance in social interaction as an 
asymmetry in predictability in the time domain 
(Gottman, in press). This definition of dominance
using cross-spectral analysis subsumes a
range of observations about dominance across
species. For example, the beta male in a group
of monkeys is more responsive to the behavior
of the alpha male than conversely (Maslow,
1936); that is, the behavior of the beta male
is more predictable from past behavior of the
alpha male than conversely.

Most researchers of social interaction collect
categorical rather than continuous observa- 
tional data (e.g., Hutt & Hutt, 1970; Lewis & 
Rosenblum, 1974). There are currently no 
statistical techniques for detecting cycles in
one sequence and cycles between two sequences
for categorical data over time. Categorical 
data collected over time can always be trans- 
formed to continuous time-series data; for 
example, for every block of k time units the 
local probability of each category can be 
computed, which produces a continuous
variable for each category. For a discussion of
categorical data types in observational re- 
search, see Gottman and Bakeman (in press). 

In this article, I derive extensions of spectral 
time-series methods to categorical data. One 
result of these extensions is the derivation of 
the commonly used information theory defini- 
tion of communication, summarized by Wilson 
(1975) as follows: 


Communication has been defined as the process by 
which behavior of one individual alters the probability 
of behavioral acts in other individuals . . . . In words, 
the conditional probability that act $X_2$ will be per-
formed by individual B given that A performed $X_1$ is
not equal to the probability that B will perform $X_2$ in
the absence of $X_1$. (p. 194)


This is an important definition for the 
study of sequences in social interaction because 
it suggests the notion that a behavior in one 
organism has social communicative value to 
the extent that it reduces uncertainty in 
predicting the behavior of another organism. 
This definition is now widely used to detect 
sequences in dyadic interaction (for reviews, 
see Gottman & Bakeman, in press; Gottman 
& Notarius, 1978; Sackett, 1977). 

Another result of the extension of spectral 
time-series methods to categorical data in this 
article is the demonstration of the validity 
(and limitations) of a statistical test of signifi- 
cance between conditional and unconditional 
probabilities recently suggested by Sackett 
(1977). After the information theory definition 
of communication is derived, spectral and 
cross-spectral methods are used to suggest 
how lead-lag relationships and cycles can be 
detected in categorical time series. 


The Continuous Case 


Granger and Hatanaka (1964) noted that 
the first time series subjected to spectral 
analysis were those that had a cycle with one 
dominant frequency, such as the 11-year 
oscillation in sunspot data and the annual 
cycle in meteorological data. They wrote, 


It was felt that if one could determine the amplitude
period and phase of a sine curve sufficiently accurately 
and subtract this from the data, then the remainder 
ought to be an independent, random series. When, in 
fact, this was done and the remainder was still found 
to be somewhat too smooth, it was natural to re-use 
the current predominant idea of the cause of the
smoothness and to look for yet further sine curves to
fit to the data. (pp. 4-5) 


The model for a time series, $X_t$, was therefore
a weighted sum of sine and cosine curves with
an uncorrelated random remainder; if the
number of observations, $n = 2q + 1$, is odd,
one can write

$$X_t = A_0 + \sum_{i=1}^{q} (A_i \cos 2\pi f_i t + B_i \sin 2\pi f_i t) + e_t, \tag{1}$$

where $f_i = i/n$ is the $i$th harmonic of the
fundamental frequency $1/n$. Fourier analysis
makes it possible to derive least squares
estimates for the coefficients:

$$A_0 = \bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t;$$

$$A_i = \frac{2}{n}\sum_{t=1}^{n} X_t \cos 2\pi f_i t;$$

$$B_i = \frac{2}{n}\sum_{t=1}^{n} X_t \sin 2\pi f_i t.$$


This decomposition of a time series into 
component frequencies met with some initial 
success. For example, Whittaker and Robinson 
(1924) showed that the brightness of a variable 
star could be decomposed into two component 
frequencies, and they thus determined that 
the variable star was a binary star. 
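The least squares decomposition described above can be made concrete with a short numerical sketch. The Python code below is illustrative only: the function names `fourier_coefficients` and `reconstruct` are hypothetical, NumPy is assumed to be available, and the series is assumed to have an odd number of observations, $n = 2q + 1$, as in Equation 1.

```python
import numpy as np

def fourier_coefficients(x):
    """Least squares estimates of A_0, A_i, B_i for an odd-length series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n % 2 == 0:
        raise ValueError("this sketch assumes n = 2q + 1 observations")
    q = (n - 1) // 2
    t = np.arange(1, n + 1)                      # time index t = 1, ..., n
    i = np.arange(1, q + 1)                      # harmonic index i = 1, ..., q
    angles = 2.0 * np.pi * np.outer(i, t) / n    # 2*pi*f_i*t with f_i = i/n
    a0 = x.mean()
    a = (2.0 / n) * (np.cos(angles) @ x)
    b = (2.0 / n) * (np.sin(angles) @ x)
    return a0, a, b

def reconstruct(a0, a, b, n):
    """Rebuild the series from its harmonics (no random remainder)."""
    q = a.size
    t = np.arange(1, n + 1)
    angles = 2.0 * np.pi * np.outer(np.arange(1, q + 1), t) / n
    return a0 + a @ np.cos(angles) + b @ np.sin(angles)

# Illustrative series: two cycles (harmonics 2 and 4) around a mean of 3.
t = np.arange(1, 12)
x = 3 + 2 * np.cos(2 * np.pi * 2 * t / 11) - np.sin(2 * np.pi * 4 * t / 11)
a0, a, b = fourier_coefficients(x)
```

In this sketch the two component frequencies are recovered in `a[1]` and `b[3]`. Because $n$ coefficients are fitted to $n$ observations, the reconstruction is exact, and the sum of squares around the mean equals $(n/2)\sum_i (A_i^2 + B_i^2)$, which is the sense in which the harmonics partition the variance.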

It would be useful to have some function
that peaked at frequency bands that made
major contributions to the variance of the
series. For an infinite number of observations,
the variance of the series at each frequency, $f_i$,
is called the spectral density function, $f$. For
a sample of $n$ points it is called the periodogram:
$I(f_i) = (n/8\pi)(A_i^2 + B_i^2)$. Because the sine
and cosine terms in Equation 1 form an ortho-
gonal set of functions, it can be shown that
the variance of the time series is partitioned
into independent parts by the periodogram:

$$\frac{1}{n}\sum_{t=1}^{n} (X_t - \bar{X})^2 = \frac{4\pi}{n}\sum_{i=1}^{q} I(f_i).$$


Early work on the spectral analysis of time 
series suggested that the periodogram was
precisely the function that would peak at 
frequencies that contributed major portions 
to the variance of the time series; in fact, 
Schuster (1898) suggested that the periodogram
be calculated and that its peaks be used
to detect cycles. Subsequently, problems with 
spurious peaks led to the construction of 
significance tests for the periodogram (for a 
review of these tests, see Jenkins & Priestley, 
1957). However, these tests were not adequate 
because the periodogram has some very poor 
statistical properties. 
If the sample autocovariance at lag $k$ is
defined as

$$C_k = \frac{1}{n}\sum_{t=1}^{n-k} X_t X_{t+k},$$

then $C_k$ is an unbiased estimator of the popu-
lation autocovariance (Hannan, 1967), and
it can be shown (Box & Jenkins, 1970, p. 45)
that the periodogram is given by

$$I(f) = \frac{1}{2\pi}\Big(C_0 + 2\sum_{k=1}^{n-1} C_k \cos 2\pi f k\Big),$$

where $0 \le f \le \tfrac{1}{2}$, which expresses that the
periodogram is the Fourier transform of the
sample autocovariance function. This implies
that the periodogram is also easily calculated
from the sample autocovariances; thus at
first the problems of spectral time-series
analysis appeared to be solved.

Unfortunately, although the periodogram
does converge to the spectral density function,
$f$, it does not converge uniformly; that is, its
variance around $f$ does not decrease to zero
as $n$, the number of observations, increases
(Hannan, 1967, pp. 52-53). In fact, Bartlett
(1948) showed that the limit of the variance
of the periodogram as $n$ increases is $\sigma^4 f^2$, where
$\sigma^2$ is the variance of the series. The failure
of the periodogram led Tukey (1967) to make
the following reflection:


If we dealt with problems involving the superposition 
of a few simple periodic phenomena, as do astronomers 
interested in binary stars and related problems, we can 
learn much from the periodogram. Sadly, however, 
almost no one else has this kind of data. As a result 
the periodogram has been one of the most misleading 
devices I know. (p. 25) 


A dramatic illustration of Tukey's point is the
periodogram of a series of random numbers,
called white noise. White noise, like white
light, is composed of all frequencies with equal
intensities, and therefore its periodogram
should be a straight line. Jenkins and Watts
(1968) showed that the periodogram of white
noise is not only not a straight line but continues
to oscillate wildly as the number of
observations is increased. However, the spuri- 
ous peaks of the periodograms of each sample 
of white noise occur in random places on the 
frequency domain, and this provides the key 
to solving the problems of the periodogram. 
The average of many periodograms obtained 
from many samples of the same white noise 
process in fact tends toward a straight line as 
the number of observations in each sample 
increases. 
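The behavior of the white noise periodogram described above can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the Jenkins and Watts (1968) analysis: the function names are hypothetical, NumPy's FFT is used to obtain the raw periodogram ordinates, the scaling of the ordinates is arbitrary, and the seed is fixed only for reproducibility.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram ordinates at the nonzero Fourier frequencies below 1/2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
    return spec[1:(n - 1) // 2 + 1]

def averaged_periodogram(x, n_segments):
    """Mean of the periodograms of equal-length, non-overlapping segments."""
    x = np.asarray(x, dtype=float)
    length = x.size // n_segments
    segments = x[:length * n_segments].reshape(n_segments, length)
    return np.mean([periodogram(s) for s in segments], axis=0)

def relative_scatter(spec):
    """Standard deviation of the ordinates divided by their mean."""
    return float(np.std(spec) / np.mean(spec))

rng = np.random.default_rng(0)            # fixed seed, for reproducibility only
noise = rng.standard_normal(4096)
raw_small = relative_scatter(periodogram(noise[:256]))
raw_large = relative_scatter(periodogram(noise))
averaged = relative_scatter(averaged_periodogram(noise, 64))
```

For white noise the scatter of the raw ordinates around their mean level stays close to the mean itself whether 256 or 4,096 observations are used, whereas averaging over 64 segments reduces the scatter substantially, at the cost of a coarser frequency grid.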

This observation led Bartlett (1948) to
suggest that the time series can be segmented
and that a periodogram can be averaged across
all segments. Bartlett showed that the averaged
periodogram would converge uniformly to the
spectral density. Jenkins (1967) demonstrated
that Bartlett's suggestion is equivalent to
estimates of the form

$$\hat{f}(f_i) = \frac{1}{2\pi}\Big[\lambda_0 C_0 + 2\sum_{j=1}^{n-1} \lambda_j C_j \cos 2\pi f_i j\Big]. \tag{2}$$

The function $\lambda_j$ is called a spectral
window, and it weights the autocovariance
function to ensure uniform convergence.
Parzen's (1967) result is important because
to implement Bartlett's suggestion would
require an extremely long time series, whereas
Jenkins's suggestion can be implemented with
shorter time series, assuming that the window
weighting function is suitably chosen. The
most commonly used spectral window is the
Tukey-Hanning window (Blackman & Tukey,
1958): $\lambda_j = \tfrac{1}{2}[1 + \cos(\pi j/m)]$, where $m$ is an
arbitrary integer, usually chosen so that
$m \le n/3$. (See Parzen, 1967, for a discussion
of various spectral windows.) Thus a weighted
Fourier transform of the autocovariance
function does converge uniformly to the
spectral density. Jenkins and Watts (1968)
showed that the distribution of the intensity
estimates at each frequency of the periodogram
"will be very nearly a $\chi_2^2$ regardless of the
distribution of the [time-series] process"
(p. 233). For the Tukey-Hanning window, the
equivalent degrees of freedom must be modified 
(Granger & Hatanaka, 1964, pp. 59-64; 
Jenkins & Watts, 1968, pp. 248-257). In this 
article the term spectrum refers to the weighted 
periodogram. 
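A weighted spectrum estimate of this kind can be sketched in a few lines. The code below is a minimal illustration, assuming the standard Tukey-Hanning weights $\lambda_j = \tfrac{1}{2}[1 + \cos(\pi j/m)]$ for lags up to $m$ and zero weight beyond; the function names are hypothetical, the series is mean-centered before the autocovariances are computed, and the simulated series (a 10-unit cycle plus noise) is invented for the example.

```python
import numpy as np

def autocovariance(x, max_lag):
    """Sample autocovariances C_0, ..., C_max_lag of the mean-centered series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def tukey_hanning_spectrum(x, m, freqs):
    """Weighted cosine transform of the autocovariances, truncated at lag m."""
    c = autocovariance(x, m)
    j = np.arange(m + 1)
    lam = 0.5 * (1.0 + np.cos(np.pi * j / m))        # Tukey-Hanning weights
    cosines = np.cos(2.0 * np.pi * np.outer(freqs, j[1:]))
    return (lam[0] * c[0] + 2.0 * cosines @ (lam[1:] * c[1:])) / (2.0 * np.pi)

rng = np.random.default_rng(1)            # fixed seed, for reproducibility only
t = np.arange(300)
series = np.sin(2 * np.pi * t / 10) + rng.standard_normal(300)
freqs = np.linspace(0.0, 0.5, 101)
spectrum = tukey_hanning_spectrum(series, m=60, freqs=freqs)
peak_frequency = freqs[np.argmax(spectrum)]
```

In this sketch the estimate peaks near a frequency of .1, the reciprocal of the simulated 10-unit period. A smaller $m$ gives a smoother but less sharply resolved spectrum, so the choice of $m$ trades variance against resolution.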
An illustration of the spectrum may clarify
its relationship as the Fourier transform (with
an appropriate spectral window) of the auto-
covariance function. If the time series is a
second-order autoregressive process,

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + e_t, \tag{3}$$

where $e_t$ is an uncorrelated, random series and
$\phi_1^2 + 4\phi_2 < 0$, then the behavior of the series
will appear periodic. The constraints on $\phi_1$
and $\phi_2$ occur because periodicity only occurs
when the roots of the characteristic equation 
of the process are imaginary (Box & Jenkins, 
1970, p. 59). Note that this time series will not 
be deterministically periodic, as is a sine wave; 
there is a random component to the periodicity. 
In this case the autocovariance function will 
be a single-frequency damped sine wave.¹
Figure 1 is a plot of the autocorrelation function
and spectrum of a simulated second-order
autoregressive model. The spectrum shows
only one peak²; a fourth-order autoregressive
process is capable of representing a process 
with two peaks, and so on. 
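The link between the autoregressive parameters and the pseudoperiodic behavior can be checked numerically. The sketch below assumes the standard results for a second-order autoregressive process (damping factor $d = \sqrt{-\phi_2}$ and $\cos 2\pi f_0 = \phi_1 / 2\sqrt{-\phi_2}$, written here without the separate sign factor); the function names and the parameter values .75 and -.5 are illustrative choices, not values taken from Figure 1, and a stationary process is assumed.

```python
import numpy as np

def ar2_cycle(phi1, phi2):
    """Damping factor d and frequency f0 of a pseudoperiodic AR(2) process."""
    if phi1 ** 2 + 4.0 * phi2 >= 0:
        raise ValueError("roots are real: no pseudoperiodic behavior")
    d = np.sqrt(-phi2)
    f0 = np.arccos(phi1 / (2.0 * d)) / (2.0 * np.pi)
    return d, f0

def acf_by_recursion(phi1, phi2, max_lag):
    """Theoretical autocorrelations from the Yule-Walker recursion (max_lag >= 1)."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1.0 - phi2)
    for k in range(2, max_lag + 1):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

def acf_damped_sine(phi1, phi2, max_lag):
    """The same autocorrelations written as a damped sine wave."""
    d, f0 = ar2_cycle(phi1, phi2)
    theta = 2.0 * np.pi * f0
    phase = np.arctan((1.0 + d ** 2) / (1.0 - d ** 2) * np.tan(theta))
    k = np.arange(max_lag + 1)
    return d ** k * np.sin(theta * k + phase) / np.sin(phase)

d, f0 = ar2_cycle(0.75, -0.5)
```

For these illustrative parameters the frequency is about .161, that is, a pseudoperiod of roughly 6.2 time units, and the recursion and the damped sine form give the same autocorrelations.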

The relationship between the autocorrelation 
and spectrum of the process represented by 
Equation 3 is intuitively clear. If the time 
series is periodic, the autocorrelation should 
increase at multiples of the period. For 


¹ The expression for the theoretical autocorrelation
function is

$$\rho_k = \frac{[\operatorname{sgn}(\phi_1)]^k\, d^k \sin(2\pi f_0 k + F)}{\sin F},$$

where sgn = +1 if $\phi_1$ is positive and sgn = −1 if
$\phi_1$ is negative. The factor $d$ is called the damping factor,
$f_0$ is called the frequency, and $F$ is called the phase.
These factors are related to the model parameters
as follows:

$$d = \sqrt{-\phi_2}; \qquad \cos 2\pi f_0 = \frac{|\phi_1|}{2\sqrt{-\phi_2}}; \qquad \tan F = \frac{1 + d^2}{1 - d^2}\tan 2\pi f_0.$$


² The spectrum of a second-order autoregressive
process can be written in closed form as

$$p(f) = \frac{2\sigma^2}{1 + \phi_1^2 + \phi_2^2 - 2\phi_1(1 - \phi_2)\cos 2\pi f - 2\phi_2 \cos 4\pi f},$$

where $0 \le f \le \tfrac{1}{2}$. The spectrum reflects the periodic
behavior of the second-order autoregressive process
when the roots of its characteristic equation are
complex.

Figure 1. Autocorrelation function ($r_k$ is the autocorre-
lation at lag $k$) and spectrum, $f(f_i)$, of one realiza-
tion of a second-order pseudoperiodic time series.
(Xe = WAX — SXi- + e) 


example, for monthly wholesale wheat prices, 
the correlation between months 12 months 
apart (June with June, July with July, etc.) 
should be higher than that between months in 
different seasons. This relationship should fall 
off across years, and so the autocorrelation 
should resemble the damped sine wave in 
Figure 1. Since there is only one 12-month 
cycle, one would expect the spectrum (the 
weighted Fourier transform of the auto- 
covariance function) to show only one peak. 
If the series had two cycles, the autocorrelation 
function would appear similar in shape (but 
more complex), and the spectrum would have 
two peaks.

Note that one cannot reconstruct the original 
time series simply by knowledge of the spec- 
trum. This is true because very different series 
can be produced simply by adjusting the 
relative phases of the component frequencies; 
the phase of a sine wave determines its amplitude
at time zero. Phase is a particularly
important concept in the bivariate case. 

For two time series, the generalization is not 
difficult. In fact, the Fourier transform (with 
suitable window) of the cross-covariance 
between the two time series is the cross- 
spectrum. The cross-spectrum has two
components: a phase spectrum and a cross-
amplitude spectrum. The phase spectrum
indicates

whether the frequency components of one series lead
or lag the same frequency components in the other
series, and the cross-amplitude spectrum shows whether
the amplitude of the component at a particular fre-
quency in one series is associated with a large or small
amplitude at the same frequency in the other series.
(Jenkins & Watts, 1968, pp. 342-343)


The coherence is a function similar to the
square of a correlation coefficient and is defined
as the squared modulus of the cross-spectrum
divided by the product of the spectra of the
individual series³; for two series, $X_t$ and $Y_t$,
the coherence at frequency $f_i$ is

$$\frac{|f_{xy}(f_i)|^2}{f_x(f_i)\, f_y(f_i)}. \tag{4}$$


³ An alternative approach for specifying the relation-
ship between two time series in the time domain, as
opposed to in the frequency domain, is called transfer
function analysis and is discussed by Box and Jenkins
(1970).

Distribution properties of these functions are
discussed in Jenkins and Watts (1968, chap.
9); the properties for these functions with
the Tukey-Hanning window are discussed in
Granger and Hatanaka (1964, chap. 5). A
coherence of one means that prediction is
perfect from one series to another for all
frequencies; a coherence of zero means that
it is impossible to predict one series from the
other. The prediction is of amplitude covaria-
tions in the two series, with no indication of
lead-lag relationships, so that a complete
description of relationships requires the phase
spectrum as well as the coherence. If the
coherence has one major peak, then the bulk
of the correlation between the two processes
is confined to a particular frequency band.
If it is essential to predict correlations at
major frequency bands of series $Y_t$, the co-
herence can be investigated at frequencies
that have peaks in the spectrum of $Y_t$. An
alternative, suggested by Porges et al. (Note
2), is to compute one statistic called the
weighted coherence, which is an estimate of the
amount of variation in one series that can be
accounted for by variation in the other:


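$$\frac{\sum_i \mathrm{coh}_{xy}(f_i)\, f_x(f_i)}{\sum_i f_x(f_i)},$$

where $\mathrm{coh}_{xy}(f_i)$ denotes the coherence at frequency $f_i$ and $f_x(f_i)$ the spectrum of the series being accounted for.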


They wrote,


Conceptually the coherence may be thought of as a
time-series analogue of the omega-squared . . . . or
the amount of variance accounted for by the influence
of one series on the other. Therefore, the coherence
times the spectral density estimate of heart rate
activity at each frequency . . . . would describe the
amount of heart rate activity which could be accounted
for by respiration, i.e., the shared variance of heart
rate and respiration. (p. 5)
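The coherence and a weighted coherence of the kind described above can be sketched numerically. The code below is an illustration under stated assumptions, not the Porges et al. procedure: spectra are smoothed by averaging over non-overlapping segments rather than by a lag window, the function names are hypothetical, and the weighting uses the spectrum of the first series passed in, which is taken to be the series being accounted for.

```python
import numpy as np

def segment_spectra(x, y, n_segments):
    """Average auto- and cross-periodograms over non-overlapping segments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    length = x.size // n_segments
    xs = x[:length * n_segments].reshape(n_segments, length)
    ys = y[:length * n_segments].reshape(n_segments, length)
    # Drop the zero frequency after removing each segment's mean.
    fx = np.fft.rfft(xs - xs.mean(axis=1, keepdims=True), axis=1)[:, 1:]
    fy = np.fft.rfft(ys - ys.mean(axis=1, keepdims=True), axis=1)[:, 1:]
    sxx = np.mean(np.abs(fx) ** 2, axis=0)
    syy = np.mean(np.abs(fy) ** 2, axis=0)
    sxy = np.mean(fx * np.conj(fy), axis=0)
    return sxx, syy, sxy

def coherence(x, y, n_segments):
    """Squared modulus of the cross-spectrum over the product of the spectra."""
    sxx, syy, sxy = segment_spectra(x, y, n_segments)
    return np.abs(sxy) ** 2 / (sxx * syy)

def weighted_coherence(x, y, n_segments):
    """Coherence weighted by the spectrum of x, summed over frequencies."""
    sxx, syy, sxy = segment_spectra(x, y, n_segments)
    coh = np.abs(sxy) ** 2 / (sxx * syy)
    return float(np.sum(coh * sxx) / np.sum(sxx))

rng = np.random.default_rng(3)            # fixed seed, for reproducibility only
driver = rng.standard_normal(4096)
follower = driver + 0.1 * rng.standard_normal(4096)
unrelated = rng.standard_normal(4096)
```

With 64 segments, the weighted coherence in this sketch is close to one for the closely related pair and close to zero for the unrelated pair. With a single segment the estimated coherence is identically one for any two series, which is one reason some form of smoothing or windowing is needed before coherence can be interpreted.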


If the cross-covariance is $C_{xy}(k)$, the un-
weighted cross-spectrum is the Fourier trans-
form of the cross-covariance:

$$f_{xy}(f) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} C_{xy}(k)\, e^{i 2\pi f k}, \quad \text{where } i = (-1)^{1/2}.$$

This complex number can be written as a real
part plus an imaginary part: $f_{xy}(f) = C + iQ$.
The phase spectrum is defined as

$$\phi_{xy}(f) = \arctan\frac{Q}{C}. \tag{5}$$

$C$ is called the cospectrum and $Q$ the quadrature
spectrum, and they measure the covariance
between in-phase and out-of-phase com-
ponents, respectively.

The slope of the phase spectrum determines
the time lag and the lead-lag relationships
between the two series. For example, if one
time series, $X(t) = e(t)$, is white noise with
variance $\sigma^2$ and the other series is $Y(t)
= X(t + L)$, then $L$ is the lead time and $Y$
leads $X$ by $L$ time units. Since $X(t)$ is
white noise, the covariance of $X(t)$ and $Y(t)$ is

$$C_{xy}(k) = E[X(t)Y(t + k)] = E[X(t)X(t + k + L)] = \begin{cases} \sigma^2 & \text{if } k = -L \\ 0 & \text{otherwise.} \end{cases}$$

The cross-spectrum is the Fourier transform
of the cross-covariance. Assuming $\sigma = 1$, this
gives

$$f_{xy}(f) = \frac{1}{2\pi}\sum_{k} C_{xy}(k)\, e^{i 2\pi f k} = \frac{\sigma^2}{2\pi}\, e^{-i 2\pi f L} = \frac{1}{2\pi}(\cos 2\pi f L - i \sin 2\pi f L),$$

where $i = (-1)^{1/2}$. The phase spectrum is
given by

$$\phi_{xy}(f) = \arctan\Big(\frac{Q}{C}\Big) = \arctan\Big(\frac{-\sin 2\pi f L}{\cos 2\pi f L}\Big) = \arctan(-\tan 2\pi f L) = -2\pi f L.$$

Figure 2. Phase spectrum, $\phi_{xy}(f)$, when $Y(t)$ leads $X(t)$
by a constant time, $L$. (Axes: $\phi_{xy}(f)$ against $f$, frequency; the plotted line has slope $= -2\pi L$.)

Therefore the phase spectrum will be a straight 
line that passes through the origin with 
negative slope proportional to the time lag, L 
(see Figure 2). 
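The straight-line phase spectrum for a pure time shift can be illustrated with a short simulation. The sketch below makes several assumptions that are not part of the original derivation: the raw (unweighted) cross-periodogram is used, the shift wraps around at the end of the series so that the result is exact, the sign convention is chosen so that a series that leads gives a negative slope, and the function names are hypothetical.

```python
import numpy as np

def phase_spectrum(x, y):
    """Unwrapped phase of the raw cross-spectrum at the Fourier frequencies.

    Sign convention (an assumption of this sketch): the cross-spectrum is taken
    as F[x] * conj(F[y]), so that a series y that leads x gives a negative slope.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cross = np.fft.rfft(x) * np.conj(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(x.size)          # cycles per time unit, 0 to 1/2
    return freqs, np.unwrap(np.angle(cross))

def lead_from_phase_slope(freqs, phase):
    """Least squares slope of phase on frequency, converted to a lead time."""
    slope, _intercept = np.polyfit(freqs, phase, 1)
    return -slope / (2.0 * np.pi)

rng = np.random.default_rng(2)               # fixed seed, for reproducibility only
x = rng.standard_normal(256)
L = 3
y = np.roll(x, -L)        # y(t) = x(t + L), with wrap-around at the end
freqs, phase = phase_spectrum(x, y)
estimated_lead = lead_from_phase_slope(freqs, phase)
```

In this idealized case the fitted slope is $-2\pi L$ and the recovered lead is 3 time units. With real data the phase estimates are noisy, particularly at frequencies where the coherence is low, so the regression line is only an approximation.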

More generally, lead-lag relationships can 
be estimated by testing the significance of the 
slope of the least squares linear regression 
approximation to the phase spectrum.⁴ It is
important to note that this method does not 
give complete information; X and Y may be 
periodic at a particular frequency and have a 
constant phase relationship at that frequency 


⁴ The phase spectrum shown in Figure 2 can be
shown to hold for any two stationary processes that
differ by a constant time lag. If $Y(t) = X(t + L)$, then
the Fourier transform of $Y(t)$ is

$$F[Y(t)] = \int_{-\infty}^{\infty} e^{-i\omega t}\, Y(t)\, dt = \int_{-\infty}^{\infty} e^{-i\omega t}\, X(t + L)\, dt.$$

If one lets $u = t + L$, then

$$F[Y(t)] = \int_{-\infty}^{\infty} e^{-i\omega (u - L)}\, X(u)\, du = e^{i\omega L} \int_{-\infty}^{\infty} e^{-i\omega u}\, X(u)\, du;$$

$$F[Y(t)] = e^{i\omega L}\, F[X(t)].$$

Hence the Fourier transform of $Y(t)$ is the Fourier
transform of $X(t)$ multiplied by the phase shift $e^{-i\phi}$,
where $\phi = -\omega L$.

but some other phase relationship at another
frequency. The slope of the phase spectrum 
averages the lead-lag relationship across all 
frequencies, and it may be important in a 
particular investigation to determine the phase 
relationship between X and Y at specific 
frequencies of interest. One alternative dis- 
cussed by Granger and Hatanaka (1964) is a 
two-component model in which the frequency 
domain is divided in half and lead-lag relation- 
ships are assessed separately for slow and 
rapid components. For these calculations, com- 
puter programs are available in most universi- 
ties that have the University of California, 
Los Angeles biomedical series (Dixon, 1974, 
pp. 517-582, Programs 2T, 3T, and 4T). 


The Categorical Case 


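In the categorical case two series, $X_t$ and $Y_t$,
are set equal to one if the characteristics that
they represent are observed and equal to zero
otherwise. The unbiased estimator of the
cross-covariance, lagged $k$ units in time, is

$$C_{xy}(k) = \frac{1}{n - k}\sum_{t=1}^{n-k} (X_t - \bar{X})(Y_{t+k} - \bar{Y}).$$

For categorical data, $\bar{X}$ and $\bar{Y}$ are the un-
conditional probabilities, $p_x$ and $p_y$, that $X_t$
and $Y_t$ are one in $n - k$ observations, so that

$$C_{xy}(k) = \frac{1}{n - k}\sum_{t=1}^{n-k} (X_t - p_x)(Y_{t+k} - p_y)$$

$$= \frac{1}{n - k}\Big[\sum_{t=1}^{n-k} X_t Y_{t+k} - p_y \sum_{t=1}^{n-k} X_t - p_x \sum_{t=1}^{n-k} Y_{t+k} + p_x p_y (n - k)\Big]$$

$$= \frac{1}{n - k}\Big[\sum_{t=1}^{n-k} X_t Y_{t+k} - p_x p_y (n - k) - p_x p_y (n - k) + p_x p_y (n - k)\Big];$$

$$C_{xy}(k) = \frac{1}{n - k}\Big[\sum_{t=1}^{n-k} X_t Y_{t+k} - p_x p_y (n - k)\Big]. \tag{6}$$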

The sum in Equation 6 is simply the number
of lagged-$k$ (1, 1) pairs. Note that by definition,
the conditional probability that $Y$ is equal to
one given that $X$ was equal to one $k$ time
units ago, $p_k(Y|X)$, is simply the number of
(1, 1) pairs at lag $k$ divided by the number of
occurrences of $X = 1$ in $n - k$ observations.
If one denotes the number of (1, 1) pairs at
lag $k$ as $M_{xy}(k)$, then, from the definition of
the lagged conditional probability, it follows
that $p_k(Y|X) = M_{xy}(k)/[p_x(n - k)]$. Therefore,
the number of (1, 1) pairs at lag $k$ is

$$M_{xy}(k) = \sum_{t=1}^{n-k} X_t Y_{t+k} = p_k(Y|X)\, p_x (n - k).$$

Substituting this back into Equation 6 gives

$$C_{xy}(k) = p_x[p_k(Y|X) - p_y] \tag{7}$$

as the categorical equivalent of the cross-
covariance.
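The equivalence between the lagged cross-covariance of two binary series and the difference between conditional and unconditional probabilities can be verified directly. The sketch below is illustrative: the function names and the two short series are invented for the example, and $p_x$ and $p_y$ are assumed to be computed over the $n - k$ paired observations, as in the derivation.

```python
import numpy as np

def lagged_counts(x, y, k):
    """Return n - k, p_x, p_y and the number of lag-k (1, 1) pairs."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    m = x.size - k
    x_head, y_tail = x[:m], y[k:]
    return m, x_head.mean(), y_tail.mean(), int(np.sum(x_head * y_tail))

def cross_covariance(x, y, k):
    """Covariance form: [sum of X_t Y_(t+k) - p_x p_y (n - k)] / (n - k)."""
    m, p_x, p_y, pairs = lagged_counts(x, y, k)
    return (pairs - p_x * p_y * m) / m

def conditional_form(x, y, k):
    """Probability form: p_x [p_k(Y|X) - p_y]."""
    m, p_x, p_y, pairs = lagged_counts(x, y, k)
    p_cond = pairs / (p_x * m)     # (1, 1) pairs divided by occurrences of X = 1
    return p_x * (p_cond - p_y)

x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]   # y repeats x one time unit later
```

For these series at lag 1, $p_x = p_y = 5/9$ and $p_1(Y|X) = 1$, so both forms give $20/81$. The two functions agree at any lag for which $X = 1$ occurs at least once among the paired observations.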

This function is proportional to the informa-
tion theory definition of communication,
assessed as the difference between conditional
and unconditional probabilities.

To derive the distribution of the covariance,
the variance of the covariance can be com-
puted as follows:

$$C_{xy}(k) = p_x[p_k(Y|X) - p_y] = p_x\, p_k(Y|X) - p_x p_y;$$

$$\operatorname{var}[C_{xy}(k)] = p_x^2\{\operatorname{var}[p_k(Y|X)]\}.$$


Under the null hypothesis of no relationship
between the two categorical time series, $X_t$
and $Y_t$, $p_k(Y|X) = p_y$, and the variance of
the unconditional probability of a dichotom-
ous variable that is not autocorrelated is
$p_y(1 - p_y)/m$ (Siegel, 1956, p. 40), where
$m$ = the number of observations used to
calculate $p_y$. For the covariance $C_{xy}(k)$,
$m = n - k$, and the result is

$$\operatorname{var}[C_{xy}(k)] = p_x^2\, p_y(1 - p_y)/(n - k).$$


Since under the null hypothesis, $C_{xy}(k)/
SD[C_{xy}(k)]$ is normally distributed with mean
zero and unit variance (N[0, 1]) (Box &
Jenkins, 1970), one has

$$\frac{C_{xy}(k)}{SD[C_{xy}(k)]} = \frac{p_x[p_k(Y|X) - p_y]}{[p_x^2\, p_y(1 - p_y)/(n - k)]^{1/2}} \sim N(0, 1);$$

$$Z = \frac{p_k(Y|X) - p_y}{[p_y(1 - p_y)/(n - k)]^{1/2}} \sim N(0, 1). \tag{8}$$

This is a derivation of a statistic that was
recently proposed by Sackett (1977).
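The arithmetic of the z-score test for a lagged conditional probability can be sketched as follows. The code is a minimal illustration with a hypothetical function name and invented data; it assumes, as the derivation does, that neither series is autocorrelated, and it computes $p_y$ over the $n - k$ paired observations.

```python
import numpy as np

def lag_z_score(x, y, k):
    """z score comparing p_k(Y|X) with the unconditional probability p_y."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    m = x.size - k                          # n - k paired observations
    x_head, y_tail = x[:m], y[k:]
    p_y = y_tail.mean()
    p_cond = np.sum(x_head * y_tail) / np.sum(x_head)   # p_k(Y|X)
    return (p_cond - p_y) / np.sqrt(p_y * (1.0 - p_y) / m)

x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]   # y repeats x one time unit later
z = lag_z_score(x, y, 1)
```

Here $p_1(Y|X) = 1$ and $p_y = 5/9$ over nine paired observations, so $z = 12/\sqrt{20}$, or about 2.68. Ten observations are far too few for the normal approximation to be trusted; the example only shows how the quantities in the statistic are obtained.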

An estimate of the error introduced in 
Equation 8 by autocorrelation in each series, 
under the null hypothesis of no cross-corre- 
lation, can be obtained by using the expression 
for the variance of the cross-correlation under 
the null hypothesis given by Box and Jenkins 
(1970, p. 377): 


F аср lt ОЛ 


= AUN. 

nium ; t 9. 
Thus, the variance of the cross-correlation
would be $1/(n - k)$ if there were no auto-
correlation. To estimate the quantity $\delta$, rewrite
the autocorrelations using


$$r_{xx}(k) = C_{xx}(k)/C_{xx}(0):$$


# 


d k 
б= E rar) 


1 


1 = 5 E 
= Cn) Y Ca G)Cu )- 


Now substitute the quantity for the covariance
from Equation 7: 


5 a Ў Срба) — pel 


a [27 
54 — рођ — 
x Lely) = ћи]. 


If one assumes that the quantity in the sum 
decreases exponentially with increasing lag 
and one denotes 


А 9 = hea bun 61») — 2 
then 

1 у 6 

а — py) p) 0—0. 


Delta is a maximum when the conditionals are 
one and a minimum when the conditionals 
equal the unconditionals: 


б = 


Ei 


^E 
ба = 0. 


N-. o 2 ш. 
m= 1— (— 294 =p)’ 
The cross-spectral density function for
categorical data can be written as the Fourier
transform of the cross-covariance (weighted
by a suitable window), and this function will
behave in a fashion similar to the continuous
case. The generalizations are obtained by
applying Equation 2 to Equation 7. The
cross-spectrum is

$$f_{xy}(f_i) = \frac{1}{2\pi}\Big[C_{xy}(0)\lambda_0 + 2\sum_{j=1}^{n-1} \lambda_j C_{xy}(j) \cos 2\pi f_i j\Big].$$

The spectrum of $X_t$ is

$$f_x(f_i) = \frac{1}{2\pi}\Big[p_x(1 - p_x)\lambda_0 + 2\sum_{j=1}^{n-1} \lambda_j C_{xx}(j) \cos 2\pi f_i j\Big].$$

The spectrum of $Y_t$ is

$$f_y(f_i) = \frac{1}{2\pi}\Big[p_y(1 - p_y)\lambda_0 + 2\sum_{j=1}^{n-1} \lambda_j C_{yy}(j) \cos 2\pi f_i j\Big].$$

The lambdas are the Tukey-Hanning weights 
(Blackman & Tukey, 1958). 

To summarize, Equation 7 is the categorical 
equivalent of the cross-correlation, and if
$X = Y$, of the autocorrelation. If cyclicity
exists in a series of categorical data with one
major cycle, then $C_{xy}(k)$ should behave as the
damped sine wave of Figure 1, and the spec-
trum should show one peak. An examination 
of the spectrum, which is the weighted Fourier 
transform of Equation 7, reveals major cycles 
in the categorical series. The coherence and 
phase spectrum are similarly generalized, and 
the slope of the phase spectrum detects lead- 
lag relationships that span all component 
frequencies. Computationally, all these sta-
tistics can be calculated simply by inputting 
each series as a binary zero-one time series. 

To illustrate the relationship between con- 
tinuous and dichotomous spectral time-series 
statistics, one example that compares statistics 
for continuous data and the same data di- 
chotomized around the mean is presented. 


Example 


The data in this example are derived from 
coding a videotape of a married couple working

Figure 3. Positivity of behaviors of one couple on an improvised conflict task.


on an improvised conflict task. The coding 
system and the method for generating the 
time series from categorical data are described 
in Gottman, Markman, and Notarius (1977). 
The graphs displayed in Figure 3 represent 
a tally of positive minus negative nonverbal 
behavior coded from voice tone, facial expres-
sions, and body cues. The unit plotted on the
abscissa is the "floor switch," that is, the set
of utterances before one person gives up the
floor to the other.

These data were transformed to categorical 
data by dichotomizing around the mean of 
each series, and phase spectra and the co- 
herences were calculated for both the discrete 
and the continuous cases using Tukey-Hanning 


weights and the fast Fourier transform pro-
gram available at the University of Illinois
(SOUPAC programs). Figures 4 (see Equation 5)
and 5 (see Equation 4) present a comparison
of these two statistics for the continuous and
categorical cases. The phase spectra are nearly
identical and have very similar regression
lines that in both cases are interpreted as the
wife leading the husband, with a constant
lag equal to the slope of the regression line.
The slope is .31 for the continuous case and .25
for the categorical case.

The coherence for the categorical case is
much lower, which is not surprising because
so much information about strength of associa-
tion is lost by dichotomizing. However, the
important aspect of the coherence is the
location of peaks, and one can see that the
coherence for the categorical case has a shape
similar to that of the continuous case. The
two highest peaks (at f = .1 and f = .4) are
the same for both cases, so that information
about cyclicity in the strength of association
across series is preserved.

Figure 4. Phase spectra for continuous and dichotomous case of couple in Figure 3; $\phi_{xy}(f)$ = phase
spectrum; f = frequency. (Plotted: continuous case, categorical case, and the regression line for the categorical case.)

Figure 5. Coherence spectra for continuous and dichotomous cases of couple in Figure 3; f = frequency.


Conclusion


Spectral and cross-spectral time-series
methods were reviewed in this article for
continuous data, and interpretations were
discussed for the spectrum, the coherence, the
weighted coherence, and the phase spectrum.
These methods were also extended to cate-
gorical data. The extension made it possible
to derive the information theory statistic
for comparing conditional with lagged un-
conditional probabilities and for exploring the
limits of the z-score test as a function of
autocorrelation. Subsequent investigations
should generate stochastic time-series data
by using known autoregressive-moving average
models with seasonal components (Box &
Jenkins, 1970) and by comparing continuous
and dichotomous analyses. The methods
proposed in this article need to be applied to a
range of problems, and their ability to describe
patterns in data across time and to fail to
detect patterns in known random data needs
to be assessed empirically. 
A Comparison of Linear and Monotone
Multidimensional Scaling Models


David G. Weeks and Peter M. Bentler
University of California, Los Angeles


Multidimensional scaling solutions under the linear and monotone (metric and nonmetric) distance models were compared for several monotone and nonmonotone distortions. Data were generated from random configurations, over a wide range of conditions. Results indicate that, when its assumptions are met, the linear model performs best. When linearity assumptions are not met, the monotone model and the linear model applied to ranked data perform equally well. Recommendations based on these results are offered.


Models for Euclidean multidimensional scaling (MDS) of a single symmetric matrix can be summarized as $h_{ij} = f\left(\left[\sum_k (a_{ik} - a_{jk})^2\right]^{1/2}\right)$, where $h_{ij}$ is the observed (dis)similarity between stimuli $i$ and $j$, and $a_{ik}$ is the projection of stimulus $i$ on dimension $k$. Although in principle $f$ may be virtually any real valued function, in practice there are two main types. In the linear MDS model, $h_{ij} = b\,d_{ij} + c$, and the parameter $b$ is an arbitrary scale factor that may be ignored. This linear model is commonly referred to as metric MDS. The monotone model specifies

$$h_{ij} \overset{m}{=} d_{ij},$$

that is, the distances are required only to stand in the same monotone ($m$) rank order as the dissimilarities. More explicitly, the model requires $d_{ij} < d_{kl} \iff h_{ij} < h_{kl}$. This monotone model is generally referred to as nonmetric MDS.


At this point it may be useful to briefly 
outline certain aspects of the history of MDS, 
as this relates to the problems investigated in 
the current article. The first method for MDS 
was developed by Torgerson (1952, 1958) 
and was based on a theorem by Young and 
Householder (1938). This method solves for 
the linear model under the Euclidean metric. 
Nonmetric MDS was introduced by Shepard 
(1962a, 1962b); this allowed solutions for 
the monotone model. Kruskal (1964a, 1964b)
defined an explicit error function for this 
model that was minimized by the method of 
steepest descent. Cooper (1972) developed a 
method for solving the linear model using a 
method related to that used by Kruskal, and 
Bentler and Weeks (1978) developed re- 
stricted, hypothesis-testing methods for the 
linear model. Since its introduction in 1962, 
the monotone model has come to dominate the 
field. 

The continued use of the monotone model 
would imply that its superiority over the 
linear model had been established. Although 
it has been shown that the monotone model 
works quite well (e.g., Shepard, 1962b), the 
superiority of the monotone model over the 
linear model remains undemonstrated. This 
point can be made clearer by an examination 
of a classic study in the field. Ekman (1954) 
obtained similarity ratings of 14 spectral 
colors. These ratings were linearly trans- 
formed to a 0-1 scale. Then, on the assump- 
tion that these values were equivalent to 
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Figure 1. Ekman (1954) color data analyzed by the 
linear model. (The stimuli were colored lights from 
434 nm [violet] to 674 nm [red].) 


cosines or correlations between vectors repre- 
senting the stimuli, the configuration was ob- 
tained by a relatively primitive type of factor 
analysis, The resulting solution was five color 
clusters in five factors. Shepard (1962b) re- 
analyzed these data under the monotone MDS 
model with his new method. He obtained a 
solution in two dimensions, resembling the 
color circle. Although Shepard’s solution was 
clearly more appropriate than Ekman’s, it 
was not compared with the linear distance 
model. Consequently, Shepard’s solution does 
not verify the superiority of the monotone 
model over the linear model. Indeed, our own 
reanalysis of the Ekman data under the linear 
model (see Figure 1) revealed a solution in- 
distinguishable from Shepard’s monotone so- 
lution. 

One would expect the monotone model to 
be dramatically superior only when the data 
are substantially nonlinear, but it is possible 
that most data typically used for MDS are 
approximately linear on distances. Furthermore, large deviations from “monotonicity” would also tend to be the largest deviations from linearity. “Clearly the success of such an undertaking [mapping a proximity matrix into Euclidean space] depends upon the selection of the proper distance function; that is, the function that will transform the proximity measures into Euclidean distances” (Shepard, 1962a, p. 127). Shepard (1962b) gave several examples of analyses of distances distorted by nonlinear monotone functions. In all cases his method recovered the true configuration, as well as the shape of the distorting function. However, these results were not
compared with solutions under the linear 
model, so that it is not possible to conclude 
that monotone MDS was markedly superior 
to linear MDS. It is our purpose in this arti- 
cle to compare results under the linear and 
monotone models and to explore the robust- 
ness of the linear model. We are concerned 
only with the limited problem of comparing 
the linear and monotone models in Euclidean
exploratory multidimensional scaling; no attention
is paid to the other important problems
(such as how to handle missing data, analy- 
sis of non-Euclidean metrics, parameter-con- 
strained scaling methods, transformation of 
initial solutions to aid interpretation, scaling 
individual-difference data, the value of alter- 
native nonlinear optimization methods, the 
relative merit of alternative initial starting 
configurations, methods for avoiding or evalu- 
ating local minima, etc.) that have no rele- 
vance to this comparison. 


The Problem of Comparison 


Attempting to compare metric and non- 
metric MDS presents certain serious prob- 
lems. Results of the comparison should be 
useful in applied situations, and for that pur- 
pose, analysis of real data is advised. In ap- 
plied situations, however, it may not be pos- 
sible to determine which solution is “best.” 
In some cases, one solution may make more
sense than another, but this is by no means
assured (this, of course, presumes that a
difference will be found). In many cases, the 
data might be nearly linear on the true dis- 
tances and would thus provide no test of the 
power of nonmetric MDS. Generating dis- 
tances from a known configuration, and dis- 
torting them by a known monotone function, 
seems more promising. However, the prob- 
lems of comparison do not end there by any 
means. Stress, Kruskal's (1964a, 1964b) measure
of poorness of fit, is inadequate for the
present purpose because it means something
different in each case. In the linear model,
every deviation from linearity contributes to
stress; in the monotone model, only deviations
from monotonicity are counted. The
same solution—provided it does not give per- 
fect fit—has to have a higher stress value if 
stress is measured in terms of linear rather 
than monotone regression. 
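
The contrast between the two stress definitions can be made concrete with a small numerical sketch. The Python below is an illustration only and is not the KYST implementation: the data values, the helper names (`pava`, `linear_fit`, `stress1`), and the absence of ties are all assumptions made for the example. It fits one set of configuration distances to the same data twice, once by least-squares linear regression and once by least-squares monotone regression (pool adjacent violators), and computes Stress 1 for each fit.

```python
import math


def pava(values):
    """Least-squares nondecreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block holds [sum, count] of the values pooled so far
    for v in values:
        blocks.append([v, 1])
        # Pool backwards while a block mean exceeds the mean of the block after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def linear_fit(x, y):
    """Fitted values from the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [my + slope * (a - mx) for a in x]


def stress1(d, d_hat):
    """Stress 1: root of squared residuals over squared distances."""
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return math.sqrt(num / sum(a * a for a in d))


# Hypothetical dissimilarities h and configuration distances d (no ties).
h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
d = [1.0, 1.2, 2.9, 2.7, 4.8, 8.0]

# Monotone regression works on the distances arranged in the order of the data.
order = sorted(range(len(h)), key=lambda i: h[i])
h_sorted = [h[i] for i in order]
d_sorted = [d[i] for i in order]

stress_linear = stress1(d_sorted, linear_fit(h_sorted, d_sorted))
stress_monotone = stress1(d_sorted, pava(d_sorted))
print(f"linear stress = {stress_linear:.4f}, monotone stress = {stress_monotone:.4f}")
```

In this example only one pair of distances is out of order, so the monotone stress is small, whereas every departure from the fitted line counts toward the linear stress. The two values therefore cannot be compared directly as measures of fit.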

It must be acknowledged that certain im- 
portant determinants of the quality of an 
MDS solution are not amenable to Monte 
Carlo, computer simulation techniques. In 
particular, an acceptable solution is one that
makes sense in terms of the particular stimulus
domain. Such judgments are usually subjective,
often necessarily so. Criteria are also
idiosyncratic to the particular data situation. 
Nonetheless, there is an important compo- 
nent of the quality of a solution that is 
amenable to quantification and hence to 
Monte Carlo techniques: the nearness of the 
estimated parameters to the true parameters 
of the model. For the sake of simplicity, we 
consider only the elements of the projection 
matrix as parameters. The problem reduces to 
one of finding a measure to characterize the 
discrepancy between an obtained and a true 
configuration. The squared correlation be- 
tween the distances generated by the two con- 
figurations is an appropriate measure. In a 
distance model, differences in orientation and 
location of the origin are irrelevant, because 
distances are invariant under such transfor- 
mations. Usually central dilation is irrelevant, 
affecting only the scale of the distances. The 
correlation coefficient is, of course, scale in- 
variant. The correlation of distances is natural 
in the sense that the error-free component of 
an observation used in MDS is a distance. 
The squared correlation has been used before 
in similar applications (Girard & Cliff, 1976; 
MacCallum & Cornelius, 1977; Rabinowitz, 
1976; Sherman, 1972; Young, 1970). A via- 
ble alternative would be a measure of associ- 
ation between two matrices, as developed by 
Lingoes and Schönemann (1974; Schönemann
& Carroll, 1970). We prefer the squared correlation
in this case, because it is simpler in
both calculation and interpretation (the pro- 
portion of variance of one set of distances 
accounted for by the other). Of course, it 
must be recognized that since the distances 
within a set are not independent, standard 
statistical sampling theory is not relevant to
understanding the squared correlation in this
context. 
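
As a rough illustration of this recovery measure, the sketch below computes the squared correlation between the interpoint distances of two configurations. It is a minimal example under stated assumptions: the five-point configuration, the perturbation values, and the function names are hypothetical and are not taken from the study. It also checks the invariance argument above: rotating, translating, and uniformly dilating a configuration leaves the squared correlation at 1.

```python
import math
from itertools import combinations


def pairwise_distances(config):
    """Euclidean distances between all pairs of points, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(config, 2)]


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def rigid_motion_and_dilation(point, angle=0.8, shift=(3.0, -2.0), scale=2.5):
    """Rotate, dilate, and translate a 2-D point (all parameters are arbitrary)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])


# Hypothetical "true" configuration of five points in two dimensions.
true_config = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (1.4, 0.9), (0.7, 0.5)]
true_d = pairwise_distances(true_config)

# Same configuration up to orientation, origin, and scale.
moved = [rigid_motion_and_dilation(p) for p in true_config]
r2_moved = squared_correlation(true_d, pairwise_distances(moved))

# A "recovered" configuration with some error in every coordinate.
noise = [(0.3, -0.2), (-0.25, 0.1), (0.2, 0.3), (-0.1, -0.35), (0.15, 0.2)]
perturbed = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(true_config, noise)]
r2_perturbed = squared_correlation(true_d, pairwise_distances(perturbed))

print(f"r2 after rotation/translation/dilation: {r2_moved:.6f}")
print(f"r2 for perturbed configuration:        {r2_perturbed:.6f}")
```

Because the measure works on distances rather than coordinates, no Procrustes-style alignment of the two configurations is needed before comparing them.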

A critical problem in MDS is determination 
of the correct dimensionality (Shepard, 
1974). Probably the most common method 
now in use involves extracting only as many 
dimensions as can reasonably be interpreted. 
This method, of course, is not amenable to 
Monte Carlo studies. A promising objective 
method involves matching obtained stress 
curves with those previously obtained by 
Monte Carlo techniques (e.g., Spence & Graef, 
1974; Wagenaar & Padmos, 1971). Unfortu- 
nately, these studies employed only the 
monotone model, and thus their results can- 
not be used to compare the monotone model 
with the linear model. 


A Simulation Study 
Method 


Conditions were chosen to cover the range of conditions found in most applications of MDS. Three levels of number of points (m = 10, 20, or 30) and four levels of true dimensionality (t = 1, 2, 3, or 4) were completely crossed. Elements of the configuration matrices were obtained from a uniform pseudorandom number generator with a range of 1.4, 1.2, 1.0, or .8 for the first through fourth dimensions, respectively. Within each m, t combination there were two independent replications.

Data were obtained by generating true distances from the configuration and by adding random error to the distances. For cases in which this led to negative values, a constant was added such that the smallest value was zero. These values were then transformed by several known functions to produce the sets of data that were analyzed. Error was drawn from a normal distribution with a mean of zero and variances, expressed as proportions of the variance of the true distances, of .25, .75, and 2.0. (The generator used was routine conrr [IMSL Library, 1975].) Five distorting functions were chosen. The first, $h_{ij} = d_{ij} + e_{ij}$, where $h$ is the data, consisted of no distortion, that is, a linear relationship between data and distances. This was in a sense a control condition by which the severity of the other distortions could be judged. Two distortions, $h_{ij} = (d_{ij} + e_{ij})^{[\text{exponent illegible}]}$ and $h_{ij} = (d_{ij} + e_{ij})^{[\text{exponent illegible}]}$, were intended to exemplify severe, monotone distortions. The fourth distortion, $h_{ij} = \operatorname{rank}(d_{ij} + e_{ij})$, was particularly important for two reasons: First, the ranked data contain only that information used by the monotone model; second, this condition can always be obtained in real situations by ranking the data, since $\operatorname{rank}[f(d + e)] = \operatorname{rank}(d + e)$, where $f$ is monotone. The last distortion was $h_{ij} = |d_{ij} + e_{ij} - w|$, where $w$ is the 20th percentile of $d + e$. This distortion was included more for curiosity than for practical value. We wished to gain some insight into the behavior of the two models when monotonicity was violated.

Figure 2. Effect of model, distortions, and number of dimensions on the squared correlation ($r^2$); d = distance; e = error; w = a constant.
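
The data-generation steps described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' program: the seed, the placement of each uniform range at [0, range], the exponent `p = 4` used for the two power distortions, and the index rule for the 20th percentile are all assumptions introduced here. The sketch also checks the ranking argument, namely that a monotone distortion leaves the ranks of the data unchanged while the nonmonotone distortion does not.

```python
import math
import random
from itertools import combinations


def ranks(values):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


rng = random.Random(0)            # fixed seed, chosen arbitrarily
n_points = 10                     # one of the levels m = 10, 20, 30
dim_ranges = [1.4, 1.2]           # two true dimensions; ranges as read from the text

# Random configuration: uniform coordinates, assumed here to lie in [0, range].
config = [[rng.uniform(0.0, w) for w in dim_ranges] for _ in range(n_points)]
true_d = [math.dist(p, q) for p, q in combinations(config, 2)]

# Normal error whose variance is a stated proportion of the distance variance.
mean_d = sum(true_d) / len(true_d)
var_d = sum((x - mean_d) ** 2 for x in true_d) / len(true_d)
error_proportion = 0.25
sd_error = math.sqrt(error_proportion * var_d)
noisy = [x + rng.gauss(0.0, sd_error) for x in true_d]

# If any value is negative, add a constant so that the smallest value is zero.
low = min(noisy)
if low < 0:
    noisy = [x - low for x in noisy]

p = 4  # hypothetical exponent; the exponents are not legible in the source
w = sorted(noisy)[int(0.2 * len(noisy))]  # approximate 20th percentile of d + e

data = {
    "none": list(noisy),
    "power": [x ** p for x in noisy],
    "root": [x ** (1.0 / p) for x in noisy],
    "rank": [float(r) for r in ranks(noisy)],
    "nonmonotone": [abs(x - w) for x in noisy],
}

for name, values in data.items():
    print(name, "preserves ranks:", ranks(values) == ranks(noisy))
```

Under these assumptions the first four data sets share one rank order, which is why a monotone analysis cannot distinguish among them and why ranking the data before a linear analysis removes any monotone distortion.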
Thus, 3 (number of points) × 4 (dimensions) × 2 (replications) × 3 (levels of error) × 5 (distortions) = 360 sets of data were generated. All sets of data were analyzed under the linear model. Under the monotone model, the various monotone distortions have no differential impact. Therefore, only data produced by the linear function and the nonmonotone function were analyzed under the monotone model. (Actually, we performed complete analyses for a small subset of the conditions and confirmed that this was true in practice as well as in theory.) All analyses were done with the KYST program (Kruskal, Young, & Seery, 1973), which uses Kruskal's (1964a, 1964b) method. Starting configurations were obtained by Torgerson’s (1958) method. Stress 1, with the primary approach to ties, was specified. Solutions were obtained in from one to six dimensions, but the obtained dimensionality was not allowed to exceed the true dimensionality plus three. This resulted in 10,584 solutions. For each solution, stress and the squared correlation between true and recovered distances were recorded.
Of primary interest were differences in recovery of the true distances between the linear and monotone models. The mean squared correlation for the seven distortion-model conditions, for the cases in which recovered dimensionality equaled true dimensionality, is plotted in Figure 2. (All correlations were positive.) The linear model, with no distortion, was best overall. The rank condition with the linear model and the no-distortion condition with the monotone model were virtually identical and were nearly as good as the first condition. The two other monotone distortions with the linear model were clearly worse than the monotone analysis of ranked data, with which they should properly be compared. For the nonmonotone distortion, the monotone model was superior to the linear model except in the one-dimensional condition.

The effects of error, dimensions, and the number of points on the squared correlation are shown in Figure 3. Again, these data are for recovery in the true dimensionality of the configuration. The squared correlation decreases as error and number of dimensions increase and increases with the number of points. The effects of number of points and number of dimensions reflect the degree of overdetermination of parameters. There were no systematic interactions of error with type of analysis.

Figure 3. Effect of error, number of points, and number of dimensions on the squared correlation ($r^2$); T corresponds to true dimensionality; M = number of points.

The effects of the various factors were ex- 
amined for the data in which the dimension- 
ality was overestimated or underestimated by 
one. There were no coherent patterns of re- 
sults in these data that were different from 
the data displayed in Figures 2 and 3. There- 
fore, detailed summaries of these data are 
omitted. 


Discussion 


Two aspects of the monotone multidimen- 
sional scaling model can be conceptually dis- 
tinguished. The first is that only the ordinal 
properties of the data are retained, and the 
second is that a best fitting monotone func- 
tion is derived. The results of the present 
study strongly indicate that only the first 
aspect has any real practical importance. The 
ordinal constraints among a matrix of dis- 
tances are strong enough to determine a highly 
metric solution (Kruskal & Shepard, 1974; 
Shepard, 1966, 1974; Young, 1970), and this 
is true whether the function relating data to 
recovered distances is linear or monotone. 
The virtual identity in the squared correla- 
tion between the linear, rank conditions and 
the monotone, no-distortion conditions bears 
this out. Further, since the squared correla- 
tion values in the linear, no-distortion condi- 
tions were only slightly higher than either 
the ranked, linear conditions or the mono- 
tone conditions, it seems that the actual in- 
terval properties of the data add little over 
the ordinal constraints. 

The other aspect of the monotone model, the function derived from monotone regression, has a serious drawback not shared by the linear model. It is well-known that a problem arises when there are clusters of points such that the within-cluster distances are all smaller than the between-clusters distances. Use of the monotone model can then lead to degenerate solutions, in which the clusters collapse into single points (Shepard, 1974). This can happen because monotonicity is preserved when all small distances are set to zero and all larger distances are set equal to one another. The linear model
is not similarly susceptible to degeneracy, be- 
cause monotone equating of distances creates 
deviations from linearity, which add to the 
size of the loss function. Consequently, the 
rank version of the linear model (rank-linear 
model) should also be able to avoid such 
degenerate solutions. 
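
The degeneracy argument can be checked numerically. The sketch below is an illustration under assumed values: the two three-point clusters, their coordinates, and the helper name `linear_residual_ss` are hypothetical. It collapses each cluster to a single point, so that every within-cluster distance is 0 and every between-cluster distance is 1, and then asks how each criterion scores that degenerate configuration.

```python
import math
from itertools import combinations


def linear_residual_ss(x, y):
    """Residual sum of squares from the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return sum((b - (my + slope * (a - mx))) ** 2 for a, b in zip(x, y))


# Hypothetical stimuli forming two tight, well-separated clusters.
cluster = [0, 0, 0, 1, 1, 1]
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (3.0, 3.0), (3.2, 2.7), (2.8, 3.3)]
pairs = list(combinations(range(len(points)), 2))

# Error-free data: every within-cluster value is below every between-cluster value.
h = [math.dist(points[i], points[j]) for i, j in pairs]

# Degenerate configuration: each cluster collapsed to one point.
d_degenerate = [0.0 if cluster[i] == cluster[j] else 1.0 for i, j in pairs]

# Monotone criterion: distances already stand in the rank order of the data,
# so monotone regression reproduces them exactly and the stress is zero.
order = sorted(range(len(h)), key=lambda k: h[k])
d_in_data_order = [d_degenerate[k] for k in order]
is_monotone = all(a <= b for a, b in zip(d_in_data_order, d_in_data_order[1:]))
monotone_stress = 0.0 if is_monotone else float("nan")

# Linear criterion: two-valued distances cannot lie on a line in the data,
# so the collapsed configuration is penalized.
sum_d2 = sum(a * a for a in d_degenerate)
linear_stress = math.sqrt(linear_residual_ss(h, d_degenerate) / sum_d2)

print(f"monotone stress = {monotone_stress}, linear stress = {linear_stress:.4f}")
```

The degenerate configuration fits perfectly under the monotone criterion even though it discards all within-cluster structure, while the linear criterion assigns it a positive loss.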

The generally outstanding performance of 
the rank-linear model is not particularly sur- 
prising when one considers that the use of 
ranks involves using more information from 
the original data in the rank-linear model 
than from the original data in the monotone 
model. In the rank-linear model, the ranks 
represent a certain distance function for the 
data; in the monotone model, the distance 
function is recovered subject only to the 
weaker constraint that $d_{ij} < d_{kl}$ iff $h_{ij} < h_{kl}$.
Thus the rank-linear model operates in a 
data metric analogous to Spearman's rank- 
order correlation coefficient, whereas the 
monotone model operates in a data metric 
analogous to Kendall’s tau. In the closely 
related area of monotone principal-components
analysis (e.g., Kruskal & Shepard,
1974), the use of ranked data similarly yields 
excellent results at a large savings in cost 
(Woodward & Overall, 1976). 

Based on our results, the known problems 
with monotone regression, and cost consid- 
erations, one may wish to approach multi- 
dimensional scaling analyses in the follow- 
ing way. First, the scaling is performed under 
the linear model with no transformation of 
data. If there are no systematic monotone 
biases, the solution should be optimal as well 
as inexpensive. A scatterplot of data to re- 
covered distances must be examined; if sys- 
tematic nonlinearities are present, the data 
should be ranked and reanalyzed with the 
linear model. If no appreciable clustering of 
points is detected, a monotone scaling would 
be a suitable (but more expensive) alternative
to the second analysis. This approach to
MDS is not appropriate when there is a nonmonotone
relation between data and distances.
Nonmonotone functions are in general
not one to one, they do not necessarily
have inverses, and an adequate recovery of 
the function is not possible. If severe non- 
monotonicities exist in the data, it would be 
neither appropriate nor profitable to employ 
any distance model. If one were employed, 
however, the monotone model would seem to 
be a reasonable choice. It was more robust 
than the linear model to violations of mono- 
tonicity, at least in our data. 

In conclusion, the monotone model has the 
advantage of robustness over the linear model, 
since, in this study, it performed better than 
the linear model for all systematic monotone 
and nonmonotone distortions. The linear 
model, on the other hand, has the advantage 
of conceptual simplicity and greater computa- 
tional efficiency and avoids the danger of 
degeneracy. The rank-linear model appears 
to offer the advantages of both. The only 
computation it requires over the linear model 
is an initial ranking of data. The ranking 
eliminates all systematic monotone nonlin- 
earities, whereas the linear analysis avoids 
the potential of degeneracy due to monotone 
regression. 


References 


Bentler, P. M., & Weeks, D. G. Restricted multidimensional scaling models. Journal of Mathematical Psychology, 1978, 17, 138-151.

Cooper, L. G. A new solution to the additive constant problem in metric multidimensional scaling. Psychometrika, 1972, 37, 311-322.

Ekman, G. Dimensions of color vision. Journal of Psychology, 1954, 38, 467-474.

Girard, R. A., & Cliff, N. A Monte Carlo evaluation of interactive multidimensional scaling. Psychometrika, 1976, 41, 43-64.

IMSL Library (Vol. 1, Ed. 5). Houston, Tex.: International Mathematical and Statistical Libraries, 1975.

Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964, 29, 1-27. (a)

Kruskal, J. B. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 1964, 29, 115-129. (b)

Kruskal, J. B., & Shepard, R. N. A nonmetric variety of linear factor analysis. Psychometrika, 1974, 39, 123-157.

Kruskal, J. B., Young, F. W., & Seery, J. B. How to use KYST, a very flexible program to do multidimensional scaling and unfolding. Murray Hill, N.J.: Bell Laboratories, 1973.

Lingoes, J. C., & Schönemann, P. H. Alternative measures of fit for the Schönemann-Carroll matrix fitting algorithm. Psychometrika, 1974, 39, 423-427.

MacCallum, R. C., & Cornelius, E. T., III. A Monte Carlo investigation of recovery of structure by ALSCAL. Psychometrika, 1977, 42, 401-428.

Rabinowitz, G. A procedure for ordering pairs consistent with the multidimensional folding model. Psychometrika, 1976, 41, 34

Schönemann, P. H., & Carroll, R. M. Fitting one matrix to another under choice of central dilation and a rigid motion. Psychometrika, 1970, 35, 245-255.

Shepard, R. N. The analysis of proximities: Multidimensional scaling with an unknown distance function: I. Psychometrika, 1962, 27, 125-140. (a)

Shepard, R. N. The analysis of proximities: Multidimensional scaling with an unknown distance function: II. Psychometrika, 1962, 27, 219-246. (b)

Shepard, R. N. Metric structures in ordinal data. Journal of Mathematical Psychology, 1966, 3, 287-315.

Shepard, R. N. Representation of structure in similarity data: Problems and prospects. Psychometrika, 1974, 39, 373-421.

Sherman, C. R. Nonmetric multidimensional scaling: A Monte Carlo study of the basic parameters. Psychometrika, 1972, 37, 323-335.

Spence, I., & Graef, J. The determination of the underlying dimensionality of an empirically obtained matrix of proximities. Multivariate Behavioral Research, 1974, 9, 331-341.

Torgerson, W. S. Multidimensional scaling: I. Theory and method. Psychometrika, 1952, 17, 401-419.

Torgerson, W. S. Theory and method of scaling. New York: Wiley, 1958.

Wagenaar, W. A., & Padmos, P. Quantitative interpretation of stress in Kruskal’s multidimensional scaling technique. British Journal of Mathematical and Statistical Psychology, 1971, 24, 101-110.

Young, F. W. Nonmetric multidimensional scaling: Recovery of metric information. Psychometrika, 1970, 35, 455-473.

Young, G., & Householder, A. S. Discussion of a set of points in terms of their mutual distances. Psychometrika, 1938, 3, 19-22.

Woodward, J. A., & Overall, J. E. Factor analysis of rank-ordered data: An old approach revisited. Psychological Bulletin, 1976, 83, 864-867.


Received December 5, 1977 


Psychological Bulletin
1979, Vol. 86, No. 2, 355-360


Comment on Olson: Choosing a Test Statistic
in Multivariate Analysis of Variance


James Stevens
University of Cincinnati


This article questions Olson’s claim that the Pillai-Bartlett statistic (V) is superior to Wilks's Λ (W) and the Hotelling-Lawley trace (T) for general use in multivariate analysis of variance because of much greater robustness against unequal covariance matrices. It is shown by a sampling of studies from the literature that the example Olson used to demonstrate superiority had extreme subgroup variance differences, which occur very infrequently. For subgroup variance differences much more likely to occur, it is shown that the actual Type I error rates for V, T, and W are very similar. For concentrated noncentrality structures with covariance heterogeneity, it is recommended that any of these three statistics be used, since the slight robustness advantage V has is offset by the greater power of T and W in these situations. For diffuse structures, V is the clear choice as test statistic.


Olson (1976) argued recently, based on robustness and power considerations, that of four multivariate test statistics available (Roy’s largest root [R], the Hotelling-Lawley trace [T], Wilks's Λ [W], and the Pillai-Bartlett trace [V]), V should generally be used. Olson's argument was mainly based on his statement that V is much more robust than are the other three to violations of the homogeneity-of-covariance-matrices assumption in multivariate analysis of variance.

To illustrate the much greater robustness of V, Olson gave the following example from his 1976 article: Consider a three-group experiment with five subjects per group and six dependent variables at nominal α = .05. Then if one group is sampled from a population with variances 36 times as great as those of the other groups, the actual Type I error rate for V = .09; but the error rate is .49 for W, .58 for T, and .62 for R. This result does show V to be dramatically more robust than T, W, or R. However, a representative sampling of Olson's (1973) dissertation results indicates that V is only dramatically more robust for extreme subgroup variance differences, as in the above example. Table 1, which presents the sampling of actual Type I error rates from Olson's dissertation for nominal α = .10 and .05, shows that for smaller subgroup variance differences the differences in error rates for V, T, and W are very small. The differences in error rates for T, W, and V for nominal α = .10 are less than 2%; for W versus V, they are less than about 1%. The differences in error rates for T, W, and V for the more extreme nominal value of .05 are essentially the same. In general, they are very small (< 2.3% for T vs. V and < 1.5% for W vs. V).

Requests for reprints should be sent to James Stevens, Teachers College Building, University of Cincinnati, Cincinnati, Ohio 45221.

Eight of the nine cases in Table 1 in which the differences in error rates are larger (marked with superscript b) for both nominal values correspond to very large subgroup variance differences on all variables (36I; that is, the population variances of all variables in one group are 36 times greater than the variances of those variables in the remaining groups). For nominal α = .10, the actual error rates are at least twice the nominal value. For nominal α = .05, the actual error rates are at least three times the nominal value. Therefore, under 36I the error rates for all statistics are far from the nominal value, although relatively speaking V is much more robust.
Table 1

Actual Type I Error Rates for Four Multivariate Test Statistics Under Heterogeneous Covariance Matrices for Nominal α = .10 and .05

No.        No.                  Roy's largest Hotelling     Wilks's       Pillai-Bartlett
variables  groups  N    D^a     root R        trace T       W             trace V
                                .10    .05    .10    .05    .10    .05    .10    .05
2          3       5    4I      138    89     139    80     143    70     137    65
2          3       5    9I      185    140    180    123    178    109    167    102
2          3       5    36I     252    198    234    176    229    166    218    151
2          3       5    C(4)    [nnm]         129    65     128    6      124    66
2          3       5    C(9)    139    92     141    82     144    73     139    75
2          3       5    C(36)   156    104    149    98     159    91     157    87
2          3       10   4I      133    87     128    73     124    67     125    63
2          3       10   9I      168    111    153    101    150    100    144    90
2          3       10   36I     200    142    182    130    174    126    168    117
2          3       10   C(4)    14     67     16     65     12     63     11     61
2          3       10   C(9)    [123.1, 425]  120    70     118    69     19     71
2          3       10   C(36)   120    78     126    76     127    77     123    78
2          3       50   36I     185    129    157    100    154    99     153    96
2          3       50   C(36)   136    81     [Duty]        [11:172]      [14а]  71
2          6       5    4I      160    100    138    79     1288   71     11     6
2          6       5    9I      243    188    205    142    192    121    164    98^b
2          6       5    36I     355    298    299    243    268    218    231    170^b
2          6       5    C(4)    133    77     125    79     119    76     113    73
2          6       5    C(9)    165    110    152    100    142    102    134    98
2          6       10   4I      145    96     122    81     15     78     113    69
2          6       10   9I      209    163    166    135    161    127    158    14
2          6       10   36I     286    238    223    191    207    184    197    162^b
2          6       10   C(4)    121    79     118    81     13     29     11     15
2          6       10   C(9)    137    94     121    98     122    96     124    9
2          6       10   C(36)   161    117    142    113    139    115    137    109
3          3       10   4I      165    92     140    88     135    86     126    75
3          3       10   9I      217    150    185    130    176    125    166    111
3          3       10   36I     274    202    224    188    217    184    197    153^b
3          3       10   C(4)    108    56     110    63     111    64     108    63
3          3       10   C(36)   [its]  74     120    74     121    80     117    78
3          3       20   4I      167    117    146    89     149    86     145    83
3          3       20   9I      221    162    173    128    171    123    165    109
3          3       20   36I     264    203    199    160    195    152    190    143
3          3       20   C(4)    131    89     131    84     1377   278    131    70
3          3       20   C(36)   145    100    142    95     142    95     139    90
3          6       10   4I      197    127    146    100    137    92     129    77
3          6       10   36I     434    373    300    261    273    224    233    186^b
3          6       10   C(4)    143    83     124    79     124    78     19     74
3          6       10   C(36)   171    109    157    107    156    101    147    100
6          3       10   36I     542    462    443    358    369    299    277    187^b
6          6       50   36I     632    550    270    223    249    206    232    182^b
10         3       10   36I     830    752    752    688    673    564    336    165^b
10         3       10   C(36)   122    63     1206   67     15     58     116    63
10         3       50   36I     487    434    268    21     234    186    212    152^b
10         6       10   C(36)   135    84     130    92     1299   91     136    73

^a D = extent of subgroup variance differences: 4I means population variances of all variables in one group are four times greater than the variances of those variables in the remaining groups; C refers to concentrated differences. Thus, C(4) means the population variance on only one variable in one group is four times greater than the variance of that variable in the remaining groups.

^b Cases in which the differences in error rates for T versus V are greater than 2.5% and for W versus V are greater than 1.5%.
Since V is only much more robust for extreme
subgroup variance differences, the following
question arises: “How often can one
expect to find subgroup variances that differ
by a magnitude of 36 to 1?” Table 2 presents
the subgroup variances for a small sampling 
of studies from the literature. Although the 
sampling is small, it includes nine different 
sources and considers a variety of criterion 
variables (achievement, personality, biochem- 
ical, and dental). The first point these stud- 
ies demonstrate is that subgroup variance 
differences near 36 to 1 occur very infrequently.
Only in the Finn (1974) example are
there differences of this magnitude, and even 
there the large differences are confined to 
certain variables and are not of the I type. 
The second point these studies illustrate is 
that the I type of heteroscedasticity Olson 
considered (in which one group has large 
variances on all variables and all other groups 
have equal and smaller variances) is one that 
probably does not occur very often in prac- 
tice. Rather, the groups differ in a variety of 
ways. For example, in the French, Brownell, 
Graziano, and Hartup (1977) and Smith, 
Gnanadesikan, and Hughes (1962) studies the 
largest variances for the three variables in 
each case fell in three different groups. In 
Meichenbaum's (1975) study, although the 
self-instruction group had the largest vari- 
ances on all three variables, the variances of 
the variables in the other two groups were far 
from equal. 

To further document that extreme sub- 
group variance differences (especially of the 
36I type) occur rarely in educational and 
psychological research, a check was made of 
three major journals: American Educational 
Research Journal for 1972-1977 and Journal 
of Educational Psychology and Psychological 
Reports for 1975-1977. This more extensive 
sampling confirmed the results of the smaller 
sampling reported in Table 2. 

A secondary part of Olson’s argument con- 
cerned the power of the four statistics. Which 
is most powerful depends on how the null 
hypothesis is false. With a concentrated non- 
centrality structure, the power ordering (from 
most to least powerful) is R, T, W, and V. 
With a diffuse structure (the groups differing 
along several dimensions), the power ordering 
is reversed (V, W, T, and R), and power differences
among V, W, and T are typically
small (Olson, 1973, p. 73). 

Therefore, if assumptions are met, the 
choice of test statistic depends on the degree 
of concentration of the noncentrality struc- 
ture. If assumptions are not met, then I 
would argue as follows: For concentrated or 
nearly concentrated structures, use V, W, or 
T as the test statistic, because the discrepan- 
cies in actual Type I error for these three, 
for subgroup variance differences likely to 
occur in practice, are so small that which of 
the statistics is used makes no practical difference.
Although V will generally be slightly
more robust, the gain in power obtained by 
using T or W instead of V will offset or more 
than offset V’s slight robustness advantage. 
There is evidence to suggest that concentrated 
structures are quite prevalent in psychological 
research (Bock, 1975, p. 154). For diffuse 
structures, however, it must be acknowledged
that V is the preferred choice, since in these 
cases it is slightly more robust and somewhat
more powerful.


References 


Bock, R. D. Multivariate statistical methods in behavioral
research. New York: McGraw-Hill, 1975.

Calsyn, D. A., Spengler, D. M., & Freeman, C. W.
Application of the somatization factor of the 
MMPI-168 with low back pain patients. Journal 
of Clinical Psychology, 1977, 33, 1017-1020. 

Finn, J. D. A general model for multivariate analy- 
sis. New York: Holt, Rinehart & Winston, 1974. 

French, D. C., Brownell, C. A., Graziano, W. G., &
Hartup, W. W. Effects of cooperative, competi- 
tive and individualistic sets on performance in 
children’s groups. Journal of Experimental Child 
Psychology, 1977, 24, 1-10. 

Gardner, E. T., & Schumacher, G. M. Effects of
contextual organization on prose retention. Journal
of Educational Psychology, 1977, 69, 146-151.

Meichenbaum, D. Enhancing creativity by modifying
what subjects say to themselves. American
Educational Research Journal, 1975, 12, 129-145.

Novince, L. The contribution of cognitive restruc- 
turing to the effectiveness of behavior rehearsal in 
modifying social inhibition in females. Unpub- 
lished doctoral dissertation, University of Cincin- 
nati, 1977. 

Olson, C. L. A Monte Carlo investigation of the 
robustness of multivariate analysis of variance. 
Unpublished doctoral dissertation, University of 
Toronto, 1973. 


360 JAMES STEVENS

Olson, C. L. On choosing a test statistic in multivariate
analysis of variance. Psychological Bulletin, 1976, 83, 579-586.

Smith, H., Gnanadesikan, R., & Hughes, J. B.
Multivariate analysis of variance. Biometrics,
1962, 18, 22-41.

Stevens, J. P. Four methods of analyzing between
variation for the k-group MANOVA problem.
Multivariate Behavioral Research, 1972, 7, 499-522.

Wright, R. J. The affective and cognitive consequences
of an open education elementary school.
American Educational Research Journal, 1975, 12, 449-468.

Received December 12, 1977


Psychological Bulletin
1979, Vol. 86, No. 2, 361-370


Confirmatory Inference and Geometric Models 


Lawrence J. Hubert 
Department of Education 
University of California, Santa Barbara


A confirmatory technique is discussed that is appropriate for comparing a given
geometric model with supplementary data available on the same objects used in
the representation. The inference procedure is based on relatively straightforward
distribution-free principles and requires the comparison of one proximity matrix,
possibly reconstructed from a particular geometric model, with a second structure
matrix obtained from the supplementary information. Two examples are
presented that illustrate the generality of the same statistical approach.


In the literature on data analysis over the
last 20 years, a distinction between exploratory
and confirmatory procedures has become
very popular (Hildebrand, Laing, & Rosenthal,
1977; Kaiser, 1970; Tukey, 1962). An
exploratory strategy typically involves the
use of an analysis technique on a given data
set with the aim of identifying interesting
relationships, patterns, and the like. Alternatively,
a confirmatory approach requires the
test of an a priori conjecture that is generated
from a source distinct from the data to be
used for the purposes of validation. This latter
test in the present context is correlational,
and thus the term confirmation is given a
limited meaning that does not imply the absolute
correctness of a hypothesis. Since a
correlational analysis can never exclude all
competing explanations, we argue only, when it is
justified, that the pattern of data is not unrelated
to the conjectured pattern.

It may be obvious that confirmatory analyses
would be desirable adjuncts to many of the
current exploratory methods used in the study
of proximity matrices (such as clustering and
multidimensional scaling), but very few tech- 
niques have been proposed that could help 
carry out such a program with any degree of 
rigor. Users of the newer data reduction pro- 
cedures lack confirmatory techniques even of 
a correlational nature and must rely on in- 
tuitive arguments based on whatever addi- 
tional information is available for the objects 
being studied. Although this practice is com- 
mendable given the current state of the art, it 
is now possible to proceed one step further 
using the correlational methods presented in 
this article and incorporate the same informa- 
tion relevant to a post hoc explanation more 
directly in a confirmatory manner. To provide 
a complete illustration and also to limit the 
scope of the discussion, our emphasis is on 
geometric models, or more generally, on data 
representations that can be given some type of 
explicit geometric interpretation. In short, the 
sections to follow illustrate how confirmatory 
data analysis problems that are phrased geo- 
metrically can be approached through a rela- 
tively straightforward concept of correlation. 
Depending on the context, it is conceivable 
that the method presented here might be 
used in conjunction with a geometric model; 
in extending an existing analysis strategy 
based on geometric notions; as a means of 
interpreting a given model with respect to 
supplementary information; or finally, as a 
preliminary to the construction of a desired 
geometric representation. The first example
below formalizes the basic ideas that are presented here.

Figure 1. Patterns used by Glushko (1975) in testing
Garner's (1962) pattern goodness hypothesis; each
pattern is characterized by one of three inferred
equivalence class sizes: 1 and 2 by a size of one,
3-10 by a size of four, and 11-17 by a size of eight.
Glushko's patterns were as wide as they were high.


Example 1: Figural Goodness 


In a recent study concerned with the 
“goodness” of patterns, Glushko (1975) at- 
tempted to verify Garner's (1962) basic hy- 
pothesis regarding what makes one pattern 
better than another. To be more specific, each 
of the 17 patterns used by Glushko, listed in 
Figure 1, can be characterized by the size of 
an inferred equivalence class. The term
equivalence is used to label the set of patterns 
that contain a single figure plus all other 
configurations that result from reflections or 
from 90° rigid rotations. As indicated in
Figure 1, 2 of the Glushko patterns construct
the same configuration under all of these operations,
8 patterns have four associated figures,
and finally, 7 patterns produce eight different
members. According to Garner, the
subjective judgment of pattern goodness is a
direct function of the size of a configuration's
inferred equivalence class, with the smaller
size classes corresponding to the better patterns.

LAWRENCE J. HUBERT AND MICHAEL J. SUBKOVIAK

To test Garner's hypothesis using the 17 
patterns of Figure 1, Glushko first obtained a 
symmetric measure of proximity between each
pair of patterns by using a choice task. All
136 different pattern combinations were pre- 
sented to 20 subjects, who were asked to indi- 
cate their preference. These choices were then 
summed over subjects and subtracted from 
an expected preference frequency of 10. Due 
to the subtraction of 10, the absolute values 
of these differences, given in the lower tri- 
angular portion of Table 1, form a symmetric 
measure of proximity defined for all pattern 
pairs and provide data in a form that can be 
subjected to a variety of data reduction 
techniques. In particular, Glushko attempted 
to represent the structure of the proximity 
function by first placing the 17 configurations 
in a two-dimensional space using Shepard and 
Kruskal’s multidimensional scaling routine
(see Kruskal, 1964a, 1964b). Given this geometric
representation, Johnson’s (1967) diameter
clustering results were then superimposed,
producing a representation similar to
the one we give in Figure 2 (here, we only indicate
the clustering result defined by three
subclasses). Clearly, one strong dimension
(the vertical) can be identified as that of
equivalence class size. In addition, the clusters
themselves correspond fairly well to a
grouping on the basis of the same criterion,
except for the minor misplacement of the two
configurations numbered 10 and 11.

The process of verifying Garner's hypothesis
through multidimensional scaling and
clustering might be considered rather circuitous,
especially since the equivalence class
hypothesis implies a definite structure for
the original proximity measure. Although one
dimension is very strong in this example and
the clustering and scaling results are clear-cut,
unambiguous outcomes of this type are
somewhat rare. In general, when a strong
hypothesis is not reflected as dramatically in
the scaling or clustering results, it may be
difficult to decide whether the hypothesis is
inadequate or whether the data reduction
techniques are at fault. In the typical application,
the researcher may be able to identify
portions of his or her theory in a scaling or
clustering solution, but may lack a strategy
for measuring in any precise manner the actual
degree of confirmation or nonconfirmation.

Table 1
Symmetric Proximity Matrix Obtained by Glushko (1975) for the Patterns of
Figure 1 and Structure Matrix Generated by Equivalence Class

[Matrix entries are not recoverable from the scanned source.]

* Symmetric proximity matrix is in lower triangle; structure matrix is in upper triangle.

As an alternative approach, it should be
possible to test directly whether the pattern
goodness hypothesis is reflected in the original
proximities and bypass the scaling and
clustering solutions altogether. To introduce
some notation, suppose the patterns are denoted
by o_1, o_2, ..., o_n (where n = 17 in our
example). Furthermore, let q(o_i, o_j) refer to
the symmetric¹ proximity between patterns
o_i and o_j, and let Q refer to an organization
of these measures into a 17 × 17 square
matrix with rows and columns labeled by
the objects or patterns o_1, o_2, ..., o_n. By convention,
the diagonal of Q is assumed to consist
entirely of zeros. In addition to the empirical
proximity matrix Q, the stated hypothesis
is represented numerically by a second
"structure" matrix C with elements
c(o_i, o_j). Explicitly, suppose N(o_i) denotes
the size of the inferred equivalence class for
object o_i and let f be some monotone function
on the integers, for example, f(x) > f(y)
if and only if x > y. Then as a formal definition,
c(o_i, o_j) = f[|N(o_i) − N(o_j)|], where it
is assumed that c(o_i, o_j) = 0 for o_i = o_j. Although
many functions can be used and the
actual choice depends on the researcher’s
judgment as to the most appropriate relative
size of the structure values, for the purposes
of an illustration f is taken as the identity,
that is, f(x) = x. In other words, the symmetric
function values c(o_i, o_j), given in the
upper triangular portion of Table 1, are
merely the absolute values of the differences
in equivalence class sizes associated with the
objects o_i and o_j.
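
The structure function just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name structure_matrix and the use of NumPy are ours, not part of the original analysis, and the class sizes are taken from the Figure 1 caption (patterns 1 and 2 with size one, 3-10 with size four, 11-17 with size eight).

```python
import numpy as np

def structure_matrix(class_sizes, f=lambda x: x):
    """Build c(o_i, o_j) = f(|N(o_i) - N(o_j)|) with a zero diagonal.

    class_sizes: inferred equivalence class size N(o_i) for each object.
    f: a monotone function on the integers (identity by default, as in the text).
    """
    sizes = np.asarray(class_sizes)
    # Absolute difference in class size for every pair of objects.
    diff = np.abs(sizes[:, None] - sizes[None, :])
    c = np.asarray(np.vectorize(f)(diff), dtype=float)
    # By convention c(o_i, o_i) = 0.
    np.fill_diagonal(c, 0.0)
    return c

# Class sizes for the 17 patterns, read from the Figure 1 caption.
sizes = [1] * 2 + [4] * 8 + [8] * 7
C = structure_matrix(sizes)
print(C[0, :4], C[0, -1], C[2, -1])
```

Any other monotone f (for instance a square root) can be passed in place of the identity; the choice only rescales the relative sizes of the structure values.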

As an operational interpretation, the theory
used to generate the function c(o_i, o_j) is given
empirical support if the two sets of elements,
c(o_i, o_j) and q(o_i, o_j), have a similar patterning
of high and low entries. Although many formal
indices for this relationship can be defined,
the pairing of a proximity q(o_i, o_j) with
a structure value c(o_i, o_j) suggests that the
simple Pearson product-moment correlation
may be a reasonable measure to consider; it
is thus our major choice for the sequel.² Once
this index is calculated, the next problem
concerns its significance and, specifically,
whether the size of the observed correlation
between the values for q(o_i, o_j) and c(o_i, o_j)
is sufficient to reject some appropriately defined
null hypothesis.

¹ Extensions to asymmetric measures are possible
using the generalizations discussed in Hubert and
Schultz (1976).

² Since the conditional assumption of fixed functions
c(o_i, o_j) and q(o_i, o_j) is made, several other
indices are formally equivalent to the Pearson statistic,
at least from an inference point of view. For
instance, either the raw sum of cross-products,
Σ_{i,j} c(o_i, o_j) q(o_i, o_j), or the sum of squared differences,
Σ_{i,j} [c(o_i, o_j) − q(o_i, o_j)]², could be used.
Alternatively, a transformation on the original function
values could be carried out and the comparison
performed on the transformed values. Thus, if the
ranks of c(o_i, o_j) and q(o_i, o_j) are used, then Spearman’s
rank correlation index is the measure of
correspondence.

Figure 2. Two-dimensional scaling of the 17 Glushko (1975) patterns.

To generate a reasonable reference distribution
for the observed correlation, suppose
one assumes a randomness hypothesis that
can hopefully be rejected. More specifically, it
is conjectured that the partition of the objects
(or patterns) o_1, o_2, ..., o_n occurred randomly
or was chosen at random from the set of all
partitions of the same form. In our case, the
conjectured partition contains three classes
with two, eight, and seven objects, respectively,
and thus, the null hypothesis of interest asserts
that this particular partition occurred
randomly and consequently does not reflect
the patterning of entries in the proximity
matrix Q. Moreover, any such partition of Q
of the same form (i.e., number of classes)
will produce a correlation index and when
completely enumerated will generate an exact
reference distribution for the null hypothesis.
Since it is assumed that the partition is conjectured
prior to observing the data, the hypothesis
of randomness is an active candidate
to consider, and its rejection is not a foregone
conclusion.
From an inference perspective, the observed
correlation for the conjectured partition can
be compared with the distribution generated
by complete enumeration, and if the observed
correlation is at a suitably extreme percentage
point, the null hypothesis of randomness can
be rejected. Moreover, when the correlation
actually obtained for the conjectured partition
is large enough, this index can be assumed
to reflect a value that was obtained nonrandomly;
that is, at least to some extent, the
functions q(o_i, o_j) and c(o_i, o_j) have a common
patterning of high and low entries. In
short, a permutation (or randomization) test
of the type discussed in detail by Bradley
(1968), Edgington (1969), or Lehmann
(1975) is applied. For instance, if the functions
q(o_i, o_j) and c(o_i, o_j) are specialized appropriately
(Hubert & Baker, 1977), then
this same strategy includes the usual randomization
analysis of a one-way design.

Although complete enumeration is generally
prohibitive because of computational
costs and although an exact reference distribution
is thus typically too expensive to obtain,
Monte Carlo approximations are relatively inexpensive
(cf. Hubert & Schultz, 1976;
Schultz & Hubert, 1976). For instance, Table
2 gives the frequency results of selecting
1,000 partitions of the desired form at random
and with replacement and should provide
an approximate distribution that is fairly accurate
for this application. In particular, using
the Figure 1 data, the observed correlation
for the Garner (1962) hypothesis is .64, which
is greater than any value observed in the
Table 2 distribution. Thus, the null hypothesis
of a random partition can be rejected at
an approximate significance level of, say, .001,
suggesting that the equivalence class hypothesis
is supported by the patterning of the proximity
values.

Table 2
Approximate Distribution for Comparison of
Structure and Proximity Matrices in Table 1

Correlation     Sample cumulative proportion
  −.193           .001
  [illegible]     .005
  −.162           .010
  −.117           .050
  −.098           .100
  −.070           .200
  −.046           .300
  −.025           .400
  −.009           .500
   .010           .600
   .033           .700
   .068           .800
   .115           .900
   .162           .950
  [illegible]     .990
   .297           .995
   .396           .999
   .420          1.000

Note. N = 1,000.
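
The Monte Carlo comparison described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the NumPy random generator, and the synthetic demonstration data are our assumptions, and the "+1" adjustment in the significance estimate is a common convention that the text does not specify. The Glushko proximities themselves are not reproduced here.

```python
import numpy as np

def upper_triangle_correlation(q, c):
    """Pearson correlation between the off-diagonal entries of two symmetric matrices."""
    iu = np.triu_indices(q.shape[0], k=1)
    return float(np.corrcoef(q[iu], c[iu])[0, 1])

def monte_carlo_matrix_test(q, c, n_samples=1000, seed=0):
    """Compare the observed correlation with correlations from random reorderings of q."""
    rng = np.random.default_rng(seed)
    observed = upper_triangle_correlation(q, c)
    n = q.shape[0]
    reference = np.empty(n_samples)
    for k in range(n_samples):
        order = rng.permutation(n)
        # Reorder rows and the corresponding columns of q simultaneously; c stays fixed.
        reference[k] = upper_triangle_correlation(q[np.ix_(order, order)], c)
    # One-sided estimate; the +1 terms are an assumed convention, not from the text.
    p_value = (1 + np.count_nonzero(reference >= observed)) / (n_samples + 1)
    return observed, reference, p_value

# Synthetic demonstration data (hypothetical, NOT the Glushko proximities):
# a structure matrix from class sizes 1, 4, 8 plus symmetric noise.
sizes = np.array([1] * 2 + [4] * 8 + [8] * 7)
c_demo = np.abs(sizes[:, None] - sizes[None, :]).astype(float)
noise = np.random.default_rng(42).normal(size=c_demo.shape)
q_demo = c_demo + (noise + noise.T) / 2.0
np.fill_diagonal(q_demo, 0.0)

r_obs, ref, p = monte_carlo_matrix_test(q_demo, c_demo)
print(round(r_obs, 3), round(float(ref.max()), 3), p)
```

Sorting the returned reference values and reading off selected percentage points would give a table of the same kind as Table 2.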

Using the previous example as a guide, the
salient features of a confirmatory analysis
should be evident. Given the proximity measure
q(o_i, o_j) and some conjecture specified in
terms of a structure function c(o_i, o_j), the observed
correlation between q(o_i, o_j) and c(o_i, o_j)
is compared with a reference distribution generated
under a hypothesis of randomness. If
the obtained correlation is at an extreme percentage
point, the correspondence between
q(o_i, o_j) and c(o_i, o_j) is declared significant,
with the added implication that the conjecture
leading to the construction of c(o_i, o_j)
may help explain some of the variation present
in the empirical proximity measures.³ As
usual, the size of the correlation can be considered
an index of the degree of correspondence
or confirmation.

³ If different functions c(o_i, o_j) are used on the
same data set, our confirmatory strategy (as well as
any other) could turn out to be an exploratory
analysis. In this case, however, the difficulty of
multiple significance testing arises.

Although the example given above implies
that a randomness hypothesis should be defined
in terms of selecting a partition of a
given form at random, a more general hypothesis
that will generate exactly the same
distribution can also be considered. Explicitly,
if the values assigned by the proximity function
are organized, as before, into an n × n
square matrix Q and, similarly, the values
of the structure function into a second n × n
square matrix, C, both with rows and columns
labeled as o_1, o_2, ..., o_n, then each reordering
of the rows and simultaneously of the corresponding
columns of Q in relation to the
fixed C matrix will induce a specific partition
of the n objects o_1, ..., o_n. In other words, for
our C matrix of Table 1, any reordering of Q
produces a partition defined by subsets containing
two, eight, and seven objects. The
first two rows and columns of the reordered
Q matrix define the objects in the class of
size two, the next eight rows and columns
define a class of eight objects, and the remaining
seven rows and columns define the last
object class of size seven. Moreover, if a reordering
of Q is chosen at random, that is,
if all n! possible reorderings are considered
equally likely, then this assumption induces
a random selection of a partition of the
same general form used in the original construction
of the C matrix. In short, the random
reordering of Q and the random selection
of a partition will generate exactly the same
distribution of correlations, and thus, either
concept can be used in producing an approximate
reference table through Monte Carlo
simulation. This generalization will prove important
when a confirmatory approach is
necessary but cannot be identified by a specific
partitioning of an object set.
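
The equivalence between reordering Q and reselecting a partition can be checked numerically. The sketch below is illustrative only: the seven hypothetical class sizes, the random proximity matrix, and the helper name corr_upper are our assumptions. It reorders Q against a fixed C, then instead leaves Q alone and reassigns the class sizes to the objects according to the same permutation; the two correlations should agree.

```python
import numpy as np

def corr_upper(a, b):
    """Pearson correlation over the upper-triangular (off-diagonal) entries."""
    iu = np.triu_indices(a.shape[0], k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])

rng = np.random.default_rng(7)

# Hypothetical class sizes for seven objects (not the Glushko data).
sizes = np.array([1, 1, 4, 4, 4, 8, 8])
n = sizes.size
c_fixed = np.abs(sizes[:, None] - sizes[None, :]).astype(float)

# Hypothetical symmetric proximity matrix with a zero diagonal.
raw = rng.random((n, n))
q = (raw + raw.T) / 2.0
np.fill_diagonal(q, 0.0)

order = rng.permutation(n)

# View 1: reorder rows and columns of Q against the fixed C matrix.
r_reordered = corr_upper(q[np.ix_(order, order)], c_fixed)

# View 2: keep Q fixed and give object order[k] the class size sizes[k],
# which is the partition induced by the reordering.
relabeled = np.empty_like(sizes)
relabeled[order] = sizes
c_relabeled = np.abs(relabeled[:, None] - relabeled[None, :]).astype(float)
r_partition = corr_upper(q, c_relabeled)

print(r_reordered, r_partition)
```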

Although we suggest carrying out a con- 
firmatory test through the use of an approxi- 
mate distribution obtained through Monte 
Carlo simulation, it is also possible to find 
the exact mean and variance of the complete 
reference distribution by formulas, given only 
the matrices C and Q. Specifically, the mean 
of the Pearson correlation r is zero, and its 
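variance is equal to

$$V(r) = \frac{2}{n(n-1)} \left[ 1 + \frac{2}{(n-2)\,G_2 H_2} \left\{ (G_1 - G_2)(H_1 - H_2) + \frac{(2G_1 - G_2)(2H_1 - H_2)}{n-3} \right\} \right],$$

where

$$G_1 = \sum_{i} \Bigl( \sum_{j \neq i} [q(o_i,o_j) - \bar{q}] \Bigr)^2; \qquad G_2 = \sum_{i} \sum_{j \neq i} [q(o_i,o_j) - \bar{q}]^2;$$

$$H_1 = \sum_{i} \Bigl( \sum_{j \neq i} [c(o_i,o_j) - \bar{c}] \Bigr)^2; \qquad H_2 = \sum_{i} \sum_{j \neq i} [c(o_i,o_j) - \bar{c}]^2;$$

$$\bar{q} = \frac{2}{n(n-1)} \sum_{i<j} q(o_i,o_j); \qquad \bar{c} = \frac{2}{n(n-1)} \sum_{i<j} c(o_i,o_j).$$
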


As an example of how the variance calculation
can be used for the data of Figure 1 and
the structure function of Table 1, we find
[V(r)]^(1/2) = .0879. Converting to a Z score
for the observed correlation of .641, a value
of 7.28 is obtained, which would indicate a
rather significant result if it were possible to
assume even a crude normality (see Mantel,
1967, for the appropriate moment derivations).⁴
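
As a check on the moment calculation, the following sketch computes a permutation variance for r from two symmetric matrices and compares it with the variance obtained by enumerating all n! reorderings of a small example. The closed form coded here is the standard Mantel-type result for centered, symmetric matrices with off-diagonal sums G_1, G_2, H_1, H_2; treat its exact arrangement, the function names, and the random 5 × 5 matrices as our assumptions rather than a transcription of the printed formula.

```python
import itertools
import numpy as np

def exact_variance_of_r(q, c):
    """Permutation variance of r for symmetric matrices with zero diagonals (n >= 4)."""
    n = q.shape[0]
    off = ~np.eye(n, dtype=bool)

    def sums(m):
        # Deviations from the off-diagonal mean; the diagonal contributes nothing.
        dev = np.where(off, m - m[off].mean(), 0.0)
        first = np.sum(dev.sum(axis=1) ** 2)   # sum of squared row sums
        second = np.sum(dev ** 2)              # sum of squared deviations
        return first, second

    g1, g2 = sums(q)
    h1, h2 = sums(c)
    inner = (g1 - g2) * (h1 - h2) + (2 * g1 - g2) * (2 * h1 - h2) / (n - 3)
    return 2.0 / (n * (n - 1)) * (1.0 + 2.0 * inner / ((n - 2) * g2 * h2))

def enumerated_correlations(q, c):
    """Correlation for every reordering of q against the fixed c (small n only)."""
    n = q.shape[0]
    iu = np.triu_indices(n, k=1)
    values = []
    for perm in itertools.permutations(range(n)):
        p = np.array(perm)
        values.append(np.corrcoef(q[np.ix_(p, p)][iu], c[iu])[0, 1])
    return np.array(values)

def random_symmetric(n, rng):
    raw = rng.random((n, n))
    m = (raw + raw.T) / 2.0
    np.fill_diagonal(m, 0.0)
    return m

rng = np.random.default_rng(3)
q_small = random_symmetric(5, rng)   # hypothetical data, 5! = 120 reorderings
c_small = random_symmetric(5, rng)
r_all = enumerated_correlations(q_small, c_small)
print(r_all.mean(), r_all.var(), exact_variance_of_r(q_small, c_small))
```

With the variance in hand, the Z score in the text is simply the observed correlation divided by the square root of V(r).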

Although the Pearson product-moment correlation
is used as a measure of correspondence
between c(o_i, o_j) and q(o_i, o_j), it is not
legitimate to test the index using the usual
formulas presented in most elementary texts.
In particular, the permutation procedure discussed
above preserves the internal linkages
among the function values c(o_i, o_j) and
q(o_i, o_j), whereas an application of even the
usual permutation test on a correlation, as
discussed in Bradley (1968), does not. In the
latter case, all [n(n − 1)/2]! reorderings of one set of
function values would be considered equally
likely and would be compared with some
fixed ordering of the second set of n(n − 1)/2 function
values. Thus, dependencies among the
function values would be destroyed and a
different variance of 1/[n(n − 1)/2 − 1] for r would
result. These same comments apply to the
use of the well-known parametric hypothesis
test of no correlation based on the t distribution.

⁴ Unfortunately, since the few normal convergence
theorems that are available are also very specialized,
little general information is available on the adequacy
of a normal approximation. Several Monte
Carlo studies, however, suggest that normality may
provide an adequate approximation in some applications
(e.g., Schultz & Hubert, 1976). Thus, until
more complete information is available, it may be
more appropriate to rely on random sampling from
the complete permutation distribution or to use the
exact moments to obtain a conservative significance
level, as discussed in Hubert and Levin (1976a).


Example 2: Multidimensional 
Scaling Applications 


Geometric configurations that are generated 
as a result of a multidimensional scaling (Car- 
roll & Chang, 1970; Kruskal, 1964a, 1964b) 
represent another context in which the confirmatory
paradigm could be used to test a
priori conjectures. To give an illustration,
consider the application of Carroll and
Chang's individual differences scaling procedure
to data collected by Wish, Deutsch,
and Biener (1970). The objects of study for
this analysis were 12 nations, and each of 18
subjects rated the proximity of all pairs of
nations on a 9-point scale (large numbers indicated
a greater degree of similarity). The
resulting 18 proximity matrices, all of size
12 × 12, can be analyzed by the Carroll-Chang
procedure. The group result selected
for our discussion is a two-dimensional configuration,
shown in Figure 3, in which each
nation is represented by a point and in which
the interpoint distances reflect the degree of
similarity between the corresponding nations
as judged by the group; for example, the distance
between the United States and China is
large, since they are perceived on the average
as being very dissimilar. (It should be noted
at the outset that two separate two-way analyses
are performed below, and the interlocking
weights between what are called the group and
subject spaces are not considered.)

Figure 3. Two-dimensional configuration of 12 nations.

Instead of attempting to label and interpret
dimensions per se, suppose the researcher
wishes to test the a priori hypothesis that an
outside variable, such as political alignment,
may account in part for the distances between
nations. In other words, the researcher is interested
in confirming the conjecture that nations
close together subscribe to similar political
philosophies and, conversely, that those
far apart have different political systems. In
this case the proximity function q(o_i, o_j) would
merely refer to the distance between nations
o_i and o_j in Figure 3.
The structure function c(o_i, o_j) would be
obtained from the outside variable of political
alignment. For instance, if political alignment
were simply dichotomized as communist versus
noncommunist, then c(o_i, o_j) might be
defined as zero if o_i and o_j were both communist
or both noncommunist and as one
otherwise. With this notation, a large positive
correlation between q(o_i, o_j) and c(o_i, o_j)
would indicate that nations of similar political
persuasion were located close together in
Figure 3. As it turns out, the observed correlation
between the interpoint distances in
Figure 3 and the dichotomous variable of
political alignment is .50, which is significant
at, say, the .001 level (approximately) when
compared to the distribution of correlations for
1,000 random reorderings of matrix Q. In
short, there is statistical support for the hypothesis
that the information provided by
the variable of political alignment (or some
closely related variable) is reflected in the
arrangement of points. The question regarding
other possible competing hypotheses, however,
is unanswered.

Figure 4. Two-dimensional configuration of 18 subjects. (H = hawk; M = moderate; D = dove.)
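
A dichotomous structure function of the kind used for political alignment (zero for two objects in the same class, one otherwise) can be built directly from a list of class labels. The sketch below is illustrative: the function name and the five example labels are hypothetical and do not reproduce the Wish, Deutsch, and Biener data.

```python
import numpy as np

def same_class_structure(labels):
    """c(o_i, o_j) = 0 if the two objects share a label, 1 otherwise."""
    labels = np.asarray(labels)
    # Comparing every label with every other label gives a boolean matrix;
    # the diagonal is automatically zero because each object matches itself.
    return (labels[:, None] != labels[None, :]).astype(float)

# Hypothetical alignment labels for five objects (not the original data).
example_labels = ["communist", "noncommunist", "noncommunist", "communist", "noncommunist"]
c_align = same_class_structure(example_labels)
print(c_align)
```

The same function serves for any categorical outside variable with more than two classes, since it only asks whether two labels match.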
In addition to the geometric configuration
of nations given in Figure 3, the Carroll-Chang
procedure also produces a configuration
of the particular subjects that supplied the
similarity data, as shown in Figure 4. The
horizontal and vertical axes of Figure 4 are
exactly the same as those of Figure 3, representing
economic development and political
alignment, respectively. Numerical coordinates
(x_i, y_i) again locate a subject o_i in Figure
4 and, furthermore, indicate how much emphasis
subject o_i gives to political alignment
and economic development when rating the
similarities of nations. Thus, subject 10 gives
primary emphasis to the economic development
dimension, subject 11 gives primary
emphasis to political alignment, and subjects
in the center of the configuration weight both
dimensions about equally.

As indicated in Figure 4, Wish et al. further
classified each subject either as a hawk
(H), a moderate (M), or a dove (D) according
to the person's stance on the Vietnam
War and descriptively argued that subjects in
the same class tend to weight the two dimensions
similarly. Since it is hypothesized that
hawks, moderates, and doves will form reasonably
homogeneous clusters in Figure 4, the
confirmatory paradigm can provide a statistical
test for the conjecture that subjects
weight dimensions differentially according to
their opinions. Again, the proximity
function is defined as the Euclidean distance
between subjects o_i and o_j in Figure 4 (i.e., in
the subject space), and for the sake of simplicity,
the structure function is defined as
zero if o_i and o_j belong to the same class
(hawk, moderate, or dove) and as one otherwise.
A positive correlation between the
function values q(o_i, o_j) and c(o_i, o_j) supports
the conjecture that hawks, moderates, and
doves tend to form separate clusters in Figure 4.
Since the observed correlation is .19,
which is significant at an approximate .009
level, the hypothesis is given statistical support.
Wish et al. noted specifically that hawks
tend to cluster above the diagonal in Figure 4
and give relatively more emphasis to the
political alignment factor, whereas moderates
and doves cluster below the diagonal and give
relatively more weight to economic development.⁵


Discussion 


As should be evident in the examples given 
above, the confirmatory approach developed 
in this article has a number of applications 
that are related to the use and development 
of geometric models, either those that occur 
naturally or those derived from some inter- 
mediate data reduction process. In addition 
to the illustrations provided, a number of 
other correspondences to the methodological 
literature of the behavioral sciences could be 
developed that the reader may be interested 
in pursuing further; see, for example, Schultz 
and Hubert (1976), Hubert and Baker 
(1977), Hubert and Levin (1976a, 1976b), 
Hubert and Schultz (1976), Hubert (1978), 
Carroll and Chang (Note 1), Althauser, Bur- 
dick, and Winsborough (1966), Campbell, 
Kruskal, and Wallace (1966), Cliff and Ord
(1973), Geary (1954), Mielke, Berry, and 
Johnson (1976), Royaltey, Astrachan, and 
Sokal (1975), and Winsborough, Quarantelli, 
and Yutzy (1963). 


⁵ Since strong algebraic dependencies exist among
the interpoint distances, it is not legitimate to per- 
form a simple one-way analysis of variance on the 
Euclidean distances. 
Nonparametric Large-Sample Pairwise Comparisons


Kenneth J. Levy 
State University of New York at Buffalo 


Tukey's procedure for making pairwise comparisons among means is discussed 
within the context of three nonparametric models. Examples are presented in 
which Tukey's procedure, in accord with Hartley's results, is employed to make 
comparisons associated with a Kruskal-Wallis one-way analysis of variance test 
for ranked data, Friedman's two-way analysis of variance test for ranked data, 
and Cochran's test of change for dichotomous data. 


Marascuilo and McSweeney (1967) dis- 
cussed methods for testing post hoc hypotheses 
concerning trends associated with the Kruskal- 
Wallis (1952) one-way analysis of variance 
(ANOVA) test for ranked data, the Friedman
(1937) two-way ANOVA test for ranked data,
and the Cochran (1950) test of change for 
dichotomous data. 

Within the framework of these three non- 
parametric tests, an investigator might be 
primarily interested in testing hypotheses 
associated with the set of all possible pairwise 
comparisons that arise from the k treatments
associated with each of these three non-
parametric test procedures. The purpose of
the present article is to illustrate the applica-
tion of a Tukey-type procedure for controlling
the joint significance level associated with
such comparisons. Marascuilo and Mc-
Sweeney's general multiple comparison ap-
proach could be employed in the present
context if an investigator were only interested
in making pairwise comparisons; however, the
present procedure produces more powerful
tests because the set of pairwise comparisons
is only a subset of the set of all possible
comparisons for which Marascuilo and Mc-
Sweeney's approach is most applicable.


The Kruskal-Wallis Test


Following Marascuilo and McSweeney
(1967), consider k independent samples, each
of size n, drawn from k continuous probability
distributions. Let N = nk and let the original
observations be replaced by ranks (1, 2, ..., N)
in accord with the Kruskal-Wallis test. Let
$R_1, R_2, \ldots, R_k$ be the rank sums of each of the
samples, and let $\bar{R}_1, \bar{R}_2, \ldots, \bar{R}_k$ be the respec-
tive average ranks.

Requests for reprints should be sent to Kenneth J.
Levy, Department of Psychology, State University of
New York at Buffalo, 4230 Ridge Lea Road, Buffalo, New York

From Marascuilo and McSweeney, when
$E(\bar{R}_1) = E(\bar{R}_2) = \cdots = E(\bar{R}_k)$, where $E(\cdot)$
is the expected value operator, it should be
noted that if n is sufficiently large, the $\bar{R}_j$ will
be approximately multivariate normal with

\[ E(\bar{R}_j) = \frac{N + 1}{2}, \]

\[ \operatorname{var}(\bar{R}_j) = \sigma_{jj} = \frac{(N + 1)(N - n)}{12n}, \]

and

\[ \operatorname{cov}(\bar{R}_j, \bar{R}_{j'}) = \sigma_{jj'} = -\frac{N + 1}{12} = -\frac{1}{k - 1}\left[\frac{(N + 1)(N - n)}{12n}\right]. \]

Consider the following result from Hartley
(1950). If $y = (y_1, \ldots, y_k)$ is a multi-
variate normal vector with zero means and
dispersion $\sigma^2 B$ (where B is a square matrix
whose diagonal elements equal a and whose
off-diagonal elements equal b for some a and b),
then the range of the $y_j$ is distributed as the
range of k independent and identically dis-
tributed normal variates with zero means and
variances $\sigma^2(a - b)$. In the present context,
in accord with Hartley's result, the range of
the $\bar{R}_j$ is distributed as the range of k in-
dependent and identically distributed normal
variates with zero means and variances
$\sigma^2(a - b)$, where $\sigma^2(a - b) = N(N + 1)/12n$.
So

\[ P\{(\bar{R}_{\max} - \bar{R}_{\min}) < q_{\alpha,k,\infty}[N(N + 1)/12n]^{1/2}\} = 1 - \alpha, \]

where $\bar{R}_{\max} - \bar{R}_{\min}$ is the range of the $\bar{R}_j$ and
$q_{\alpha,k,\infty}$ is the upper $\alpha$ point of the studentized
range with infinite degrees of freedom. From
this probability statement follows the usual
Tukey result that the probability is $1 - \alpha$
that all of the (1/2)k(k - 1) pairwise differ-
ences $[E(\bar{R}_j) - E(\bar{R}_{j'})]$ simultaneously satisfy

\[ (\bar{R}_j - \bar{R}_{j'}) - \omega q_{\alpha,k,\infty} \le [E(\bar{R}_j) - E(\bar{R}_{j'})] \le (\bar{R}_j - \bar{R}_{j'}) + \omega q_{\alpha,k,\infty}, \]

where $\omega = [N(N + 1)/12n]^{1/2}$.
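
The step from the moments of the average ranks to the variance N(N + 1)/12n can be written out explicitly. The following is only a sketch, assuming that $\sigma^2 a$ is identified with the common variance and $\sigma^2 b$ with the common covariance of the $\bar{R}_j$ stated earlier in this section:

\[
\sigma^2(a - b) = \operatorname{var}(\bar{R}_j) - \operatorname{cov}(\bar{R}_j, \bar{R}_{j'})
= \frac{(N + 1)(N - n)}{12n} + \frac{N + 1}{12}
= \frac{(N + 1)[(N - n) + n]}{12n}
= \frac{N(N + 1)}{12n}.
\]

For instance, with k = 6 samples of size n = 10 (so N = 60), this gives 60(61)/120 = 30.5, and $\omega = 30.5^{1/2} \approx 5.52$ before any correction for ties.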

For data involving r sets of tied observations,
let $t_s$ be the number of tied observations in set s.
Kruskal and Wallis (1952) suggested that for
any group of $t_s$ tied observations, the tied
ranks could be replaced by their mean. From
Dunn (1964, p. 249), when $E(\bar{R}_1) = E(\bar{R}_2)
= \cdots = E(\bar{R}_k)$, it can be demonstrated that
if n is sufficiently large, the $\bar{R}_j$ will be approxi-
mately multivariate normal with

\[ E(\bar{R}_j) = \frac{N + 1}{2}, \]

\[ \operatorname{var}(\bar{R}_j) = \left[\frac{N(N + 1)}{12} - \frac{\sum_{s=1}^{r}(t_s^3 - t_s)}{12(N - 1)}\right]\left(\frac{N - n}{nN}\right), \]

and

\[ \operatorname{cov}(\bar{R}_j, \bar{R}_{j'}) = \left(\frac{-1}{k - 1}\right)\operatorname{var}(\bar{R}_j). \]

Thus, in accord with Hartley's result, when
ties occur in the data, the range of the $\bar{R}_j$ is
distributed as the range of k independent and
identically distributed normal variates with
zero means and variances

\[ (\omega^*)^2 = \left(\frac{k}{k - 1}\right)\operatorname{var}(\bar{R}_j). \]

Therefore, for tied data, it follows that the
probability is $1 - \alpha$ that all possible pairwise
comparisons $[E(\bar{R}_j) - E(\bar{R}_{j'})]$ simultaneously
satisfy

\[ (\bar{R}_j - \bar{R}_{j'}) - \omega^* q_{\alpha,k,\infty} \le [E(\bar{R}_j) - E(\bar{R}_{j'})] \le (\bar{R}_j - \bar{R}_{j'}) + \omega^* q_{\alpha,k,\infty}, \]

where

\[ \omega^* = \left[\frac{N(N + 1)}{12n} - \frac{\sum_{s=1}^{r}(t_s^3 - t_s)}{12n(N - 1)}\right]^{1/2}. \]


A Kruskal-Wallis-Type Example 


Consider the example discussed in Mara- 
scuilo and McSweeney (1967, p. 404). This 
example may also be found in Hays (1973, 
p. 684). The hypothetical example involves 
noise intensity as a treatment variable with 
six levels. The dependent variable is a subject's 
score obtained in a complex performance task 
under one of the noise intensity levels. The 
data for this example appear in Table 1. 

An investigator wishes to test the 15
different hypotheses associated with pairwise
comparisons of the form $[E(\bar{R}_j) - E(\bar{R}_{j'})] = 0$.
Since there are r = 19 sets of tied
observations, $\omega^*$ should be calculated. For
these data, $\omega^* = 5.52$. If the joint significance
level of the 15 tests is to be controlled at
$\alpha = .05$, one also needs to obtain the value of
$q_{.05,6,\infty}$. This value may be found in many
statistical texts; $q_{.05,6,\infty} = 4.03$. Those
comparisons for which

\[ |\bar{R}_j - \bar{R}_{j'}| > \omega^* q_{\alpha,k,\infty} \]

should be declared significant. In the present
example, $\omega^* q_{\alpha,k,\infty} = 22.25$, and

$|\bar{R}_1 - \bar{R}_2| = 17.75$     $|\bar{R}_2 - \bar{R}_3| = 7.05$      $|\bar{R}_3 - \bar{R}_5| = 33.00$*
$|\bar{R}_1 - \bar{R}_3| = 24.80$*    $|\bar{R}_2 - \bar{R}_4| = 8.70$      $|\bar{R}_3 - \bar{R}_6| = 43.00$*
$|\bar{R}_1 - \bar{R}_4| = 9.05$      $|\bar{R}_2 - \bar{R}_5| = 25.95$*    $|\bar{R}_4 - \bar{R}_5| = 17.25$
$|\bar{R}_1 - \bar{R}_5| = 8.20$      $|\bar{R}_2 - \bar{R}_6| = 35.95$*    $|\bar{R}_4 - \bar{R}_6| = 27.25$*
$|\bar{R}_1 - \bar{R}_6| = 18.20$     $|\bar{R}_3 - \bar{R}_4| = 15.75$     $|\bar{R}_5 - \bar{R}_6| = 10.00$.

Thus, 6 out of the 15 comparisons should be
declared significant; these 6 significant com-
parisons are designated with asterisks.
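
The arithmetic of this example can be retraced in a few lines of code. The sketch below is illustrative only: it assumes equal sample sizes, takes the critical value 4.03 from the text rather than computing it, and uses hypothetical helper names (`midranks`, `kruskal_wallis_pairwise`) that are not part of the original article.

```python
from itertools import combinations
from math import sqrt

# Raw scores from Table 1: one list per noise intensity level (n = 10 each).
scores = [
    [18, 24, 20, 26, 23, 29, 27, 33, 32, 38],
    [34, 36, 39, 43, 48, 28, 30, 33, 37, 42],
    [39, 41, 35, 48, 44, 38, 42, 47, 53, 33],
    [37, 32, 25, 28, 29, 31, 34, 38, 43, 23],
    [15, 18, 27, 22, 28, 24, 21, 19, 13, 33],
    [14, 19, 5, 25, 7, 13, 10, 16, 20, 11],
]
Q_CRIT = 4.03  # q(.05, 6, infinity), the value quoted in the text


def midranks(values):
    """Map each distinct value to its average rank; also return tie-set sizes."""
    ordered = sorted(values)
    rank_of, tie_sizes = {}, []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mean of the ranks i+1, ..., j
        if j - i > 1:
            tie_sizes.append(j - i)
        i = j
    return rank_of, tie_sizes


def kruskal_wallis_pairwise(samples, q_crit):
    """Tukey-type pairwise comparisons of average ranks (equal n assumed)."""
    k, n = len(samples), len(samples[0])
    N = n * k
    rank_of, tie_sizes = midranks([x for s in samples for x in s])
    mean_ranks = [sum(rank_of[x] for x in s) / n for s in samples]
    tie_term = sum(t**3 - t for t in tie_sizes)
    # Tie-corrected omega*; reduces to sqrt(N(N+1)/12n) when there are no ties.
    omega_star = sqrt(N * (N + 1) / (12 * n) - tie_term / (12 * n * (N - 1)))
    critical = omega_star * q_crit
    significant = [
        (a + 1, b + 1)
        for a, b in combinations(range(k), 2)
        if abs(mean_ranks[a] - mean_ranks[b]) > critical
    ]
    return mean_ranks, len(tie_sizes), omega_star, critical, significant


mean_ranks, r, omega_star, critical, significant = kruskal_wallis_pairwise(scores, Q_CRIT)
print(r, round(omega_star, 2), round(critical, 2))
print(significant)
```

Run on the Table 1 scores, this sketch should report 19 tie sets, a tie-corrected value of about 5.52, a critical difference of about 22.25, and the same six significant pairs as the text.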


Table 1
Data for the Kruskal-Wallis-Type Example

                          Noise intensity level
    1            2            3            4            5            6
18 (10.5)    34 (40.5)    39 (49.5)    37 (44.5)    15 (8)       14 (7)
24 (20.5)    36 (43)      41 (51)      32 (34.5)    18 (10.5)    19 (12.5)
20 (14.5)    39 (49.5)    35 (42)      25 (22.5)    27 (25.5)     5 (1)
26 (24)      43 (54.5)    48 (58.5)    28 (28)      22 (17)      25 (22.5)
23 (18.5)    48 (58.5)    44 (56)      29 (30.5)    28 (28)       7 (2)
29 (30.5)    28 (28)      38 (47)      31 (33)      24 (20.5)    13 (5.5)
27 (25.5)    30 (32)      42 (52.5)    34 (40.5)    21 (16)      10 (3)
33 (37.5)    33 (37.5)    47 (57)      38 (47)      19 (12.5)    16 (9)
32 (34.5)    37 (44.5)    53 (60)      43 (54.5)    13 (5.5)     20 (14.5)
38 (47)      42 (52.5)    33 (37.5)    23 (18.5)    33 (37.5)    11 (4)

Note. Numbers in parentheses are the ranks associated with the original dependent variable scores.


Friedman's Test


Following Marascuilo and McSweeney
(1967), consider n individuals or matched
groups observed in a repeated measures design
in which each subject or group is tested under
k conditions. Let the original observations for
each subject be replaced by ranks (1, 2, ..., k)
in accord with Friedman's two-way ANOVA
procedure. Let $R_1, R_2, \ldots, R_k$ be the rank sums
of each of the conditions, and let $\bar{R}_1, \bar{R}_2, \ldots, \bar{R}_k$
be the respective average ranks.

From Marascuilo and McSweeney, when
$E(\bar{R}_1) = E(\bar{R}_2) = \cdots = E(\bar{R}_k)$, it should be
noted that if n is sufficiently large, the $\bar{R}_j$ will
be approximately multivariate normal with

\[ E(\bar{R}_j) = \frac{k + 1}{2}, \]

\[ \operatorname{var}(\bar{R}_j) = \sigma_{jj} = \frac{k^2 - 1}{12n}, \]

and

\[ \operatorname{cov}(\bar{R}_j, \bar{R}_{j'}) = \sigma_{jj'} = -\frac{k + 1}{12n}. \]

From Hartley's (1950) result follows the usual
Tukey result that the probability is $1 - \alpha$
that all of the pairwise differences $[E(\bar{R}_j)
- E(\bar{R}_{j'})]$ simultaneously satisfy

\[ (\bar{R}_j - \bar{R}_{j'}) - \omega q_{\alpha,k,\infty} \le [E(\bar{R}_j) - E(\bar{R}_{j'})] \le (\bar{R}_j - \bar{R}_{j'}) + \omega q_{\alpha,k,\infty}, \]

where $\omega = [k(k + 1)/12n]^{1/2}$.

For data involving r sets of tied observations
in which tied ranks are replaced by their mean,
it can be demonstrated that when $E(\bar{R}_1)
= E(\bar{R}_2) = \cdots = E(\bar{R}_k)$, the $\bar{R}_j$ will be
approximately multivariate normal if n is
sufficiently large, with

\[ E(\bar{R}_j) = \frac{k + 1}{2}, \]

\[ \operatorname{var}(\bar{R}_j) = \left(\frac{k - 1}{k}\right)\left[\frac{k(k + 1)}{12n} - \frac{\sum_{s=1}^{r}(t_s^3 - t_s)}{12n^2(k - 1)}\right], \]

and

\[ \operatorname{cov}(\bar{R}_j, \bar{R}_{j'}) = \left(\frac{-1}{k - 1}\right)\operatorname{var}(\bar{R}_j). \]

Thus, for tied data, in accord with Hartley's
result, it follows that the probability is $1 - \alpha$
that all possible pairwise comparisons $[E(\bar{R}_j)
- E(\bar{R}_{j'})]$ simultaneously satisfy

\[ (\bar{R}_j - \bar{R}_{j'}) - \omega^* q_{\alpha,k,\infty} \le [E(\bar{R}_j) - E(\bar{R}_{j'})] \le (\bar{R}_j - \bar{R}_{j'}) + \omega^* q_{\alpha,k,\infty}, \]

where

\[ \omega^* = \left[\frac{k(k + 1)}{12n} - \frac{\sum_{s=1}^{r}(t_s^3 - t_s)}{12n^2(k - 1)}\right]^{1/2}. \]


A Friedman-Type Example 


Consider an example discussed in Hays
(1973, p. 785). This hypothetical example
involves a treatment variable with four levels.
The treatments are applied to 11 groups of
four matched subjects. The data for this
example appear in Table 2.

Table 2
Data for the Friedman-Type Example

                  Treatment level
Group       1          2          3          4
  1       1 (2)      4 (3)      8 (4)      0 (1)
  2       2 (2)      3 (3)     13 (4)      1 (1)
  3      10 (3)      0 (1)     11 (4)      3 (2)
  4      12 (3)     11 (2)     13 (4)     10 (1)
  5       1 (2)      3 (3)     10 (4)      0 (1)
  6      10 (3)      3 (1)     11 (4)      9 (2)
  7       4 (1)     12 (4)     10 (2)     11 (3)
  8      10 (4)      4 (2)      5 (3)      3 (1)
  9      10 (4)      4 (2)      9 (3)      3 (1)
 10      14 (4)      4 (2)      7 (3)      2 (1)
 11       3 (2)      2 (1)      4 (3)     13 (4)

Note. Numbers in parentheses are the ranks as-
sociated with the original dependent variable scores.

An investigator wishes to test the six
different hypotheses associated with pairwise
comparisons of the form $[E(\bar{R}_j) - E(\bar{R}_{j'})] = 0$.
Since there are no ties within any rows
of the original data, $\omega$ should be calculated.
For these data, $\omega = .39$. If the joint significance
level of the six tests is to be controlled at
$\alpha = .05$, one needs to obtain the value of
$q_{.05,4,\infty}$; this value is 3.63. Those comparisons
for which

\[ |\bar{R}_j - \bar{R}_{j'}| > \omega q_{\alpha,k,\infty} \]

should be declared significant. In the present
example, $\omega q_{\alpha,k,\infty} = 1.42$, and

$|\bar{R}_1 - \bar{R}_2| = .55$      $|\bar{R}_2 - \bar{R}_3| = 1.27$
$|\bar{R}_1 - \bar{R}_3| = .72$      $|\bar{R}_2 - \bar{R}_4| = .54$
$|\bar{R}_1 - \bar{R}_4| = 1.09$     $|\bar{R}_3 - \bar{R}_4| = 1.81$*.

Thus, only one out of the six comparisons
should be declared significant; this comparison
is designated with an asterisk.
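
The same calculation can be sketched in code for the Friedman case. The block below is a minimal illustration, not the author's procedure verbatim: it assumes no ties within any row (as in Table 2), takes the critical value 3.63 from the text, and uses hypothetical function names.

```python
from itertools import combinations
from math import sqrt

# Raw scores from Table 2: one row per matched group, one column per treatment.
scores = [
    [1, 4, 8, 0],
    [2, 3, 13, 1],
    [10, 0, 11, 3],
    [12, 11, 13, 10],
    [1, 3, 10, 0],
    [10, 3, 11, 9],
    [4, 12, 10, 11],
    [10, 4, 5, 3],
    [10, 4, 9, 3],
    [14, 4, 7, 2],
    [3, 2, 4, 13],
]
Q_CRIT = 3.63  # q(.05, 4, infinity), the value quoted in the text


def rank_row(row):
    """Rank one block from 1 to k; this sketch does not handle ties."""
    if len(set(row)) != len(row):
        raise ValueError("tied scores within a row need the tie-corrected formula")
    ordered = sorted(row)
    return [ordered.index(x) + 1 for x in row]


def friedman_pairwise(blocks, q_crit):
    """Tukey-type pairwise comparisons of average within-block ranks."""
    n, k = len(blocks), len(blocks[0])
    ranks = [rank_row(row) for row in blocks]
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    mean_ranks = [total / n for total in rank_sums]
    omega = sqrt(k * (k + 1) / (12 * n))
    critical = omega * q_crit
    significant = [
        (a + 1, b + 1)
        for a, b in combinations(range(k), 2)
        if abs(mean_ranks[a] - mean_ranks[b]) > critical
    ]
    return rank_sums, mean_ranks, omega, critical, significant


rank_sums, mean_ranks, omega, critical, significant = friedman_pairwise(scores, Q_CRIT)
print(rank_sums, round(omega, 2), significant)
```

The unrounded product of the two factors is about 1.41; the 1.42 in the text corresponds to multiplying the rounded value .39 by 3.63. Either threshold singles out the same comparison, treatments 3 and 4.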


Cochran's Test 


Following Marascuilo and McSweeney
(1967), consider n individuals observed in a
repeated measures design in which each subject
is tested under k conditions. In this case, the
observations are dichotomous, for instance,
success and failure. Let $S_i$ be the number of
successes for the ith subject. Let $T_1, T_2, \ldots, T_k$
be the sums for each condition, and let $\bar{T}_1,
\bar{T}_2, \ldots, \bar{T}_k$ be the respective averages.

From Marascuilo and McSweeney, when
$E(\bar{T}_1) = E(\bar{T}_2) = \cdots = E(\bar{T}_k)$, it should be
noted that if n is sufficiently large, the $\bar{T}_j$ will
be approximately multivariate normal with

\[ E(\bar{T}_j) = \frac{\sum_{i=1}^{n} S_i}{nk}, \]

\[ \operatorname{var}(\bar{T}_j) = \sigma_{jj} = \frac{k\sum_{i=1}^{n} S_i - \sum_{i=1}^{n} S_i^2}{n^2k^2}, \]

and

\[ \operatorname{cov}(\bar{T}_j, \bar{T}_{j'}) = \sigma_{jj'} = -\frac{k\sum_{i=1}^{n} S_i - \sum_{i=1}^{n} S_i^2}{n^2k^2(k - 1)}. \]

From Hartley's (1950) result follows the usual
Tukey result that the probability is $1 - \alpha$
that all of the pairwise differences $[E(\bar{T}_j)
- E(\bar{T}_{j'})]$ simultaneously satisfy

\[ (\bar{T}_j - \bar{T}_{j'}) - \omega q_{\alpha,k,\infty} \le [E(\bar{T}_j) - E(\bar{T}_{j'})] \le (\bar{T}_j - \bar{T}_{j'}) + \omega q_{\alpha,k,\infty}, \]

where

\[ \omega = \left[\frac{k\sum_{i=1}^{n} S_i - \sum_{i=1}^{n} S_i^2}{n^2k(k - 1)}\right]^{1/2}. \]


Table 3
Data for the Cochran-Type Example

[Table body not recoverable from the scan: 20 subjects (rows) by 4 problems (columns); entries are 1 for a successful solution and 0 for a failure.]


A Cochran-Type Example 


Consider an example discussed in Hays 
(1973, p. 774). This hypothetical example 
involves 20 randomly selected subjects who 
were each given four problems in a random 
order. A 1 was recorded for a successful 
solution and a 0 for a failure. The data for this 
example appear in Table 3. 

An investigator wishes to test the six
different hypotheses associated with pairwise
comparisons of the form $[E(\bar{T}_j) - E(\bar{T}_{j'})] = 0$.
For these data, $\omega = .11$. If the joint
significance level of the six tests is to be
controlled at $\alpha = .05$, one needs to obtain
the value of $q_{.05,4,\infty}$; this value is 3.63. Those
comparisons for which

\[ |\bar{T}_j - \bar{T}_{j'}| > \omega q_{\alpha,k,\infty} \]

should be declared significant. In the present
example, $\omega q_{\alpha,k,\infty} = .40$, and

$|\bar{T}_1 - \bar{T}_2| = .05$      $|\bar{T}_2 - \bar{T}_3| = .20$
$|\bar{T}_1 - \bar{T}_3| = .25$      $|\bar{T}_2 - \bar{T}_4| = .05$
$|\bar{T}_1 - \bar{T}_4| = .00$      $|\bar{T}_3 - \bar{T}_4| = .25$.


Thus, none of the six comparisons should be 
declared significant. 
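
For dichotomous data the computation has the same shape. The sketch below is illustrative only: the 0/1 matrix is hypothetical (it is not the Table 3 data, and with n = 8 it is smaller than a large-sample procedure would call for), the function name is an assumption, and the variance term is the permutation-based expression $[k\sum S_i - \sum S_i^2]/[n^2k(k - 1)]$ that follows from treating each subject's successes as randomly assigned to conditions under the null hypothesis.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data, for illustration only: 8 subjects (rows) by 4 conditions.
data = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
Q_CRIT = 3.63  # q(.05, 4, infinity), as quoted in the text for k = 4


def cochran_pairwise(matrix, q_crit):
    """Tukey-type pairwise comparisons of condition success proportions."""
    n, k = len(matrix), len(matrix[0])
    subject_totals = [sum(row) for row in matrix]              # S_i
    proportions = [sum(row[j] for row in matrix) / n for j in range(k)]
    numerator = k * sum(subject_totals) - sum(s * s for s in subject_totals)
    omega = sqrt(numerator / (n * n * k * (k - 1)))
    critical = omega * q_crit
    significant = [
        (a + 1, b + 1)
        for a, b in combinations(range(k), 2)
        if abs(proportions[a] - proportions[b]) > critical
    ]
    return proportions, omega, critical, significant


proportions, omega, critical, significant = cochran_pairwise(data, Q_CRIT)
print(proportions, round(omega, 3), round(critical, 3), significant)
```

Subjects who succeed on all conditions or on none contribute nothing to the numerator, which matches the usual behavior of Cochran's test.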


A Comment Concerning Sample Sizes 


The analytic results presented in the present 
article obtain for large values of n. A further 
point that should be addressed concerns the
question of what constitutes a large sample
size for each of the cases considered here.
Future empirical results would surely be
helpful in answering such questions; however,
there appears to be some information presently
available that could serve as a useful guide.
Siegel (1956) pointed out that the distribution
of the usual Kruskal-Wallis test statistic
can be closely approximated by an appropriate
chi-square distribution when n > 5; Hays
(1973) pointed out that the distribution of
the usual Friedman test statistic can be
closely approximated by an appropriate chi-
square distribution when n > 10 and k > 4;
and empirical results reported by Levy and
Narula (1976) suggest that the distribution
of the usual Cochran test statistic can be
closely approximated by an appropriate chi-
square distribution when n > 18. Although
further work on these questions is needed, I 
suggest that the preceding statements provide 
useful information concerning the minimal 
values of large sample sizes for each of the 
cases considered in this article. 
Interobserver Agreement, Reliability, and
Generalizability of Data Collected in
Observational Studies

University of Washington


Research in developmental and educational psychology has come to rely less on 
conventional psychometric tests and more on records of behavior made by human 
observers in natural and quasi-natural settings. Three coefficients that purport 
to reflect the quality of data collected in these observational studies are dis- 
cussed: the interobserver agreement percentage, the reliability coefficient, and the 
generalizability coefficient. It is concluded that although high interobserver 
agreement is desirable in observational studies, high agreement alone is not 
sufficient to insure the quality of the data that are collected. Evidence of the 
reliability or generalizability of the data should also be reported. Further ad- 
vantages of generalizability designs are discussed. 


Almost everyone engaged in research recog- 
nizes the need for reliable measuring instru- 
ments. Reliability is a central topic particu- 
larly for courses and textbooks concerned 
with the behavioral sciences. In spite of vary- 
ing theoretical derivations, its definition is 
remarkably uniform: A reliable instrument is 
one with small errors of measurement, one 
that shows stability, consistency, and depend- 
ability of scores for individuals on the trait, 
characteristic, or behavior being assessed. 

Historically, the study of reliability has
been linked to the study of individual differ- 
ences and has been largely restricted to stan- 
dardized tests of intelligence, achievement, 
and personality. These tests, however, are in- 
creasingly being replaced in developmental 
and educational psychology research by ob- 
servations of subjects made in natural and 
quasi-natural settings. Although these ob- 
servational studies vary widely in content 
and method, they all use human observers to 
record (and in some cases to summarize and 
abstract) the behavior of the subjects. Sur- 
prisingly, the reliability of these observa- 
tional methods has not received the same at- 
tention as has the reliability of the more tra- 
ditional methods (Johnson & Bolstad, 1973). 

There are at least three different ways to
think about the reliability of observational
data. First, the researcher could focus on the
extent to which two observers, working inde-
pendently, agree on what behaviors are oc-
curring. A coefficient that reflects the extent
of this agreement has often been used to re-
port reliability in observational studies. Sec-
ond, the observational measure could be
considered a special case of standardized psy-
chological test, and the definitions of reliabil-
ity that come from classic psychometric the-
ory (e.g., test-retest and alternate forms)
could be used. Finally, an observational mea-
sure could be thought to provide data that
are under the influence of a number of dif-
ferent aspects of the observation situation
(e.g., different observers or different occa-
sions), including individual differences among
subjects. This third viewpoint was developed
in Cronbach's (Cronbach, Gleser, Nanda, &
Rajaratnam, 1972) theory of generalizability.
The purpose of this article is to examine the 
appropriateness and correct interpretation of 
these three coefficients when they are used 
to reflect the quality and dependability of 
data gathered in observational studies. 


Observer Agreement 


To insure that data collected by human 
observers are objective, researchers typically 
obtain and report coefficients that demonstrate 
that two or more observers watching the same 
behavior at the same time will record the 
same data. These coefficients are offered as 
the reliability of the instrument being used. 


Interobserver Agreement Percentage 


The most common index of the quality of
the data collected in observational studies is
the interobserver agreement percentage. In
its simplest form, this coefficient is just what
its name implies: the percentage of time units
during which the records of two observers are
in agreement about the record of behavior.
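
In its simplest form the coefficient is a one-line computation. The sketch below assumes that each observer's record is a list with one behavior code per time unit; the codes and the two records are hypothetical, as is the function name.

```python
def agreement_percentage(record_a, record_b):
    """Percentage of time units on which two observers recorded the same code."""
    if len(record_a) != len(record_b) or not record_a:
        raise ValueError("records must be non-empty and cover the same time units")
    matches = sum(a == b for a, b in zip(record_a, record_b))
    return 100.0 * matches / len(record_a)


# Hypothetical records for ten consecutive time units.
observer_1 = ["play", "play", "cry", "idle", "play", "idle", "cry", "play", "idle", "play"]
observer_2 = ["play", "idle", "cry", "idle", "play", "idle", "play", "play", "idle", "play"]

print(agreement_percentage(observer_1, observer_2))  # 80.0
```

The two records differ on 2 of the 10 units, so the agreement percentage is 80. Note that nothing in the computation refers to differences among subjects.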

Two comments can be made about the use
of observer agreement percentages. First, the
majority of studies report data only in this
fashion. Consider, for example, the field of
developmental psychology, in which about
one third of recently published research ar-
ticles use observational techniques. In Vol-
ume 47 (1976) of Child Development, 33
full-length articles reported observational mea-
sures. Of these studies, just under half (49%)
reported only observer agreement figures as
evidence of the quality of their data. Simi-
larly, of 21 observational studies reported in
Volume 12 (1976) of Developmental Psychol-
ogy, 57% reported observer agreement per-
centages only.

Second, the amount of variability among 
the subjects in a study has very little im- 
pact, at least in theory, on the size of an 
interobserver agreement percentage. In actual 
practice, however, the degree of variability 
can make quite a difference: In very homoge- 
nous groups, observer agreement percentages 
are necessarily quite high because all scores 
given to all subjects are very close together. 
Thus, a measure that shows high agreement 
may, in some populations at least, have done 
a poor job of differentiating among subjects. 


Other Problems With Observer Agreement 


The interobserver agreement percentage has 
several important shortcomings. It is, first 
of all, insensitive to degrees of agreement, 
that is, it treats agreement as an all-or-none 
phenomenon, with no room for partial or in- 
complete agreement. In this sense, the percent- 
age underestimates the actual extent of agree- 
ment between two observers. Second, some 
agreement between independent observers can 
be expected on the basis of chance alone. For 
many observational studies, especially those 
that use frequency counts of individual be- 
haviors, the extent of this chance agreement 
is dependent on the rates at which the target 
behaviors occur. Behaviors with very high and 
very low frequencies can have extremely high 
chance levels of agreement (Johnson & Bol- 
stad, 1973). In this sense, the percentage ap- 
pears to overestimate the real agreement. 
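
The dependence of chance agreement on base rates can be made concrete. The sketch below rests on a simplifying assumption that is not spelled out in the text: each observer independently records the behavior as occurring in a given time unit with a fixed probability, and agreement means that both record occurrence or both record nonoccurrence. The function name is hypothetical.

```python
def chance_agreement(rate_1, rate_2):
    """Expected agreement on occurrence/nonoccurrence for independent observers."""
    both_record = rate_1 * rate_2
    neither_records = (1 - rate_1) * (1 - rate_2)
    return both_record + neither_records


for rate in (0.05, 0.50, 0.95):
    print(rate, round(chance_agreement(rate, rate), 3))
```

Under this assumption, a behavior recorded in 5% or in 95% of time units yields about 90% agreement by chance alone, whereas a behavior recorded in half of the units yields 50%.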

These difficulties in the use of observer 
agreement percentages are not unknown, and 
considerable effort has been expended to de- 
velop indices that overcome them. The al- 
ternative coefficients have been reviewed in 
detail elsewhere (Tinsley & Weiss, 1975). In 
general, they have been designed to overcome 
the mathematical shortcomings of the agree- 
ment percentage, such as chance levels of 
agreement. 

A serious question remains, however, about 
the utility of these alternatives. In spite of 
their mathematical superiority, they deal with 
only one source of error (observer disagree- 
ment), and they deal with it without regard
to the magnitude of the individual differences.
These alternatives may give a more accurate 
picture of the level of observer agreement 
in a study than does the simple percentage, 
but they do not otherwise describe the stabil- 
ity, consistency, and dependability of the data 
that have been collected. 


Influences on Observer Agreement 


Interobserver agreement has been experi- 
mentally studied as a phenomenon in its own 
right. In these experiments, observer ac- 
curacy, that is, agreement with a predeter- 
mined correct behavioral record, is the de- 
pendent variable. 

Reid (1970) compared the accuracy of ob- 
servers during overt and covert assessment of 
their reliability and found that they were sig- 
nificantly more accurate when they were 
aware that they were being checked. Roman- 
czyk, Kent, Diament, and O'Leary (1973) 
found a similar drop from the overt to the 
covert assessment situation. They also found 
that observers recorded behavior differently 
depending on which of several researchers'
records they thought their own records would 
be compared with. Taplin and Reid (1973) 
found that observer accuracy decreased be- 
tween the end of a training period and the 
beginning of data collection and that it in- 
creased on days when "spot checks" were ex- 
pected. Mash and McElwee (1974) reported 
that observers who had been trained to code
"predictable" behavior (i.e., conversations
with redundant information) showed a de- 
cline in accuracy when they later coded 
"unpredictable" behavior, whereas observers 
trained with unpredictable sequences showed 
no such decline in accuracy. Taken together, 
these studies imply that differences in experi- 
ence, mental set, and training of observers 
can influence the accuracy with which behav- 
ioral records are made and scored. 

Interobserver agreement has also been used 
as the dependent variable in studies that com- 
pare different methods of observation. Mc- 
Dowell (1973) found comparable observer 
agreement in time sampling and continuous 
recording of infant caretaking activities in an 
institution. Lytton (1973) found the inter-
observer agreement of ratings, home observa-
tions, and laboratory experiments to vary
only slightly, but the amounts of time and 
effort necessary to achieve these levels were 
quite different. Mash and McElwee's (1974) 
rating system with only four categories was 
used more accurately than was an eight-cate- 
gory form. It appears, then, that there are 
differences in the interobserver agreement 
that can be expected from different methods 
of behavioral observation. These differences 
are reflected both in different levels of agree-
ment for equal amounts of training and in
equal levels of agreement for different amounts
of training. 

It is difficult, however, to put these agree-
ment differences into perspective without
knowledge of the overall variability among 
subjects in the studies. If between-subjects 
variability is low, then the reported differ- 
ences in agreement due to observer instruc- 
tions or observational methods may consid- 
erably influence the outcome of the study. If 
between-subjects variability is high, on the 
other hand, then these differences will prob- 
ably have little influence. The studies that 
used agreement as a dependent variable have 
identified some problem areas in data collec- 
tion, but they have not evaluated the rela- 
tive importance of these problems. 


Psychometric Theory of Reliability 


The classical method of determining the 
reliability of a test is for the researcher to 
obtain two scores for a group of subjects on 
the test. These two scores may come from 
two separate scorings of the instrument, from 
administration of two parts or forms of the 
instrument to the subjects, or from two ad- 
ministrations of the same instrument to the 
subjects. The correlation between the two 
scores is the reliability of the instrument. 

The central theoretical concept that under-
lies this psychometric view of reliability is
that every test score is composed of two parts:
a true score, which reflects the presence or
extent of some trait, characteristic, or be-
havior, plus an error score, which is random
and independent of the true score (Nunnally,
1967). The proportion of variance accounted
for by each of these parts is estimated from
the correlation between the two scores ob-
tained on the instrument.
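
The link between the correlation of two scores and the share of variance due to true scores can be checked by simulation. The sketch below is an illustration under stated assumptions: true scores and errors are drawn from normal distributions, the two errors are independent of each other and of the true score, and the variances (9 and 3) are arbitrary values chosen for the example.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


TRUE_VARIANCE = 9.0    # assumed spread of true scores across subjects
ERROR_VARIANCE = 3.0   # assumed spread of random measurement error
N_SUBJECTS = 20000

rng = random.Random(0)  # fixed seed so the run is repeatable
true_scores = [rng.gauss(50, sqrt(TRUE_VARIANCE)) for _ in range(N_SUBJECTS)]
# Two scores per subject: same true score, independent errors.
score_1 = [t + rng.gauss(0, sqrt(ERROR_VARIANCE)) for t in true_scores]
score_2 = [t + rng.gauss(0, sqrt(ERROR_VARIANCE)) for t in true_scores]

observed_r = pearson(score_1, score_2)
expected_r = TRUE_VARIANCE / (TRUE_VARIANCE + ERROR_VARIANCE)
print(round(observed_r, 3), expected_r)
```

With these settings the correlation between the two scores should fall close to 9/(9 + 3) = .75, the proportion of score variance that is true score variance.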

The variance attributable to individual dif-
ferences is usually given the same interpreta-
tion, regardless of how the two scores used to
compute it were obtained. It reflects stable
differences among individuals, that is, the true
score part of the data. The variance that is at-
tributable to measurement error, however, is
subject to varying interpretations, depending
on how the two scores were obtained.


The error always includes, of course, the
real error, that is, those random fluctuations of
the myriad factors that may affect the behavior
being measured. These include such variables
as the health or mental state of the subjects,
the lighting or temperature in the testing
room, and so forth. But the error also includes
other sources of variation, depending on the
method used to obtain the two scores. These
other sources include differences within and
between scorers, differences between different
sections or forms of a test, and changes in
subjects' behavior between two administra-
tions of a test.

Thus, the three most common procedures
for determining reliability involve (a) obtain-
ing two separate scorings of the same instru-
ment (intrascorer or interscorer reliability),
(b) obtaining scores on two parts of the same
instrument or on two very similar instruments
(split-half or alternate-forms reliability), and
(c) obtaining two scores from two separate
administrations of the same instrument (test-
retest reliability).


The researcher who wishes to use one of
these classical methods of determining relia-
bility in an observational study must some-
how make his or her observations fit into the
same general pattern as psychological tests.
However, instead of one test with many items,
all intended to measure the same trait, charac-
teristic, or behavior, the observational re-
searcher has a tool with a relatively small
number of categories, with each category in-
tended to measure a different trait, charac-
teristic, or behavior. For each of these cate-
gories, which are generally mutually exclusive,
data are usually collected during many dis-
tinct time units.

The most satisfactory way of making such
data fit into the classic pattern seems to be 
to consider each mutually exclusive category 
(or type of behavior) a separate test with its 
own reliability. Each time unit is considered 
to be an item, since all time units are intended 
to measure the same trait, behavior, or charac- 
teristic. For example, a behavioral code 
might record a child’s proximity to the teacher 
during each 10-sec unit of an observation. 
Each 10-sec unit would be an item in a test 
that measured proximity. If the measure con- 
sisted of a single summary score (such as in 
a rating), then there would be, in effect, no 
individual items at all, just one score. 

This analogy between test reliability and 
the reliability of observational data can be 
extended to apply to each of the traditional 
ways of obtaining scores: intrascorer or inter- 
scorer reliability, split-half or alternate-forms 
reliability, and test-retest reliability. 


Intrascorer or Interscorer Reliability 


A clinical psychologist interested in self- 
directed aggression might listen twice to tape 
recordings of patients’ responses to a projec- 
tive test, each time counting the number of 
self-destructive statements. The correlation 
between the two counts for the group of pa- 
tients would be the intrascorer reliability of 
the self-destructiveness score. The true score 
implied by this correlation would reflect real 
differences in self-destructiveness among the 
patients. The error would include not only the 
random error but also any inconsistencies in 
the psychologist’s use of the self-destructive- 
ness scale. 

In actual practice, it is more likely that 
two psychologists would listen to the tape 
recordings. The correlation between their sepa- 
rate counts would be an interscorer reliability 
coefficient. The true score would again reflect 
real differences, but the error would reflect 
differences between the psychologists in their 
use of the scale, along with random error. 

A similar situation exists, of course, when 
two or more observers record the behavior of 
subjects in other natural and quasi-natural 
settings. The correlation between the scores of 
two observers who kept track of how much 
individual attention each child received from
the teacher would be an interobserver relia- 
bility coefficient. This coefficient, once again, 
should not be confused with the observer 
agreement percentage. 


Split-Half or Alternate-Forms Reliability 


One way of determining the reliability of a 
standardized test is to compare scores on two 
subdivisions of the test (odd- and even-num- 
bered items, frequently) or scores on two 
very similar versions of the test. In an ob- 
servational study, the corresponding compari- 
sons would be between subdivisions of one 
observation (e.g., odd- and even-numbered
minutes during a tennis lesson), or between 
two very similar observations (first and sec- 
ond halves of a lesson, perhaps). This is an 
example of how time units can be considered 
analogous to test items. 
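
Treating time units as items, a split-half coefficient for one behavior category can be sketched as follows. The records are hypothetical (five subjects, eight one-minute units, 1 meaning the behavior was recorded in that minute), and the helper names are assumptions introduced only for this illustration.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical records: one list per subject, one entry per minute (1 to 8).
records = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]

# Odd-numbered minutes sit at even list positions (0, 2, ...), and vice versa.
odd_scores = [sum(r[0::2]) for r in records]
even_scores = [sum(r[1::2]) for r in records]

split_half_r = pearson(odd_scores, even_scores)
print(odd_scores, even_scores, round(split_half_r, 3))
```

Each subject receives one score from the odd-numbered minutes and one from the even-numbered minutes, and the correlation between the two sets of scores across subjects plays the role of the split-half coefficient.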

Just as with interobserver reliability, the 
true score component of the variance in split- 
half or alternate-forms reliability reflects con- 
sistent individual differences among subjects. 
The error component, however, has a differ- 
ent interpretation. Along with random fluctua- 
tions in the behavior of the subjects, real dif- 
ferences in subject behavior between the two 
observed subdivisions are included as part of 
the error. 


Test-Retest Reliability 


Perhaps the most straightforward way to 
obtain two scores in a reliability study is to 
administer the same instrument at two differ- 
ent times. An observer might, for example, 
visit classrooms on different days to record 
the teacher's use of a particular instructional 
technique. As before, the true score is assumed 
to reflect some stable trait, characteristic, or 
behavior. In this case, the error includes not 
only random fluctuations of subject behavior 
but also whatever real changes in subject be- 
havior have occurred between the two admin- 
istrations of the test. | 

ТЕ is interesting to note that there is little 
difference between alternate-forms and test- 
retest reliability for observational measures. 
Since time units serve as items, observations 
made on different days can be considered
either as alternate-forms or as test-retest
conditions, depending on the situation.


Use of Reliability Coefficients


Three comments apply to all of these ver- 
sions of the reliability coefficient. First, al- 
though the examples given are from hypo- 
thetical observational studies, real observa- 
tional studies do not make use of all of the 
possible coefficients. Interobserver reliability 
or agreement is reported to the virtual exclu- 
sion of split-half and test-retest coefficients. 
Once again, developmental psychology can 
serve as an example. In Volume 47 (1976) of 
Child Development, 49% of the full-length 
research articles that used observational meth- 
ods reported only observer agreement, and 
39% more reported interobserver correlation 
coefficients. Only three of the studies (12%) 
reported a reliability coefficient that reflected 
the stability of subject behavior over time, 
that is, a split-half or test-retest reliability. 
Similarly, in Volume 12 (1976) of Develop- 
mental Psychology, 57% of the studies re- 
ported agreement only, 38% reported observer 
reliability coefficients, and only one study 
used a measure based on more than one sam- 
ple of behavior per subject. 

Second, although this discussion has em- 
phasized the sources of error in these coef- 
ficients, the variance of true scores is as im- 
portant in determining the size of the reliabil- 
ity coefficient as the variance of error scores is. 
Recall that the reliability of a test score can
be expressed as

\[
\text{reliability} = \frac{\text{true score variance}}{\text{true score variance} + \text{error variance}}
\]


For a given level of error variance, then, an 
instrument will have a lower reliability when 
it is used on a homogenous group of subjects 
(low true score variance) than it will when 
it is used on a more heterogenous group (high 
true score variance). For instance, if the error 
variance is 10 and the true score variance is
also 10, the reliability of the instrument is 
10/20 or .50. But if the error variance is 10 
and the true score variance is 40, the reliabil- 
ity of the instrument is 40/50 or .80. This is 
in contrast to the observer agreement per-
centage, which is highest for homogenous
groups of subjects. 
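
The arithmetic in this example can be checked with a short Python sketch. The function name `reliability` and its parameter names are assumptions made for illustration; only the variance values (10, 10, and 40) come from the example above.

```python
def reliability(true_var: float, error_var: float) -> float:
    """Classical reliability: true score variance divided by total variance."""
    return true_var / (true_var + error_var)


# Same error variance (10) in both cases; only the spread of true scores differs.
homogeneous = reliability(true_var=10, error_var=10)    # 10/20
heterogeneous = reliability(true_var=40, error_var=10)  # 40/50

print(f"homogeneous group:   {homogeneous:.2f}")
print(f"heterogeneous group: {heterogeneous:.2f}")
```

Holding error variance fixed and changing only the true score variance is what moves the coefficient from .50 to .80.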

Third, it should be repeated that reliability 
and observer agreement are not the same. It 
is possible, as illustrated by Tinsley and Weiss 
(1975), to have high interobserver agreement 
and a low reliability (correlation) coefficient, 
and vice versa. For instance, two observers 
might have perfect agreement about the color 
of shoes worn by children in a classroom, 
but if all the children wore red shoes, shoe 
color would not differentiate among the chil-
dren. On the other hand, there might be a
high correlation between two observers’ rec- 
ords of the duration of a teacher's attention 
to a particular youngster, but if one observer's 
watch ran slower than the other's watch, they 
would probably never agree on the actual 
duration of the attention.
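
The two situations just described can be imitated with made-up numbers. The sketch below is only an illustration: the data, the function names, and the choice to code "red" as 1 are all assumptions, and the slow watch is represented by durations that are a constant 90% of the other observer's.

```python
from statistics import mean, pstdev


def percent_agreement(a, b):
    """Percentage of paired records on which two observers agree exactly."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Pearson correlation, or None when either set of records has no variance."""
    sd_a, sd_b = pstdev(a), pstdev(b)
    if sd_a == 0 or sd_b == 0:
        return None  # no individual differences to correlate
    mean_a, mean_b = mean(a), mean(b)
    covariance = mean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return covariance / (sd_a * sd_b)


# Shoe color: every child wears red (coded 1) and both observers say so.
shoes_a = [1, 1, 1, 1, 1, 1]
shoes_b = [1, 1, 1, 1, 1, 1]

# Attention duration in seconds: observer B's watch runs slow.
duration_a = [10, 20, 30, 40, 50]
duration_b = [9, 18, 27, 36, 45]

shoe_agreement = percent_agreement(shoes_a, shoes_b)
shoe_r = pearson_r(shoes_a, shoes_b)
duration_agreement = percent_agreement(duration_a, duration_b)
duration_r = pearson_r(duration_a, duration_b)

print(shoe_agreement, shoe_r)
print(duration_agreement, round(duration_r, 3))
```

In the first case agreement is perfect but no correlation can be computed, because there is no variance among the children. In the second the observers never agree on a duration, yet their records are almost perfectly correlated.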

The differences between agreement and re- 
liability are based on the way the two indices 
are defined. Reliability coefficients partition 
the variance of a set of scores into a true
score (individual differences) and an error
component. The error component may include
random fluctuations in the behavior of sub-
jects, inconsistencies in the use of the scale,
differences among observers, and so forth. In-
terobserver agreement percentages, on the
other hand, carry no information at all about
individual differences among subjects and con-
tain information about only one of the pos-
sible sources of error: differences among ob-
servers. In other words, a reliability coefficient
reflects the relative magnitude of all error
with respect to true score variability, whereas
an agreement percentage reflects the absolute
magnitude of just one kind of error.

All in all, there is no perfect reliability co-
efficient, nor is there one that is even gen-
erally best. Coefficients that use two scorings
of the same instrument (interobserver and
intraobserver reliability) confound random
error with differences within and be-
tween observers. Coefficients that use scores
from subdivisions or alternative forms of the
same instrument (split-half and alternate-forms
reliability) confound random subject error
with differences between the subdivisions or
forms. Finally, coefficients that use scores
from the same instrument administered on
two occasions (test-retest reliability) con- 
found measurement errors with real changes 
in subject behavior that occur between the 
two administrations. The methods described 
cannot, then, separately estimate variance in 
test scores attributable to scorers, subtests 
(or forms), or occasions, nor can they con- 
sider these sources of error simultaneously. A 
more inclusive, multivariate theory is needed. 


Generalizability Theory 


Cronbach and his associates (Cronbach et 
al., 1972) have developed a theory that they 
call the theory of generalizability. (For a 
brief introduction to the theory, see J. P. 
Campbell, 1976, pp. 185-222.) Instead of 
assuming, as does classical test theory, that 
individual differences constitute the only law- 
ful source of variation in test scores, general- 
izability theory assumes that there may be a 
number of sources of variation. These sources 
of variation other than individual differences 
are called facets. Different scorers, alternate 
test forms, or administration on different oc- 
casions are examples of facets that might be 
studied. A particular combination of facets 
makes up the universe to which test scores 
may be generalized. 

A generalizability study (G study) is more 
reminiscent of a factorial study in experi- 
mental psychology than of a reliability study. 
In a G study, the researcher must collect 
data by systematically sampling conditions 
from each facet in the universe. For instance, 
two scorers might each score two alternate 
forms of a test given on different days to a 
group of subjects. Using an analysis of vari- 
ance, it is then possible to independently esti- 
mate the contributions of each of the facets— 
scorers, forms, occasions, as well as subjects— 
to the overall variation in the set of test scores. 
Besides looking at the conventional F sta- 
tistic to establish whether each facet makes 
a significant contribution to the scores, it is 
possible to compute what Cronbach calls 
variance components. These variance com- 
ponents reflect the size rather than the sta- 
tistical significance of the contribution of 
each facet to the observed scores. 
In examining the quality of observational
data, though, not only are the absolute sizes 
of the variance components of interest but 
the relative sizes of the components are also 
important. The relative sizes, therefore, are 
the focus of this discussion. 

Recall that a reliability coefficient reflects 
the partitioning of variance into true and 
error components and that the coefficient is 
the ratio of true score variance to obtained 
score variance. It represents, in other words, 
the proportion of the total variance that is 
accounted for by individual differences. In 
the same way, a generalizability coefficient re-
flects the partitioning of variance into com-
ponents that correspond to the facets sampled 
in the G study. The coefficient itself (an in- 
traclass correlation) combines these com- 
ponents in a ratio that also represents the 
proportion of variance attributable to indi- 
vidual differences for a particular universe 
(set of conditions). 

It should not be assumed, however, that a 
G study generates one coefficient that is ap- 
propriate for all applications of the instru- 
ment. On the contrary, one G study can gen- 
erate several coefficients, each corresponding 
to a different universe of conditions. This fact 
points to an important distinction between 
the psychometric theory of reliability dis- 
cussed earlier and the theory of generalizabil- 
ity. In psychometric theory, conditions of 
testing (or otherwise obtaining data) are as- 
sumed to influence only measurement error, 
not the true score on the instrument. In gen- 
eralizability theory, on the other hand, the 
conditions of testing are assumed to influence 
the score itself. What Cronbach and his col- 
leagues have shown is that while true scores 
all contain a common component, they also 
contain additional different components de- 
pending on the design; that is, it is not just 
the error variance that differs among the sev- 
eral reliability coefficients. This relationship 
can be illustrated by returning to the earlier 
example of interscorer reliability—two psy- 
chologists counting self-destructive statements 
from tape recordings. In generalizability 
terms, this is a one-facet study, that is, it
samples observations from one facet (in this 
case, scorers) in addition to observations of 
different subjects. Analyzed as a G study, one
can estimate variance components for sub-
jects, for scorers, and for the interaction of
subjects and scorers (which in this case is 
confounded with the residual error). The 
generalizability coefficient from this G study 
would reflect the dependability of a score for 
a subject generalized over scorers. In other 
words, it would indicate the proportion of 
variance accounted for by individual differ- 
ences in subjects, above and beyond any ef- 
fects accounted for by differences between 
scorers.
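
One way the variance components of such a one-facet study might be estimated is sketched below in Python. The sketch assumes a fully crossed subjects by scorers design with one score per cell and uses the usual expected-mean-square equations for a random-effects analysis of variance. The function name, the made-up counts, and the decision to report a coefficient for a single scorer with scorer main effects excluded from error are illustrative assumptions, not a reproduction of the procedures in Cronbach et al. (1972).

```python
def one_facet_g_study(scores):
    """Estimate variance components for a subjects x scorers design.

    scores[p][r] is the score given to subject p by scorer r
    (fully crossed, one score per cell).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    subject_means = [sum(row) / n_r for row in scores]
    scorer_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares for subjects, scorers, and the residual
    # (the Subjects x Scorers interaction confounded with error).
    ss_subjects = n_r * sum((m - grand) ** 2 for m in subject_means)
    ss_scorers = n_p * sum((m - grand) ** 2 for m in scorer_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_subjects - ss_scorers

    ms_subjects = ss_subjects / (n_p - 1)
    ms_scorers = ss_scorers / (n_r - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates become zero.
    var_residual = max(ms_residual, 0.0)
    var_subjects = max((ms_subjects - ms_residual) / n_r, 0.0)
    var_scorers = max((ms_scorers - ms_residual) / n_p, 0.0)

    denominator = var_subjects + var_residual
    coefficient = var_subjects / denominator if denominator > 0 else float("nan")
    return {
        "subjects": var_subjects,
        "scorers": var_scorers,
        "residual": var_residual,
        "coefficient": coefficient,
    }


# Hypothetical counts of self-destructive statements: 5 tapes, 2 psychologists.
tapes = [[4, 5], [7, 9], [2, 2], [9, 10], [5, 7]]
result = one_facet_g_study(tapes)
for name, value in result.items():
    print(f"{name:12s} {value:.3f}")
```

A scorer who is consistently more generous than a colleague adds to the scorers component but, under these assumptions, does not lower the coefficient, because the ordering of subjects is unchanged.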

Suppose that a second facet—occasions— 
were added to this study, so that each scorer 
would count self-destructive statements for 
each tape two times. This study combines 
aspects of the interscorer and the intra- 
scorer reliability studies. The analysis of this 
two-facet G study would yield variance com- 
ponents for subjects, for scorers, and for oc- 
casions (which in this case would be inter- 
preted as intrascorer change). Further, vari- 
ance components could be computed for each 
of the possible interactions of these facets: 
Subjects × Scorers, Subjects × Occasions,
Scorers × Occasions, and Subjects × Scorers
× Occasions (which in this case is confounded
with residual error). The generalizability co- 
efficient from this study would reflect the 
proportion of variance accounted for by indi- 
vidual differences in subjects apart from the 
effects between and within scorers. 
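
The list of variance components for a fully crossed design can be generated mechanically, as the following Python sketch suggests. The function name and the labels are assumptions for illustration; the only claim is that every non-empty combination of subjects and facets names one component, with the highest-order interaction confounded with residual error when there is one score per cell.

```python
from itertools import combinations


def variance_component_labels(facets, unit="Subjects"):
    """Name every main effect and interaction in a fully crossed design."""
    sources = [unit, *facets]
    labels = []
    for order in range(1, len(sources) + 1):
        for combo in combinations(sources, order):
            labels.append(" × ".join(combo))
    return labels


labels = variance_component_labels(["Scorers", "Occasions"])
for label in labels:
    print(label)
```

For the two-facet study this yields seven components: three main effects, three two-way interactions, and the three-way interaction.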

Suppose further that this study were ex- 
tended to three facets by having each psy- 
chologist use two different scoring methods 
for each tape recording he or she listened to
(perhaps a count of self-destructive statements
and a global rating of self-destructiveness).
Observational methods would then be a facet 
in the universe of generalization. 

Clearly each facet that is added to the study 
makes the information available from the
analysis more complete. But there is a sig-
nificant cost for the extra information pro-
vided by each facet: The number of observa-
tions required of each subject is multiplied
by the number of conditions sampled in the
facet. In the present example, the one-facet
study would have two scores, the two-facet
study would have four scores, and the three-
facet study would have eight scores for each
subject.
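
The cost can be made concrete with a small Python sketch. It assumes, as in the example, that each facet is sampled at two conditions and that the design is fully crossed; the dictionary keys and the function name are illustrative only.

```python
from math import prod


def scores_per_subject(conditions_per_facet):
    """Scores needed for each subject in a fully crossed design."""
    return prod(conditions_per_facet.values())


designs = {
    "one-facet": {"scorers": 2},
    "two-facet": {"scorers": 2, "occasions": 2},
    "three-facet": {"scorers": 2, "occasions": 2, "methods": 2},
}

counts = {name: scores_per_subject(facets) for name, facets in designs.items()}
for name, count in counts.items():
    print(f"{name}: {count} scores per subject")
```

Sampling three conditions of each facet instead of two would raise the three-facet total from 8 to 27 scores per subject.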

Described below are some three-facet G 
studies that parallel the intraobserver-inter- 
observer, split-half, and test-retest reliability 
studies that were discussed earlier. All of 
these G studies use the same three facets (ob-
servers, observational methods, and occa-
sions) sampled for all subjects. The studies
differ in their definitions of occasions of mea- 
surement and thus in their interpretations of 
the resulting coefficients. The relationships 
among the reliability studies and the pro- 
posed G studies are summarized in Table 1. 


Duplicate Generalizability 


One study with this basic design might use
audiotaped or videotaped recordings of be-
havior, which would be scored on more than
one occasion by more than one observer, using
more than one form of an observational instru-
ment. In this G study (which I call duplicate
generalizability), the occasions of observa-
tion actually consist of exactly the same be-
havior by the subjects. It is, then, an exten-
sion of the traditional intrascorer or inter-
scorer reliability study. A reliability study
uses two scores for each subject (usually from
two different scorers) and confounds measure-
ment error with differences within and be-
tween scorers. A duplicate G study, on the
other hand, has many scores for each subject
and separately estimates the contribution of
differences within and among observers. Al-
though occasions is a facet in this G study,
variance attributable to occasions cannot be
interpreted as within-subject change, since
the same behavior occurs on each occasion of
observation. In this study, occasions variance
should be interpreted as a measure of within-
observer stability. A duplicate G study would
be appropriate for demonstrating the depend-
ability of an instrument used in a study in
which the stability of the behavior over time
was not an issue.


Session Generalizability 


A second G study with the basic design
might be called session generalizability. It
would use as measurement occasions two sub-
divisions of some behavioral sequence (i.e.,
first and second halves or odd and even
minutes) and would be an extension of the
traditional split-half reliability study. In a
split-half reliability study, recall that errors
of measurement are confounded with differ-
ences between the two halves of the test or
observation. In a session G study, differences
between subdivisions of the behavioral se-
quence are estimated by the variance com-
ponent for occasions. A session G study would
be used to estimate the dependability of
scores reflecting traits and behaviors expected
to be stable during the course of the behavioral
sequence being observed, although perhaps
no longer than that.


Table 1
Correspondence Between Reliability Studies
and Generalizability Studies

Measurement occasions             Reliability study    Generalizability study

Separate scorings of the same     Intrascorer or       Duplicate
behavior or instrument            interscorer

Scores on two subdivisions of     Split half or        Session
an instrument or behavior         alternate forms
sample, or two very similar
instruments or behavior samples

Scores from separate              Test-retest          Developmental
administrations of the same
instrument or separate samples
of behavior


Developmental Generalizability 


A third G study with the same basic de- 
sign might be called developmental generaliz- 
ability. It would use as measurement occasions 
two or more administrations of the same in- 
strument, perhaps at different ages or develop- 
mental stages. It is, then, an extension of the 
traditional test-retest reliability study, which
confounds measurement error with true 
changes in behavior that have occurred be- 
tween the two administrations of a test. In 
this developmental G study, these changes in 
behavior over time would be estimated by 
the occasions facet of the design. The develop-
mental G study is best suited to measures of
traits or characteristics believed to be rela- 
tively enduring. 


Use of Generalizability Coefficients 


The comments made about the reliability 
studies discussed earlier can be repeated for 
these G studies. First, observational studies 
very rarely report data in G-study terms. The 
method does appear occasionally in disserta- 
tions (see, e.g., Leler, 1971; Mitchell, 1977), 
and it surfaces now and then in educational 
psychology research (Medley & Mitzel, 1963; 
McGaw, Wardrop, & Bunda, 1972). But to 
return again to developmental psychology as 
an example, there were no studies published 
in 1976 in either Child Development or De- 
velopmental Psychology that reported gen- 
eralizability coefficients. Second, regardless of 
the sizes of the variance components for the 
facets, it is necessary to have a relatively 
large variance component for subjects to ob- 
tain a large generalizability coefficient. All 
other things being equal, a sample of sub- 
jects with greater variability on the trait 
being measured will yield a higher generaliz- 
ability coefficient than will a sample of sub- 
jects with lesser variability on the trait. 

The three G-study designs outlined here 
(duplicate, session, and developmental) all 
sample three important facets that may in- 
fluence scores in observational studies: ob- 
servers, observational methods, and occasions 
of observation. They differ in the nature of 
the occasions that are sampled, and these oc- 
casions tell us something about the nature 
of the universe to which scores can be gen- 
eralized. 

One way to contrast these universes is to 
imagine that the three types of studies were 
conducted so that exactly the same subjects, 
observers, and observational methods were 
used for all three. When only the nature of 
the occasions facet is different, one can hy-
pothesize certain relationships among the
sizes of the coefficients derived from the three
studies. When duplicate G studies are con- 
ducted, it is expected that variance due to 
occasions will be the smallest and hence that 
the generalizability coefficient will be the 
largest. Further, when developmental G stud- 
ies are conducted, it is expected that variance 
due to occasions will be the greatest and that 
the generalizability coefficients will be the 
smallest. Finally, when session G studies are 
conducted, it is expected that the variance due
to occasions and the resulting generalizabil- 
ity coefficients will be intermediate. 
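
The hypothesized ordering can be illustrated with invented numbers. The Python sketch below is a deliberate simplification: it treats all occasion-related variance (the occasions effect together with its interaction with subjects) as error, holds the subject and residual components fixed, and uses variance values that are purely hypothetical.

```python
def generalizability(var_subjects, var_occasion_related, var_residual):
    """Share of variance due to subjects when occasion-related variance is error."""
    return var_subjects / (var_subjects + var_occasion_related + var_residual)


# Hypothetical occasion-related variance for each kind of G study.
occasion_variance = {"duplicate": 1.0, "session": 5.0, "developmental": 15.0}

coefficients = {
    study: generalizability(20.0, variance, 5.0)
    for study, variance in occasion_variance.items()
}
for study, value in coefficients.items():
    print(f"{study:14s} {value:.2f}")
```

With subjects, observers, and methods held constant, the coefficient falls as the occasions sampled become more widely separated.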


Other Uses for Generalizability Theory 


Although measures of interobserver agree-
ment and reliability have important uses in
observational research, it should be clear from 
the earlier discussion that it is the generaliz- 
ability coefficient that is potentially the most 
useful source of information about the qual- 
ity of such data. Generalizability theory, 
however, has many applications of interest for 
the developmental psychologist other than 
the computation of a coefficient. Some of
these applications are discussed below. 


Multitrait-Multimethod Matrix 


D. T. Campbell and Fiske (1959) intro- 
duced the notion of determining the validity 
of psychological measurement instruments by 
using a multitrait-multimethod matrix. As
the name suggests, this matrix consists of
scores for an individual on several traits, each
trait assessed by two or more different meth-
ods. Such a design is clearly an instance of a
G study in which traits and methods are the
two facets. The data analysis proposed by
Campbell and Fiske uses a matrix of cor-
relations, but other authors (e.g., Kavanagh,
MacKinney, & Wolins, 1971) have used anal-
yses of variance with the multitrait-multi-
method design. In this form, such a matrix
closely resembles a G study.

The multitrait-multimethod matrix was
used by Wicker (1975) to examine the re-
liability of observational records generated
from transcriptions of conversations. In this
study, observers were treated as “methods,”
and behavior samples were treated as “traits.” 
By applying Campbell and Fiske’s criteria to 
the correlational matrix, Wicker concluded 
that his data showed both convergent and 
discriminant validity. 


Attribution of Variance 


Traditionally, psychological studies have
sought either to demonstrate mean differences
between groups of subjects or to show con-
sistent individual differences among subjects.
Another kind of study, far less common, tries
to systematically apportion the variance in
a set of research data among several inde-
pendent variables. This approach has been
popular in efforts to resolve the issue of
whether individual differences (personality)
or situational differences (environment) are
the most important determinants of human
behavior. In such studies, data are gathered
on a group of individuals in several situations.
The relative importance of individual differ-
ences and of situational differences is esti-
mated by the use of the statistic known as
omega-square. As Golding (1975) has ably
demonstrated, this experimental design is
more profitably viewed as a G study; general-
izability coefficients answer questions about
the relative importance of different facets
(here, individuals and situations) more suit-
ably than does omega-square.

The logic involved in this kind of study is
not limited to the person-situation contro-
versy, of course. A similar question might be
asked about a study in which several raters
rate the behavior of a number of subjects. As
Norman and Goldberg (1966) pointed out,
such data can be interpreted as reflecting the
behavior of the ratees (subjects) or the be-
havior of the raters. Once again, this study
appears to be a straightforward, one-facet
generalizability study in which raters is the
facet. Although Norman and Goldberg did
not use analysis of variance to analyze their
data, it is clear that the conceptualiza-
tion is similar to that presented earlier: There
are several sources of meaningful variation in
a set of data, and only a multifacet study can
illuminate the relative contributions of differ-
ent facets.


Observer Generalizability 


The use of generalizability coefficients to 
estimate the contributions of facets other than 
individual differences to a set of test scores 
has a particular application to observational 
studies. Specifically, it allows a researcher to 
look at the proportion of variance in scores 
that is attributable to the consistent behavior 
of the observers. 

On the surface, the function of the gen- 
eralizability coefficient sounds much like the 
function of the interobserver agreement per- 
centage, but in fact it is not. Recall that the 
agreement percentage did not take into ac- 
count the extent of overall variability in a 
set of data, whereas a generalizability coef- 
ficient does. It should be noted that the 
G study necessary to compute this coefficient is 
exactly the same study as that used to com- 
pute the more conventional coefficient based 
on the behavior of the subjects. Nothing has 
changed except one's point of reference. 

Mitchell (1977) computed both subject 
and observer generalizability coefficients in a 
study in which 67 observers made repeated 
observations of 10 mother-infant pairs dur- 
ing the first year of life. She found that the 
coefficients reflecting the variability accounted 
for by observers were, in this study at least, 
greater than the coefficients reflecting subject 
differences. Although this result is specific to 
this particular set of data, the study is an 
example of the usefulness of observer general- 
izability coefficients. 


Single-Subject Studies 


Studies of individual subjects have had few 
ways of reporting reliability in the traditional 
sense. However, it is possible to conduct G 
studies using only a single subject. For ex- 
ample, several observers, several occasions, or 
several methods of observation might be used 
with a single subject. In this case, the general- 
izability coefficient would reflect the gener- 
alizability of a score recorded by a single ob- 
server under the circumstances sampled in the 
study; that is, it would be an observer gen-
eralizability coefficient.

Such single-subject studies may be appro- 
priate even when many subjects are part of 
a research project. It is commonly assumed 
that the behavior of all subjects is recorded 
with equal accuracy, that is, measurement 
errors are approximately equal for all subjects. 
If this assumption is true, then the data for 
all subjects are presumably equally “good.” 
On the other hand, if this assumption is incor- 
rect, so that subject behavior is recorded with 
variable accuracy, then data for different sub- 
jects may have different meanings. 

The possibility of systematic differences in 
measurement error among subjects has been 
explored for traditional psychological tests 
(Ghiselli, 1963). Berdie (1969), for example, 
found that intraindividual variability (i.e., 
measurement error) was a stable trait for 
some pencil-and-paper performance tests.
Ghiselli (1960) examined the prediction of 
“predictability.” He was able to predict the 
errors of measurement for two groups of sub- 
jects on a reaction time test. Reliability for 
the high group was .97, compared with .82 
for the low group. 

A similar result was found in a quite dif- 
ferent study by Gorsuch, Henighan, and 
Barnard (1972). Their interest was the in- 
ternal consistency of a children’s pencil-and- 
paper test of locus of control. They found 
that the reliability of the scale differed sig- 
nificantly according to the reading ability of 
the children. The errors of measurement were 
quite small for the good readers, but were 
large for the poor ones. 

Observational studies, however, rarely have 
enough subjects to permit analysis of this 
kind. An alternative is to make use of ob- 
server generalizability coefficients. Suppose 
that the data collected for each subject were 
considered to be a mini-G-study. If the basic 
G-study design outlined earlier were used, 
each mini-G-study would have two facets 
(methods and occasions) that would be sam- 
pled for each observer. From each of these 
mini-studies it would be possible to compute
an observer generalizability coefficient. Using
this design, Mitchell (1977) found that al- 
though there were differences in observer 
agreement for different subjects, subjects did
not differ in their observer generalizability 
coefficients. 


Summary 


Three different coefficients that purport to 
reflect the quality of data gathered in ob- 
servational studies have been discussed. The 
first and most commonly used of these was 
observer agreement. Coefficients of observer 
agreement are a source of important informa- 
tion about the quality of observational data: 
the objectivity of different observers using 
the same method to record the same behavior. 
Determination of interobserver agreement is
a necessary part of the development and
use of observational measures. Interobserver
agreement is not, however, sufficient by itself.

The second coefficient discussed was the
reliability coefficient, obtained by fitting ob-
servational data into the pattern used by de-
velopers of standardized psychological tests.
There are really many different reliability 
coefficients, each defined by the way the 
scores are obtained for its computation. Re- 
liability coefficients provide useful informa- 
tion about the stability and consistency of in- 
dividual differences among subjects, but con-
found measurement error with other sources
of variability. 

The third coefficient was the generalizabil- 
ity coefficient, as defined by Cronbach et al.'s 
(1972) multivariate theory. In one sense, a
generalizability coefficient supersedes a re-
liability coefficient, because it too provides
information about the stability and consist-
ency of individual differences among subjects.
Its superiority to the reliability coefficient lies
in its ability to account for variance from
sources other than individual differences and
measurement error. Besides giving information
that can be reported as the generalizability
coefficient, a G study also permits innovative
ways of looking at results from observational
studies. These ways include variations on the
multitrait-multimethod matrix, the attribu-
tion of variance to independent variables, ob-
server generalizability, and studies of single
subjects. It is therefore especially unfortunate
that G-study designs are not used more fre-
quently in observational research.


Recommendations


Researchers doing observational studies are
obliged to show that their measuring instru-
ments are reliable: that they have small er-
rors of measurement and that the scores of
individuals show stability, consistency, and
dependability for the trait, characteristic, or
behavior being studied. The reason for this
obligation is practical: If the measure is not
reliable, it cannot be expected to show lawful
relationships with other variables being stud-
ied. It is well known that the reliability of a
standardized test sets the limits of its validity 
(Nunnally, 1967). Similarly, the predictive 
usefulness of observational measures is limited 
by the stability and consistency of the scores 
obtained from the observational instruments. 

Observer agreement coefficients alone, re- 
gardless of their mathematical sophistication, 
are inadequate to demonstrate this stability
and consistency. What alternative or addi- 
tional information ought to be reported in 
observational studies, and how should it be 
collected?

First, and most basically, the coefficients 
computed should be based on the same scores 
that are used in the substantive analysis of 
the study. If a composite score (such as a 
total of several categories or time units) is to 
be used for analysis, it is this composite (and
not its component individual categories or
time units) that should be examined for
agreement, reliability, or generalizability. Of
course, during the training of observers it is
extremely helpful to compare the records of
different observers on a trial-by-trial (or
time unit by time unit) basis, but such a com-
parison does not suffice for reporting in a pub-
lished research article. It is possible, albeit
unlikely, to have acceptable agreement on a time
unit basis and yet unacceptable levels of agree-
ment on a total score. It is also possible, and
much more common, for observers to be in
poor agreement for small time units,
yet show good agreement for a total score.
In this case, an analysis of the trial-by-trial
data would underestimate the agreement
on the measures actually employed in the
study.

Similarly, if several different scores are to
be analyzed, coefficients should be computed
for each of them. Good agreement or reliabil- 
ity on the frequency of particular behaviors, 
for example, does not ensure good agreement
or reliability on their duration. The data that 
are to be analyzed are the data that should be 
scrutinized for their stability and consistency. 
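
The gap between agreement on small time units and agreement on a total score can be shown with invented records. In the Python sketch below, two hypothetical observers mark whether a behavior occurred in each of 10 intervals, and the second observer consistently records each occurrence one interval late. The data and the function name are assumptions for illustration.

```python
def interval_agreement(a, b):
    """Percentage of time units on which two observers made the same record."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# 1 = behavior recorded in that interval, 0 = not recorded.
observer_a = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
observer_b = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # same events, one interval late

unit_agreement = interval_agreement(observer_a, observer_b)
total_a, total_b = sum(observer_a), sum(observer_b)

print(f"interval-by-interval agreement: {unit_agreement:.0f}%")
print(f"total scores: {total_a} and {total_b}")
```

If the total is the score entered into the substantive analysis, the interval-by-interval figure says little about the quality of the data actually used.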

Second, the coefficients should be com- 
puted from data that are part of the actual 
study being reported. The studies of observer 
agreement cited earlier (i.e., Reid, 1970;
Romanczyk et al., 1973; Taplin & Reid,
1973) show clearly that the quality of data 
collected during a study may not be the same 
as the quality collected during reliability as- 
sessment or training. This difference can also 
be expected for data collected during a pilot 
study or during a previous study that used 
the same instrument. 

This means that the researcher must plan 
to collect data that can be used to compute 
coefficients of agreement, reliability, or gen- 
eralizability at the same time the rest of the 
data are gathered. Although this does entail 
some additional data collection, the addition 
need not generally be enormous (see, e.g., 
Rowley, 1976). 

Third, interobserver agreement should al- 
ways be obtained and reported. In most ob- 
servational studies, observer disagreement is 
an important source of error, and it should 
be carefully and systematically monitored.
This monitoring needs to be regular and un- 
obtrusive for the most accurate results. Re- 
searchers also need to be alert to the possi- 
bility that interobserver agreement may differ
for different subjects; therefore, observations
from many—preferably all—subjects should 
be used when determining the level of agree- 
ment. 

Although the observer agreement percentage 
is the most widely used and most easily com- 
puted index of agreement, it may be desirable 
in some cases to substitute other indices, such 
as that suggested by Lawlis and Lu (1972). 
Whatever the exact form of the coefficient, 
however, both the researcher and the reader 
should remember that it reflects only one 
source of error and that it reports this error 
in absolute rather than in relative terms. 

Fourth, a reliability or generalizability 
coefficient that uses two or more measure-
ment occasions should be presented for any 
score that is used to predict other behavior. 
The researcher is obliged to demonstrate that 
the individual differences among subjects are 
stable over different occasions as well as over 
different observers. This stability can be re- 
ported as a split-half, alternate-forms, or 
test-retest reliability coefficient, or as a ses- 
sion or developmental generalizability coef- 
ficient. The single exception to this rule is a 
study that focuses on some behavior not ex- 
pected to show stability over time (e.g., first 
response to a new stimulus). In this case only, 
an interscorer reliability coefficient or a dupli- 
cate generalizability coefficient is appropriate. 

It is impossible to overemphasize the im- 
portance of using two or more measurement 
occasions to compute coefficients of reliability 
or generalizability. The purpose of reporting 
such a coefficient is to demonstrate that the 
data being analyzed reflect stable, consistent, 
and dependable individual differences among 
subjects. If, however, a single measurement 
occasion has been used, then the coefficient 
can demonstrate only the competence and 
consistency of the observers. Since many, if 
not most, observational studies obtain re- 
peated measures as part of the experimental 
design, it is seldom necessary to collect ad- 
ditional data. What is necessary is to analyze 
and report these observational measures in 
terms of their stability over time. 

Fifth, a generalizability study is usually 
preferable to the computation of a reliability 
coefficient. First, a G study provides more 
useful information about sources of variabil- 
ity in a set of data than does a reliability co- 
efficient. Leler (1971), for instance, used the 
variance components from a G-study analysis 
midway through her research to help refine 
the observation instrument and to retrain 
observers on some items. Second, the G-study 
design makes other kinds of analysis possible 
for most observational studies. These include, 
particularly, observer generalizability coef- 
ficients and coefficients for individual subjects. 

Finally, the design of the generalizability 
study should correspond to the overall de- 
sign of the research. The G study for most 
applications does not need to be complex.
Two facets—observers and occasions—are
usually sufficient. There are some studies, 
however, that require three-facet designs. 
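
A two-facet G study of this kind can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a fully crossed persons x observers x occasions design with one score per cell, estimates variance components from the expected mean squares of a random-effects analysis of variance, and sets negative estimates to zero. The function names (`g_study`, `g_coefficient`) and the scores are hypothetical.

```python
from itertools import product
from statistics import mean


def _non_negative(value):
    """Negative variance estimates are conventionally set to zero."""
    return max(value, 0.0)


def g_study(scores):
    """Variance components involving persons; scores[p][o][t] is one cell."""
    n_p, n_o, n_t = len(scores), len(scores[0]), len(scores[0][0])
    P, O, T = range(n_p), range(n_o), range(n_t)
    grand = mean(scores[p][o][t] for p, o, t in product(P, O, T))

    # Marginal means for main effects and two-way tables.
    m_p = [mean(scores[p][o][t] for o, t in product(O, T)) for p in P]
    m_o = [mean(scores[p][o][t] for p, t in product(P, T)) for o in O]
    m_t = [mean(scores[p][o][t] for p, o in product(P, O)) for t in T]
    m_po = [[mean(scores[p][o][t] for t in T) for o in O] for p in P]
    m_pt = [[mean(scores[p][o][t] for o in O) for t in T] for p in P]
    m_ot = [[mean(scores[p][o][t] for p in P) for t in T] for o in O]

    # Mean squares for persons, the two person interactions, and the residual.
    ms_p = n_o * n_t * sum((m - grand) ** 2 for m in m_p) / (n_p - 1)
    ms_po = n_t * sum(
        (m_po[p][o] - m_p[p] - m_o[o] + grand) ** 2 for p, o in product(P, O)
    ) / ((n_p - 1) * (n_o - 1))
    ms_pt = n_o * sum(
        (m_pt[p][t] - m_p[p] - m_t[t] + grand) ** 2 for p, t in product(P, T)
    ) / ((n_p - 1) * (n_t - 1))
    ms_res = sum(
        (scores[p][o][t] - m_po[p][o] - m_pt[p][t] - m_ot[o][t]
         + m_p[p] + m_o[o] + m_t[t] - grand) ** 2
        for p, o, t in product(P, O, T)
    ) / ((n_p - 1) * (n_o - 1) * (n_t - 1))

    return {
        "person": _non_negative((ms_p - ms_po - ms_pt + ms_res) / (n_o * n_t)),
        "person_x_observer": _non_negative((ms_po - ms_res) / n_t),
        "person_x_occasion": _non_negative((ms_pt - ms_res) / n_o),
        "residual": ms_res,
    }


def g_coefficient(vc, n_observers, n_occasions):
    """Relative G coefficient for a mean over observers and occasions."""
    error = (vc["person_x_observer"] / n_observers
             + vc["person_x_occasion"] / n_occasions
             + vc["residual"] / (n_observers * n_occasions))
    return vc["person"] / (vc["person"] + error)


# Hypothetical scores: 3 subjects x 2 observers x 2 occasions.
scores = [
    [[5, 6], [6, 6]],
    [[2, 3], [3, 2]],
    [[8, 7], [7, 9]],
]
components = g_study(scores)
print(components)
print(g_coefficient(components, n_observers=2, n_occasions=2))
```

Only the components that involve persons are returned, because under these assumptions the observer and occasion main effects do not enter the relative error for comparisons among subjects.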

Studies that measure behavior before and 
after some intervention (experimental treat- 
ment) should include a third facet in the G 
study—before versus after the intervention.
This is especially important if the interven- 
tion is likely to reduce the variability of the 
observed behavior. For example, suppose a
study were undertaken to reduce the amount
of aggressive behavior exhibited by school 
children on the playground. At the start of the 
study, the children would be quite variable 
in their playground aggression. If the inter- 
vention were successful, however, the variabil- 
ity after intervention would be quite low (all 
kids showing low aggression). A measure that 
would be quite reliable (in the classical sense) 
in differentiating among the children before 
the intervention might be inadequate after- 
wards. The G-study design allows the re-
searcher to evaluate this possibility.
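
The playground example can be made concrete with a short sketch under the classical definition of reliability as the ratio of true-score variance to observed variance. The variance values are hypothetical and were chosen only to show the direction of the effect; the error variance is assumed to be the same before and after the intervention.

```python
def reliability(true_variance, error_variance):
    """Classical reliability: share of observed variance due to true
    differences among subjects."""
    return true_variance / (true_variance + error_variance)


error_variance = 4.0  # assumed constant measurement error

# Before the intervention the children differ widely in aggression.
before = reliability(true_variance=16.0, error_variance=error_variance)

# After a successful intervention nearly all children show low aggression.
after = reliability(true_variance=1.0, error_variance=error_variance)

print(before)  # 0.8
print(after)   # 0.2
```

The instrument and the observers are no worse afterwards; the coefficient falls only because there is less true variability among subjects for the measure to detect.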

In the same way, studies that compare two
groups of subjects should include group mem-
bership as a third facet in their G study; that
is, it is necessary to demonstrate that the
scores for both groups show approximately 
equal stability and consistency over different 
occasions and different observers. 


Conclusions 


These recommendations have an important 
empirical implication: Studies that follow 
them will report coefficients that are lower,
perhaps substantially lower, than the coeffi-
cients reported by studies that do not fol-
low them. The procedures suggested here are
stringently conservative, and the coefficients
that they yield should be considered lower
limits of the true dependability of the ob-
servational data that are collected. Reviewers
and readers, who are used to seeing reports of
observer agreement in the .80s and .90s, will
have to change their expectations for reliabil-
ity and generalizability coefficients, which
will often be in the .50s and .60s. In fact,
these new coefficients are not low; rather, the
old ones were inappropriately high. Observer
agreement percentages, interobserver reliabil-
ity, and reliability or agreement determined
during pretests or previous studies are all
spuriously high estimates of the quality of the 
data that are collected. Although we may 
have to revise our standards downward con- 
cerning the size of reported coefficients, we 
will be revising our standards upward con- 
cerning the ways in which the data for the 
coefficients are gathered and analyzed. 
There is a methodological implication of 
these recommendations as well. Most observa- 
tional studies currently being published are 
pretty straightforward in experimental design.
Their sophistication is usually based on the 
nature of the observations themselves: the 
complexity of the behavioral record, the length 
of time included in the record, or the specific 
nature of the situation or setting in which the 
data are gathered. Future studies that follow 
the recommendations given here will be far 
more complex in design than is now typically 
the case. 

One final question concerns whether it is 
really worth the additional time and money
necessary to determine and report on the
quality of the data in the ways suggested
here. If including a G-study design in an ob-
servational study has no benefits except the
computation of a generalizability coefficient,
the answer is probably no. In fact, though, in-
cluding a G study usually provides a great
deal of substantive information to the re-
searcher. Are there differences in the behavior
reported by different observers? Do subjects
act differently on different occasions? Is the
variance of the data different before and after
the experimental treatment? Are all groups in
the study represented by data of equal qual-
ity? All these important questions can be ad-
dressed by including a G study in the overall
research plan.
A G study also focuses the attention
of the researcher (and the reader) on both
the individual differences among subjects and
the influence of other (usually environ-
mental) factors on behavior. As Cronbach
(1957) eloquently pointed out, psychologists
have historically tended to focus on one or
the other of these two aspects: individual
differences (using correlational methods) or
group differences (using experimental meth-
ods). Fittingly, it is now Cronbach's theory 
of generalizability that makes it possible to 
combine these two viewpoints. And observa- 
tional research is particularly well suited to 
the task of looking at individual differences 
in behavior in the context of systematic en- 
vironment variation. 


The Dichoptic Viewing Paradigm:
Do the Eyes Have It? 


Gerald M. Long 
Stanford University 


The two-part logic underlying the use of the dichoptic viewing procedure as a 
“psychoanatomical” tool by which to infer the retinal or cortical locus of critical 
processes for the perception of some visual phenomenon is reviewed and 
analyzed. Serious logical weaknesses are identified in both sides of the dichoptic 
argument: the inference of dominant retinal processes from unsuccessful 
dichoptic viewing or dominant cortical processes from successful dichoptic view- 
ing. Specific examples from the visual literature are used to demonstrate the 
potential confounding of each of the variables suggested in this critique. The 
dichoptic viewing procedure tends to be employed in too uncritical a manner, 
and the usual interpretations of studies that have used this procedure as the 
sole technique by which to infer the general locus of a given visual phenomenon
may be seriously in error.


Interest in localizing the physiological struc- 
tures or levels within the visual system that 
are most directly involved in various percep- 
tual phenomena has been directing research 
and theory since some of the earliest experi- 
mental work in vision (cf. Helmholtz, 1909/ 
1962). However, this interest has taken on a 
renewed vitality in recent years with regard
to the currently dominant information-pro-
cessing conceptualization of perceptual func-
tioning (e.g., Haber, 1969, 1974; Lindsay &
Norman, 1972). In this now prevalent view,
various stages and processes, usually in some
form of hierarchical arrangement, by which
stimulus information is transformed and
transmitted in the visual system are hypoth-
esized. Whether such stages purely represent
intervening variables and thereby serve simply
heuristic and shorthand functions for con-
ceptualizing a cluster of logical processes or
whether they actually refer to potential struc-
tures or groups of structures in the physio-
logical system is frequently unspecified.
Hence, the ultimate utility of these hypo- 
thetical processing stages in terms of a real- 
istic conception of the physical system mod- 
eled is unclear. Julesz (1971) has recently 
provided an extremely lucid and detailed ex- 
position of the value of a program of research 
aimed specifically at localizing within the 
visual system the physiological level(s) of 
processing involved in the perception of all 
manner of phenomena—from vernier acuity 
to eidetic memory to visual illusions. Julesz 
was particularly concerned with the potential 
roles random-dot stereograms and related 
techniques can play as psychoanatomical pro- 
cedures by which the retinal or cortical locus 
of various visual percepts can be inferred from 
nonphysiological manipulations. However, a 
much older and in some sense related tech- 
nique that is still very popular among re- 
searchers for ostensibly assessing the periph- 
eral versus central locus of a visual percept 
is that of dichoptic viewing. 


Dichoptic Viewing Paradigm 

Since at least the time of Helmholtz (1909/ 
1962) it has generally been believed that the 
ability or inability of the visual system to 
combine information presented separately to 
the two eyes can reflect directly the locus in
the visual system of the processes involved. 
This dichoptic viewing procedure, which is 
sometimes also referred to as binocular or 
stereoscopic viewing, typically consists of the 
independent stimulation of the two eyes, 
either simultaneously or successively, by two 
aspects of the stimulus array that define 
some perceptual phenomenon. Two rather 
well-known examples should clarify the pro- 
cedures involved as well as the usual conclu- 
sions reached from this experimental manip- 
ulation. For a case of simultaneous dichoptic 
viewing, consider the interesting and some- 
times controversial phenomenon known as 
binocular yellow. Under appropriate condi- 
tions, if a green stimulus is presented to one 
eye and an identical red stimulus is presented 
to the other eye, the observer will frequently
report a sensation of yellow. This is basically 
the same subjective sensation that occurs 
when both colored stimuli are presented to 
the same eye. The implications of this demon- 
stration were clearly appreciated by early 
theorists in color vision. For Hecht (1928), 
this simple result exposed a critical weakness
of the Hering (1964) and Ladd-Franklin 
(cited in Boring, 1942) theories, both of which 
required special retinal receptors for yellow 
light. On the other hand, the Young-Helm- 
holtz three-receptor theory (cf. Boring, 1942) 
was sufficient to account for the phenomenon 
with the simple assumption that under the 
experimental conditions, “the red sensitive 
fibers and the green sensitive fibers are both 
active and the brain synthesizes yellow” 
(Hecht, 1928, p. 238). The seriousness of this 
attack for the other color theories was readily 
apparent to their proponents, and heated re- 
buttals to the dichoptic demonstration, stress- 
ing alleged procedural and stimulus weak- 
nesses, were raised (Murray, 1930, 1939). 
Based on subsequent, more controlled experi- 
ments (e.g., F. H. Thomas, Dimmick, & 
Luria, 1961) and, of course, with the wisdom 
of hindsight bolstered by 40 intervening years 
of research (e.g., P. K. Brown & Wald, 1964),
it is not difficult to side now with both Hecht’s
argument for the authenticity of binocular
yellow and the underlying three-pigment basis
to normal color vision.

For an example of successive dichoptic
viewing, there is the extensive literature on 
dichoptic visual masking (e.g., Turvey, 1973). 
In the usual dichoptic masking experiment, 
the target or test flash is presented to one 
eye, and the mask is presented, following a 
variable interstimulus interval, to the other 
eye. Perceptual performance (e.g., probabil-
ity of detection or percentage of letters cor-
rectly reported) is then compared under the
same temporal conditions with that obtained
when both parts of the stimulus array, target 
and mask, are presented to the same eye 
(monocular viewing) or to both eyes (binoc- 
ular viewing).² If there is very little differ-
ence between the dichoptic and monocular 
(or binocular) masking conditions, postretinal 
processes are usually inferred to underlie the 
masking effect. Hence, the masking of letters
with complex patterns has been hypothesized
to involve cortical processes, because of the
general comparability of monocular and dich- 
optic demonstrations (cf. Breitmeyer & Ganz, 
1976). On the other hand, if it is not possible 
to mask a target presented to one eye with 
a mask presented to the contralateral eye, 
retinal processes are inferred to dominate in 
any masking effects with the same stimuli 
obtainable under normal viewing conditions. 


1 A slight variation on this usual dichoptic ap-
proach, which is discussed at a later point in the
article, should perhaps be mentioned here. Basically,
it involves the presentation of the complete visual
phenomenon to one eye (e.g., flickering light for
critical flicker frequency determination) while some
other stimulus is presented to the other eye (e.g.,
variable luminance background). If perceptual per- 
formance with the phenomenon is affected by the 
latter stimulus to the contralateral eye, central or 
cortical interactions are assumed to be involved (cf. 
J. L. Brown, 1966b). 

2 In some of the older literature, the term binocular 
was used in the same sense that dichoptic is used in 
the present context (e.g., Ammons & Weitz, 1951;
Kahneman, Norman, & Kubovy, 1967). Correspond- 
ingly, phrases like binocular interaction or binocular 
fusion frequently referred to the visual system’s pro- 
cessing of separate (i.e., dichoptic) stimuli presented
to the two retinas. This usage of the word binocular 
appears subject to unnecessary confusion. Therefore, 
in the present article, binocular viewing is reserved 
for those conditions in which a single optic array is 
presented simultaneously to the two eyes, that is, 
normal viewing conditions. 
Hence, homogeneous field masking (by a
bright, blank field) has been relegated to in-
traretinal factors (cf. Breitmeyer & Ganz, 
1976). 

From these two examples, the reasons for 
the popularity of the dichoptic viewing para- 
digm are probably apparent. Theoretically, the 
two possible outcomes of such a procedure 
permit distinct conclusions: (a) If the visual
phenomenon in question has remained essen-
tially unchanged (although perhaps somewhat
weakened in the dichoptic viewing condition),
it is assumed that central (i.e., cortical) pro- 
cesses largely underlie the phenomenon, since 
only in postretinal centers is the separate in- 
put from the two eyes known to be combined; 
(b) on the other hand, if dichoptic presenta- 
tion of a phenomenon destroys the usual per- 
cept, it is generally argued that the critical
processes that determine the usual visual ef-
fect must be retinal or at least precortical. 
Peripheral processes are inferred, because the 
combination of the two half stimuli must ap- 
parently occur prior to the level of binocular 
innervation for adequate perception of the 
phenomenon in question. Needless to say, the 
latter conclusion that a given visual phe-
nomenon is retinal does not mean that central
stages are unnecessary; the perception of any 
event requires central structures. (In the ex- 
treme case, regardless of how perfect the func- 
tioning of the two eyes, if the cortical and 
subcortical structures are inoperative, one is 
blind to all visual phenomena.) Rather, the 
dichoptic viewing procedure has been em- 
ployed to determine whether the central pro- 
cessing of the separate retinal signals from the 
half stimuli presented to different eyes is suf- 
ficient or whether peripheral structures must 
process certain aspects of the stimulus event 
in combination prior to the involvement of 
the higher centers. In other words, if a given 
visual event is not able to be perceived dich- 
optically, a nonlinear system has been inferred
in which the result of the processing of A
(input to left eye) + B (input to right eye) 
is not equivalent to the processing of A + B, 
the combined input to either eye alone. 
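
The nonlinearity inferred from a failure of dichoptic viewing can be illustrated with a toy calculation. The saturating response function below is purely hypothetical and is not offered as a model of retinal processing; it assumes, for illustration only, that the separately processed signals are simply summed at a central site.

```python
def saturating_response(intensity):
    """Hypothetical compressive (nonlinear) response of a single eye."""
    return intensity / (1.0 + intensity)


def linear_response(intensity):
    """Hypothetical linear response, included for comparison."""
    return 2.0 * intensity


a, b = 1.0, 3.0  # arbitrary intensities of the two half stimuli

# Dichoptic case: each eye processes its own input before central summation.
dichoptic = saturating_response(a) + saturating_response(b)

# Monocular case: one eye processes the combined input A + B.
monocular = saturating_response(a + b)

print(dichoptic)  # 1.25
print(monocular)  # 0.8

# A linear stage gives the same result either way.
print(linear_response(a) + linear_response(b) == linear_response(a + b))  # True
```

Only when the peripheral stage is nonlinear do the dichoptic and monocular arrangements yield different outputs, which is the logic behind inferring critical precortical processing from a dichoptic failure.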
Although the dichoptic viewing paradigm 
has had a considerable history, it was em- 
ployed extensively by the Gestalt psycholo-
gists as the basis for an interocular criterion
(Pastore, 1971) with which to demonstrate 
that more was involved in the perception of 
many phenomena than met the eye. Wert- 
heimer (1912), for example, demonstrated 
that apparent motion was easily obtained 
even if the sequential stimuli were presented 
to separate eyes, thereby excluding interac- 
tions between adjacent retinal regions as a 
necessary basis for the phenomenon. Simi- 
larly, Köhler (1940; Köhler & Wallach, 1944) 
used the dichoptic paradigm in his work with 
figural aftereffects to argue that "the effect 
is located in the brain, where processes which 
are due to stimulation of corresponding parts 
of the two retinae occur in a common area" 
(1940, p. 86). However, these early refer- 
ences should not give the impression that the 
dichoptic procedure is in any way a para- 
digm of the past or that it is restricted to a 
very limited category of visual phenomena. 
The popularity of this psychoanatomical pro- 
cedure by which one is theoretically able to 
infer the locus of critical processes for a large 
number of visual effects has remained quite 
strong up to the present day. Employing the 
two-part logic outlined in the previous para- 
graph, other researchers have used the dich- 
optic viewing paradigm to investigate the 
locus and nature of afterimages (e.g., Hansen, 
1954), flicker discrimination (e.g., G. J.
Thomas, 1955), apparent motion (e.g., Ship- 
ley, Kenney, & King, 1945), temporal sum- 
mation (e.g., Kahneman, Norman, & Kubovy, 
1967), Ganzfeld effects (e.g., Hochberg, Trie- 
bel, & Seaman, 1951), masking and metacon- 
trast (e.g., Alpern, 1953; Turvey, 1973), geo- 
metric illusions (e.g., Papert, 1961; Schiller 
& Wiener, 1962), McCollough and related con- 
tingent aftereffects (e.g., Skowbo, Timney,
Gentry, & Morant, 1975), short-term visual 
storage (e.g., Meyer, Lawson, & Cohen, 1975), 
frequency- or size-specific visual channels 
(e.g., Harter, Towle, & Musso, 1976), and 
many other visual phenomena. However, it 
should be noted that this technique's popu- 
larity has continued in spite of specific criti- 
cisms that have been raised against various 
aspects of the argument on which the dich- 
optic procedure is based. The present article 
attempts to outline these generally over- 
looked weaknesses in the dichoptic procedure
that serve to question subsequent conclusions 
concerning the locus of the critical processes 
underlying certain phenomena. 

As a final comment before presenting these 
proposed weaknesses in the experimental pro- 
cedure, it is of interest to note that previous 
critics of the dichoptic viewing paradigm ap- 
pear to have focused on difficulties with one 
or the other of the two conclusions described 
above, while allowing the other to remain. 
For example, as is treated in more detail be- 
low, Julesz (1971, p. 6) tended to accept as 
probably “well-founded” the inference of criti- 
cal cortical processes if the dichoptic viewing 
of some phenomenon is unimpaired, while re- 
jecting the inference of dominant retinal pro- 
cesses if dichoptic viewing fails. Kolers 
(1972), on the contrary, argued that the 
dichoptic procedure only permits one to make 
the negative case: 


Failure to obtain a percept dichoptically proves only 
that most of the processing occurs early in the visual 
channel; but the occurrence of a percept dichopti- 
cally never proves the reverse, that all of its process- 
ing is normally cortical. (p. 182) 


In other words, these two researchers will 
only accept the opposite conclusions from a 
dichoptic viewing experiment. In light of 
these differences of opinion concerning the 
legitimacy of either side of the dichoptic argu- 
ment, it is the purpose of the present article 
to outline the rather strong case that can be 
made against drawing either of the usual con- 
clusions from the dichoptic viewing experi- 
ment. Furthermore, to illustrate the points 
raised in this article, research examples from 
the current vision literature that appear po- 
tentially subject to some of the logical flaws 
to be described are employed. 


The Negative Result: Does the Inability to 
Obtain a Percept Dichoptically Indicate 
Critical Retinal Processes? 


Binocular (Retinal) Rivalry 


As mentioned above, Julesz (1971) criti-
cized the logic of inferring exclusive retinal 
processes as the basis of a given visual phe- 
nomenon if dichoptic viewing fails (i.e., if
the normal percept cannot be obtained dich-
optically). The reason for this involves the
unknown role of binocular rivalry under dich- 
optic viewing conditions. It is well-known 
that if quite different information is presented 
to the two eyes, rather than simple fusion of 
the dichoptic images, the input from one eye 
may be partially or entirely suppressed while 
that of the other eye dominates (cf. Hoch-
berg, 1972). Since it is very difficult to pre-
dict under which conditions fusion as op-
posed to suppression of separate retinal in-
puts will occur, binocular rivalry remains a 
serious confounding variable in the dichoptic 
viewing paradigm. Any change observed under 
dichoptic conditions may result from inter- 
ocular suppression rather than from an as- 
sumed absence of critical intraocular process- 
ing presumably necessary for the usual per- 
cept.

Several indications of the potential involve- 
ment of binocular rivalry under conditions 
of dichoptic viewing can be found in the ex- 
tensive masking literature. For example, 
Schiller and Smith (1968) reported greater 
masking effects in a metacontrast paradigm 
with dichoptic viewing than with monocular 
viewing. They interpreted this rather unex-
pected result in terms of binocular rivalry ef-
fects between the “test” eye and “mask” eye 
in addition to classic metacontrast suppres- 
sion effects. Typically, influences from inter- 
ocular rivalry are not considered in such 
work, but in this particular case, the addi- 
tional hypothetical component of binocular 
rivalry was deemed necessary because the 
dichoptic suppression effect was larger than
that of monocular suppression. Nevertheless, 
this does serve to emphasize the unknown 
contribution of binocular rivalry in those 
studies that report any effect of dichoptic 
viewing. Moreover, further evidence for ocu- 
lar suppression during dichoptic viewing has 
been demonstrated convincingly in a recent 
study of masking effects by Monohan and
Steronko (1977). In an attempt to control 
for binocular rivalry effects resulting from 
chronic ocular dominance of one eye over 
the other within subjects, these investigators 
preselected subjects on the criterion of equal 
masking effects across the two eyes before de-
termining luminance effects in a dichoptic
pattern-masking experiment. Only 9 of 24 sub-
jects tested met this criterion, but with these
subjects results were obtained that were at 
odds with those previously reported in the 
masking literature. On the basis of these re- 
sults, Monohan and Steronko argued for the 
reinvestigation of all reported dichoptic mask- 
ing effects in which binocular rivalry, espe-
cially as reflected in ocular dominance, has
not been controlled. It is proposed that this
same argument should be given serious con- 
sideration with regard to other visual phe- 
nomena for which dichoptic viewing in any 
way alters performance. 

In the context of the binocular rivalry argu- 
ment, Julesz (1971) described the value of 
the random-dot stereograms that contain little 
monocular form information and thereby ap- 
pear to eliminate or at least reduce binocular 
rivalry effects. To emphasize this point, 
Julesz criticized those studies investigating 
geometric illusions that inferred the domi- 
nance of retinal processes in these illusions 
because of the reduced illusory effect under 
dichoptic viewing conditions (e.g., Spring-
bett, 1961). Following Day's (1961) argu-
ment that such findings may have resulted
from binocular rivalry effects, Schiller and 
Wiener (1962) found that when brief tachis- 
toscopic flashes of the geometric illusions were 
employed to reduce rivalry effects in the 
dichoptic situation, the magnitude of the il- 
lusory effects approached that obtained under 
binocular presentation. Using random-dot 
stereograms, Papert (1961) and Hochberg
(cited in Julesz, 1971) reported these same
illusions to be basically identical under clas-
sical (i.e., binocular) and cyclopean (i.e.,
stereogram) stimulation. Such results serve to 
confirm Julesz’s argument that the earlier in- 
vestigations were confounded by unknown 
effects of binocular rivalry under the dich- 
optic viewing conditions. However, it should 
be mentioned that even with the random-dot 
stereograms, the absence of successful stereo- 
scopic viewing of some phenomenon only al- 
lows one “to conjecture its peripheral origin 
until it is proven to be central” (Julesz, 1971, 
p. 4). This caution is necessary because, al- 
though unimpaired dichoptic viewing may in- 
deed reflect basic superposition of the retinal
signals at central levels, the possibility of even 
partial binocular rivalry cannot be completely 
eliminated. The need for this caution is fur- 
ther supported by a recent study by Blake 
(1977) that demonstrated temporary sup- 
pression of an above-threshold target pre- 
sented to one eye by a target at its own 
threshold contrast in the other eye. Hence, 
the extreme sensitivity of the mechanism 
that underlies ocular suppression appears to 
strengthen the argument against assuming an 
absence of binocular rivalry under any dich- 
optic viewing conditions. 


Monocularly Innervated Cortical Effects 


A second possible criticism of inferring 
retinal processes from a failure of dichoptic 
viewing is of a somewhat more hypothetical 
nature, but is nevertheless consistent with 
current physiological evidence. Hubel and 
Wiesel (1962, 1968) reported that the ma- 
jority of cortical cells from which they re- 
corded with microelectrodes were binocularly 
driven cells. However, a good proportion of 
their sample (up to 30%) appeared to be 
innervated only by one or the other eye. This, 
in turn, raises the possibility of some visual 
effects that may involve these monocularly 
driven cortical cells alone. Such visual effects 
would not be observable under conditions of 
dichoptic viewing, but would nevertheless be 
critically dependent upon postretinal pro- 
cesses. In support of this contention, it has 
recently been argued that color-contingent 
motion or orientation aftereffects may well 
fall into this category (Favreau & Corballis, 
1976; Skowbo et al., 1975). In a typical dem- 
onstration of the color-contingent motion 
aftereffect (e.g., Mayhew & Anstis, 1972), a 
red clockwise-rotating spiral is alternated 
with a green counterclockwise-rotating spiral 
during the adaptation phase. In the subse- 
quent test phase, if a stationary red spiral is 
viewed, it will appear to rotate counterclock- 
wise; a stationary green spiral will appear to 
rotate clockwise. However, if during the 
adaptation phase the color is presented to the 
left eye and the rotating spiral to the right 
eye, no aftereffect is observed when the spiral 
is subsequently viewed by the right eye
(Murch, 1972). Hence, even though there is
considerable evidence that these aftereffects 
are nonretinal (cf. Skowbo et al., 1975), it 
does not appear possible to innervate the 
appropriate cortical centers with the dichop- 
tic presentation of the stimulus components 
during the adaptation phase. Whether future 
research will support these results with the 
dichoptic viewing of certain color-related 
aftereffects is not critical to the argument 
proposed here. The possibility of visual effects 
dependent even only in part on monocularly 
innervated cortical cells serves to weaken 
inferences of critical retinal processes that 
are based purely on the failure of dichoptic 
viewing. 


The Positive Result: Does the Finding of 
No Difference Between Normal and Dichoptic 
Viewing Indicate the Relative Dominance 
of Central Processes? 


Just as the previous two arguments have 
been raised against inferring dominant retinal 
processes from a failure of dichoptic viewing, 
it is also possible to find fault with the other 
side of the dichoptic paradigm by which 
cortical processes are inferred to play the 
critical role in the perception of some phe- 
nomenon if dichoptic viewing leaves a per- 
cept relatively unaffected. In this section 
various arguments, with examples from em- 
pirical work, are developed that seriously 
question this other half of the dichoptic 
paradigm. 


Phenomenal Overlap of the Monocular 
Visual Fields. 


This first criticism can perhaps best be 
explained by reference to the controversy 
that existed for several years concerning the 
central versus peripheral locus of simple 
afterimages. As early as Newton, it was known 
that if an intense light source is viewed for a 
few seconds with one eye that is then closed 
and the nonstimulated eye is used for view- 
ing, a negative afterimage of the source can 
sometimes be seen (e.g., Day, 1958; Terwilli- 
ger, 1963; Walls, 1953). This apparent inter- 
ocular transfer of the afterimage was cited by 
some as evidence for central involvement in 
such afterimages (cf. Day, 1958). However, 
the bulk of both empirical and theoretical 
work favors the notion that the retina is the 
locus of classic afterimages (J. L. Brown, 
1966a; Craik, 1940). If it is assumed that 
afterimages are due entirely to retinal fa- 
tigue, the problem arises as to the basis of 
the phenomenal transfer of such retinally 
based effects. In a footnote to his well-known 
study that investigated adaptation or aftereffects 
to curved lines, Gibson (1933) suggested 
the potential involvement of binocular 
rivalry in the apparent transfer of classic 
afterimages such that the stimulated but 
closed eye may dominate the open eye under 
some conditions. Sumner and Watts (1936) 
developed the argument more explicitly and 
found empirical support for this proposal by 
investigating differences in the apparent trans- 
fer of afterimages under various stimulus and 
background conditions. Day (1958) extended 
the argument even further and generalized 
its significance to all dichoptic viewing situa- 
tions that involve aftereffects by stressing the 
possible confounding effect of the phenomenal 
overlap of the visual fields from the two eyes. 
In other words, Day argued that it is ques- 
tionable to infer central processes from “suc- 
cessful” dichoptic viewing because it is not 
generally possible for the observer to sepa- 
rate truly centrally transferred effects from 
continued input from the closed but still sig- 
naling eye because of the overlap of the visual 
fields from the two eyes. This overlap of the 
two monocular fields has been proposed as 
the basis of the apparent transfer of nega- 
tive afterimages, since the retinal pattern 
outlasts the physical stimulus whether the 
stimulated eye is open or closed following 
termination of the stimulus. Delabarre (1888) 
expressed strongly this same weakness of the 
dichoptic procedure 12 years before the turn 
of the century: 


A serious difficulty in settling the question [of the 
locus of afterimages] lies in the well-known impos- 
sibility of separating the visual fields of the two 
eyes. Whether one eye or both are open, whether 
they are focused on the same point or are held 
parallel, or squinted, or even jammed into all sorts 
of relative positions by fingers inserted into their 
sockets, the field of each will appear to coincide 
with the field of the corresponding portion of the 
retina of the other. (p. 326) 
Much later, Terwilliger (1963) stressed the 
implications of these findings with afterimages 
for the study of other visual phenomena for 
which dichoptic viewing has proved success- 
ful. He argued that if a phenomenon with 
such a clearly retinal basis as afterimages 
exhibits interocular transfer, it does not appear 
theoretically sound to exclude categorically 
the involvement of retinal factors in any visual 
effects simply on the basis of successful dichoptic 
viewing. Terwilliger was specifically 
concerned with the possibility of an impor- 
tant retinal contribution to figural after- 
effects, a notion rejected previously by Köh- 
ler and Wallach (1944) on the basis of their 
findings with dichoptic viewing described 
above. His methodological cautions, however, 
appear generally valid for all studies that 
employ the dichoptic procedure. Nonetheless, 
they are perhaps given further credibility 
by the subsequent emergence of alternate 
theories of figural aftereffects that stress the 
role of dominant retinal processes in such 
perceptual distortions (e.g., Deutsch, 1964; 
Ganz, 1966a, 1966b). Hence, as with after- 
images, the use of the dichoptic viewing pro- 
cedure may have retarded the development 
of an adequate theory of figural aftereffects 
by the uncritical inference of a retinally independent 
basis to the phenomenon solely on the 
basis of successful dichoptic viewing. 

The potential confounding of the phenom- 
enal overlap of the visual fields may seem 
rather obvious in retrospect, but it should be 
remembered that the question of the locus of 
afterimages was debated for years because of 
such apparent interocular transfer of the phenomenon. 
Furthermore, although it may appear 
implausible that this same problem could 
arise in current research, a much more subtle 
form of this criticism is less difficult to ap- 
preciate. 


Hierarchical Cortical Processing of 
Retinal Signals 


This less obvious—and therefore perhaps 
more dangerous—form of criticism against 
the inference of dominant cortical processes 
from successful dichoptic viewing was alluded 
to by Day (1958), but has been more recently 
and explicitly raised by Julesz (1971) and 
Sakitt (1976). It is a criticism based on the 
notion of successive stages within the visual 
system (i.e., levels of information processing) 
and rests on two logical premises. First, it 
must be remembered that one's ability to 
perceive a single unified (and three-dimen- 
sional) world from two separate retinas with 
overlapping visual fields results to a large 
degree from the eventual combinations of the 
two inputs at higher brain centers. This 
cyclopean conceptualization appears consis- 
tent both with current neurophysiological 
(e.g., Barlow, Blakemore, & Pettigrew, 1967; 
Pettigrew, 1972) and with psychophysical 
(e.g., Blake & Fox, 1973; Ono, Angus, & 
Gregor, 1977) evidence. Second, as men- 
tioned earlier, it should be noted that one's 
awareness of some visual stimulus, no matter 
how completely dependent on retinal pro- 
cesses, results from the involvement of post- 
retinal centers in the visual system. In the 
context of postulating the physiological basis 
for iconic memory, Sakitt (1976) made the 
distinction between the locus of the percep- 
tion of some visual effect and the physical 
locus of the effect itself. Postretinal processes 
are necessarily involved in the former, yet a 
researcher frequently attempts to infer the 
latter from various experimental procedures 
such as the dichoptic viewing paradigm. To 
clarify further this distinction between the 
locus of a visual effect and the locus of the 
perception of that effect, consider the classic 
demonstration by Craik (1940) involving 
afterimages. He was able to show that al- 
though afterimages could be established with- 
out subjective awareness by temporarily pres- 
sure blinding the stimulated eye, higher cen- 
ters were necessary for the awareness of the 
afterimage. Even retinal fatigue requires post- 
retinal centers for subjective appreciation. 
These two points simply reflect the basic 
fact that one's normal two-eyed perception 
of the world typically results from the even- 
tual involvement of binocularly innervated 
cortical centers that in some way combine the 
inputs from the two retinas. Now, with re- 
gard to the dichoptic viewing paradigm, sup- 
pose a particular visual manipulation re- 
sults in some change (unspecified) in the 
retinal signals in just one eye. These signals 
will be transmitted to the higher centers in 
the visual system, where they may then inter- 
act with whatever signals are arriving from 
corresponding retinal regions of the other 
eye. Electrophysiological recording at the 
single-cell level in the striate cortex of the 
cat has demonstrated the existence of binocu- 
lar interaction fields of excitatory and inhibi- 
tory regions that describe the activity of a 
simple cortical cell as a function of the rela- 
tive stimulation of both eyes (Bishop, Henry, 
& Smith, 1971). The point is that dichoptic 
viewing of some visual phenomenon may ap- 
pear to reflect critical central processes even 
though retinal factors may dominate and 
largely define the phenomenon. This results 
from the basic fact that one's awareness of the 
visual world necessarily involves postretinal 
structures, and in these successive postretinal 
stages of information processing, monocular 
inputs are eventually combined. Hence, an 
experimental procedure that can tap directly 
only postretinal processing (i.e., observers’ 
verbal reports) may overestimate the cortical 
contributions to some visual phenomena be- 
cause of the interaction between incoming 
retinal signals. This is a serious weakness of 
the dichoptic viewing procedure. 

To illustrate the potential confounding of 
this relatively late combined processing of 
retinal inputs, it may be best to focus on 
those studies that have employed a slight 
variation of the usual dichoptic procedure (see 
Footnote 1). In these studies, some complete 
visual pattern is presented to one eye, and 
the effect of introducing some other stimulus 
to the other eye is determined. If this latter 
stimulus has some effect on the perception of 
the stimulus to the first eye, purely central 
processes are inferred to be involved. For 
example, Jacewitz and Lehmann (1972) pre- 
sented a typical Sperling (1960) partial- 
report task to an observer's left eye while 
varying the input to the right eye from that 
of a blank field to a train of either blank or 
grid flashes. For the partial-report task, nine 
letters in a 3 × 3 array were presented for 
50 msec, followed at some brief delay by a 
variable-pitch tone that directed the observer 
to report the top, middle, or bottom row of 
letters only. The average number of letters 
available to the observer (i.e., the percentage 
of the three letters correctly reported per trial 
times the total number [nine] presented per 
trial) was then estimated for each time delay 
condition and as a function of the three visual 
conditions for the contralateral eye. The de- 
crease in recall performance with increasing 
complexity of the contraocular signals was 
interpreted in terms of reduced central pro- 
cessing capacity available; and, hence, the 
cortical locus of the iconic memory theoreti- 
cally assessed by the partial-report procedure 
was concluded. Similar logic underlies the 
use of an analogous procedure in the study 
of critical flicker frequency (CFF). Varia- 
tions in the values of CFF for a given inter- 
mittent stimulus presented to one eye when 
the input to the other eye is varied "are 
presumed to reflect central interaction pro- 
cesses" (J. L. Brown, 1966b, p. 259). 
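
The partial-report estimate described in this paragraph is a simple product: the proportion of the cued row reported correctly, scaled by the size of the full array. A minimal sketch of that arithmetic follows. The function name, argument names, and the sample trial counts are illustrative assumptions and are not taken from Jacewitz and Lehmann (1972).

```python
def letters_available(correct_per_trial, row_size=3, array_size=9):
    """Estimate the number of letters available to the observer.

    correct_per_trial: letters correctly reported from the cued row on
    each trial (each value between 0 and row_size).
    The estimate is the proportion correct in the cued row multiplied by
    the total number of letters presented per trial.
    """
    if not correct_per_trial:
        raise ValueError("at least one trial is required")
    proportion_correct = sum(correct_per_trial) / (len(correct_per_trial) * row_size)
    return proportion_correct * array_size


# Hypothetical counts for one delay condition (not real data).
trials = [3, 2, 2, 3]
estimate = letters_available(trials)
```

Computing this estimate separately for each tone delay and each contraocular condition would give the kind of comparison the study relied on. The calculation itself says nothing about where in the visual system the stored information resides.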

In some cases, the conclusion of dominant 
cortical processes from these studies may well 
be correct. However, the above brief quote by 
Brown reveals a potential flaw. What is cen- 
tral, the phenomenon itself, or just the inter- 
action with the contraocular signals? Did 
Jacewitz and Lehmann in the study described 
previously demonstrate the cortical locus of 
the persistence required on the iconic mem- 
ory task or the cortical locus of the inter- 
ference from the other retinal signals? These 
uncertainties can be stated more formally 
within an information-processing framework 
(cf. Haber, 1974). Given a hierarchical 
arrangement of processing stages (both reti- 
nal and cortical) within the visual system, 
the dichoptic procedure cannot clearly dis- 
tinguish between visual effects whose domi- 
nant processes are cortical and those whose 
processes are only partly cortical or even 
those for which retinal processes are most 
critical. Some examples from the current 
visual literature should help to clarify this 
interpretive difficulty. 

Example 1: Spatial frequency channels in 
the visual system. Consider the recent study 
by Harter et al. (1976) in which different 
checkerboard patterns of various sizes were 
presented dichoptically to the two eyes. Vi- 
sual evoked potentials and reaction times to 
a checkerboard pattern that was flashed to 
the left eye were determined as a function of 
the size of a constant checkerboard pattern 
viewed by the right eye. Both response 
measures indicated reduced sensitivity 
to a pattern presented to the left eye 
when that pattern was most similar to the 
pattern viewed by the right eye. The authors 
concluded that the results were consistent 
with the notion of binocularly innervated, 
size-specific cortical neurons, which were 
selectively adapted by the pattern presented 
to the right eye. 

It is not the purpose of the present discus- 
sion to dispute the specific conclusions of the 
Harter et al. study. As they pointed out, 
there is considerable converging evidence 
from other studies for the same conclusion; 
so, in fact, there may be good reason to 
accept their proposal. Rather, the importance 
of the Harter et al. study in 
the present context is to demonstrate the 
plausibility of an alternate explanation of 
dichoptic results based upon the particular 
weakness of the dichoptic procedure that 
concerns the ultimate combination of retinal 
signals prior to an observer's response. In 
this regard, consider the predicted results if 
there were only retinal size-specific cells— 
perhaps the retinal ganglion cells with their 
concentric receptive fields of varying size. A 
fatiguing of one such set of cells in the right 
eye could result in increased noise or inhibi- 
tory signals being transmitted to the higher 
centers, thereby affecting the ultimately 
perceived input from the left eye. Hence, 
given the phenomenal overlap of the monocu- 
lar visual fields and the eventual convergence 
of ocular signals at cortical levels prior to 
awareness, it would be possible to explain the 
same pattern of results in terms of size- 
specific retinal processes. Unique processing 
by cortical centers need not be inferred. More- 
over, consider the likely pattern of results if 
the above study had employed simple homo- 
geneous fields of varying luminance levels 
also presented dichoptically. Results similar 
to those of Harter et al. that reflected a depression 
in performance with increasing test 
(left eye) and background (right eye) simi- 
larity would most likely be expected, but it is 
highly doubtful that such results would be 
interpreted in terms of the existence of binocularly 
innervated, luminance-specific cortical 
neurons. Before proposing such a construct, 
a researcher would first wish to rule 
out in some way the known retinal effects of 
varying luminance. 

Several other studies that probed the nature 
of the hypothesized spatial frequency (or 
size-specific) channels in the visual system 
also employed dichoptic techniques as experi- 
mental means by which to infer the locus of 
such effects. Blakemore and Campbell (1969) 
and Blakemore and Sutton (1969) demon- 
strated the interocular transfer of adapta- 
tion effects that were spatial frequency spe- 
cific, theoretically reflecting the adaptation 
of a narrowly tuned, binocularly innervated 
cortical channel. More recently, Blake and 
Levinson (1977) demonstrated interocular 
facilitation of grating detection (i.e., lowered 
threshold contrast for a monocular, striped 
target by the presentation of a subthreshold 
grating to the other eye). This effect also was 
spatial frequency specific: Optimal facilita- 
tion was found when the gratings to the two 
eyes were most similar. Although these stud- 
ies may indeed share some of the weaknesses 
of the dichoptic procedure detailed previ- 
ously, there are other critical factors that 
render their dichoptic procedure considerably 
less suspect. On the one hand, as mentioned 
in the context of the Harter et al. (1976) 
study, there is significant converging evidence 
both from other psychophysical studies and 
from electrophysiological investigations that 
supports the cortical locus of neurons in the 
visual system that are selectively sensitive to 
the spatial frequency of a retinal stimulus 
(cf. Blake & Levinson, 1977). However, even 
more important is the fact that in the three 
studies just cited, an additional empirical re- 
sult was also stressed that substantiates the 
postretinal interpretation of the locus of such 
effects. Specifically, the interocular effects 
described in these studies were also found to 
demonstrate orientational selectivity; that is, 
only if the contraocular stimulus was of 
roughly the same orientation (tilt) as the 
target stimulus was the interocular viewing 
condition comparable to that of monocular 
viewing. The importance of this additional 
demonstration for the conclusion of a central 
locus to the effects rests in the known types 
of receptive fields of retinal cells (cf. Kuffler 
& Nicholls, 1976). The last stage of processing 
within the retina, that of the ganglion 
cells, exhibits concentric receptive fields of 
various sizes. Therefore, purely retinal processing 
of a stimulus may demonstrate some 
spatial frequency analysis either by the dif- 
ferential excitation of ganglion cells by stimuli 
of varying widths or by the combined activity 
of a large number of ganglion cells with ran- 
domly distributed receptive fields (Kelly, 
1975). Later, binocularly innervated cortical 
cells could simply reflect the processing properties 
of this earlier stage in the system. However, 
orientational selectivity of these same 
perceptual effects would not be expected un- 
til the involvement of higher centers that 
combine the output from several such concen- 
tric ganglion cells at specific orientations (cf. 
Hubel & Wiesel, 1962). Hence, the repeated 
finding in the literature of successful dichoptic 
(i.e., interocular) viewing is not in itself conclusive 
evidence for a characteristically or 
qualitatively different nature to the processing 
at earlier (retinal) levels in the system prior 
to the cortical conjunction of monocular sig- 
nals; but the complementary finding of orien- 
tational selectivity renders this same conclu- 
sion much more tenable. It should be stressed 
that the dichoptic procedure alone is not the 
convincing demonstration. 

These examples from the visual literature 
that deals with hypothesized spatial frequency 
channels were intended to exemplify 
an inherent weakness of the dichoptic pro- 
cedure i 


sidering the other available evidence that 
supports the central locus of these spatial frequency 
channels (cf. Blake & Levinson, 
1977; Breitmeyer & Ganz, 1976). But it 
should be remembered that the focus of the 
present argument is not on the specific conclusions 
of these studies but on their methodology. 
Julesz (1971) similarly discussed 
this procedural weakness of the dichoptic pro- 
cedure, and Kolers’s (1972) argument against 
inferring a necessary cortical locus from the 
unimpaired dichoptic viewing of some phe- 
nomenon rested on the same points (Kolers, 
Note 1). The importance of this logical error 
can be demonstrated perhaps more persuasively 
by reference to an area of research 
that, like afterimages and aftereffects, may 
have been retarded by the uncritical and 
unquestioned use of the dichoptic procedure. 

Example 2: Locus of iconic memory. 
Iconic memory, or short-term visual storage, 
refers to the persistence exhibited by the 
visual system following the brief presenta- 
tion of a target (cf. Neisser, 1967; Sakitt, 
1976). It has been argued in the past that 
the locus of iconic memory must be postretinal 
because of the demonstrated dichoptic 
masking of the icon (cf. Dick, 1974; Jace- 
witz & Lehmann, 1972). However, there is 
increasing evidence from recent research that 
the bulk of what has been referred to in the 
literature as iconic memory involves retinal 
persistence effects (Long, 1978; Sakitt, 1976; 
Sakitt & Long, 1978). In her review of the 
iconic literature, Sakitt (1976) suggested that 
the peripheral locus of the icon can be reconciled 
with the results of dichoptic masking 
experiments if it is assumed that the pro- 
longed signals from the retina are transmitted 
to the higher levels where binocular interaction 
occurs. Hence, although the masking itself 
may take place at a central site of binocular 
combination, the icons themselves may 
be due entirely to prolonged photoreceptor 
activity. As mentioned previously, since the 
perception of any event occurs not in the eye 
itself but at higher levels, it is necessarily 
difficult using the dichoptic procedure to 
tease out the visual phenomena (e.g., after- 
images) that have their predominant impact 
on the retina (which in turn sends its output 
to binocularly innervated cortical centers) 
from the more purely central phenomena that 
are not contained in the monocular signals 
alone. 

A similar argument can also be raised 
against the conclusion by Haber and Standing 
(1969) of the central locus of iconic memory 
from results of dichoptic presentation of 
stimuli in a nonmasking paradigm. In their 
study, a circular target of variable duration 
was repeatedly presented at various rates 
until the subject reported the target to be 
on continuously. The presentation of the 
target to alternate eyes on successive flashes 
(i.e., dichoptic viewing) had no effect on the 
chosen temporal interval for continuity of 
perception, as compared with monocular viewing 
conditions. This lack of difference between 
monocular and dichoptic viewing of 
the target was interpreted as evidence for 
the central locus of the persistence effect. 
However, in light of the present argument it is 
claimed that such results actually demon- 
strate only that the input from the two eyes 
is combined prior to the perceptual decision 
about the subjective duration. The results do 
not indicate whether the persistence occurred 
prior to or at this point of combination; 
they show only that the perception of 
the event occurred after the confluence of the 
retinal signals. 

Value of “correlograms” for localizing pro- 
cesses. The criticisms raised in this section 
against the inference of a cortical basis to 
phenomena that can be perceived dichoptically 
evince serious limitations to the practical 
value of the typical dichoptic demonstration 
as an unambiguous psychoanatomical 
tool. However, it should be noted that these 
criticisms do not appear to be equally dam- 
aging to the related nonphysiological pro- 
cedures represented by random-dot stereo- 
grams. Julesz (1971) painstakingly outlined 
the value of random-dot stereograms, anaglyphs, 
and other forms of so-called correlograms 
in inferring cortical processes for a 
large range of visual phenomena. The usual 
criticisms raised against drawing this cortical 
inference from dichoptic demonstrations are 
much less applicable to this rather intriguing 
class of stimuli. Each monocular pattern by 
itself is drastically inadequate for perception 
of the given phenomenon and appears to each 
eye as composed of random dots. It is only in 
the particular horizontal disparities between 
portions of the two retinal inputs that the 
complete visual phenomenon is defined. In 
Julesz’s (1971) words, "Random-dot stereograms 
do not contain, even physically the 
global information in the left and right retinal 
projections. It is only the relation between 
the left and right patterns that produces a 
pattern of the desired kind" (p. 7). Since this 
comparison between retinal inputs (i.e., dis- 
parity detection) can only occur after the 
locus of conjunction of monocular images, 
cortical processes must underlie the percep- 
tion of these phenomena that are portrayed 
exclusively on the cyclopean retina of the 
mind's eye.³ 


Summary 


The purpose of the present article has been 
to point out several potential weaknesses that 
underlie the use of the dichoptic viewing pro- 
cedure to infer peripheral versus central 
processes in the perception of a given phe- 
nomenon. These problems with the dichoptic 
procedure were shown to limit the conclusions 
from either the successful (i.e., unimpaired) 
or the unsuccessful dichoptic viewing 
of some visual effect. It was argued that 
failure in dichoptic viewing, instead of neces- 
sarily reflecting the retinal locus of critical 
processes for the perception of a given phe- 
nomenon, could also result from either bi- 
nocular rivalry effects or the sufficiency of 
monocularly driven cortical events for a cer- 
tain percept. On the other hand, the success- 
ful dichoptic viewing of a visual phenomenon, 
instead of indicating the dominant cortical 
locus of the phenomenon, could result from 
the phenomenal overlap of the visual fields 
(such that the observer cannot distinguish 
actual transfer from continued output from 
the stimulated eye) and from the relatively 
late stage of combination of the input from 
the two eyes. To illustrate each of these 
arguments, empirical examples were employed 
to demonstrate the plausibility of each alternate 
hypothesis to the classic dichoptic interpretation. 
It is believed that the criticisms 
raised in this article seriously question the 
previously uncritical and almost automatic 
use of the dichoptic viewing procedure for 
determining the level of the visual system 
that underlies performance on a given visual 
task. 


³ An interesting recent example of the use of such 
cyclopean stimuli for localizing processes can be 
found in a study by Fox and Lehmkuhle (Note 2). 
Dynamic noise stereograms were used to present the 
brief array of letters in a Sperling (1960) partial-report 
task. Each monocular pattern alone appeared 
as randomly moving elements in the display; only in 
the disparity between certain identical portions in the 
two monocular inputs were the letters contained 
(i.e., seen in depth). No iconic memory was found 
for these purely postretinal letters. Not only is this 
result consistent with other recent findings that indicate 
a peripheral locus to iconic memory (e.g., 
Sakitt & Long, 1978) but it is also opposite to those 
results, described previously, that were obtained with 
traditional dichoptic procedures. 


Reference Notes 


1. Kolers, P. A. Personal communication, summer 
1977. 

2. Fox, R., & Lehmkuhle, S. Iconic memory in 
stereospace: Seeing without storing. Paper presented 
at the meeting of the Psychonomic Society, 
Washington, D.C., November 1977. 
Categories for Classifying Language in Psychotherapy 

A review of language analysis systems employed in psychotherapy research suggests 
a typology based on the combination of three category types with two 
coding strategies. The types are (a) content categories, (b) intersubjective cate- 
gories, and (c) extralinguistic categories. They are defined by distinct sets of 
language features. The coding strategies are (a) the classical coding strategy, in 
which categories describe the text, and (b) the pragmatic coding strategy, in 
which categories describe the speaker. A review of research results suggests that 
the content, intersubjective, and extralinguistic features constitute distinct chan- 
nels of communication and that (a) the content channel carries information per- 
taining to the speaker's psychodynamic process and personality structure, (b) 
the intersubjective channel carries information pertaining to the quality of the 
speaker's relationship with the other, and (c) the extralinguistic channel carries 
information pertaining to the speaker's transitory emotional state. System consistency 
criteria are suggested for use in conjunction with the typology to evaluate 
categories and category systems. 


If the first stage in the scientific study of 
a phenomenon is naming and classifying, the 
study of verbal behavior in psychotherapy is 
mired in its first stage. Reviews of the psy- 
chotherapy content analysis literature (Auld 
& Murray, 1955; Kiesler, 1973; Marsden, 
1965, 1971; Meltzoff & Kornreich, 1970) 
display an ungainly proliferation of categories 
and systems of categories to describe the 
verbal behavior of therapist and client. One 
reviewer (Kiesler, 1973) put it this way: 


Psychotherapy process research has to rank near the 
forefront of research disciplines characterized as 
chaotic, prolific, unconnected, and disjointed, with 
researchers unaware of much of the work that has 
preceded and the individual investigator tending to 
start anew completely ignorant of closely related 
previous work. (p. xvii) 


The proliferation of categories and cate- 
gory systems and the disorganization of this 
area of research reflect the lack of consensus 
on what are the most significant aspects of 
verbal interaction. In the absence of a unified 
theoretical understanding, investigators seem 
dissatisfied with existing systems, and they 
continue to search for more illuminating ways 
to capture the richness of verbal behavior in 
psychotherapy (Goodman & Dooley, 1976; 
Kepecs, 1977; Kiesler, 1973; Labov & Fan- 
shel, 1977; Strupp, 1957; Stiles, in press-b). 
The results of this continuing search—the 
numerous categories and classification sys- 
tems—have created a need for an integrated 
descriptive framework, a framework that 
would facilitate the comparison and evalua- 
tion of alternative categorization schemes 
(Freedman, Leary, Ossorio, & Coffey, 1951–1952; 
Reusch & Bateson, 1949; Rice, 1965; 
Rice & Wagstaff, 1967). The present article 
addresses this need; that is, it describes an 
order we see among existing language analysis 
categories and suggests guidelines for select- 
ing or creating classification systems appro- 
priate for particular research problems. 

Our framework is a typology of categories. 
We use three distinct sets of language fea- 
tures to identify three types of categories. 
The types we propose subsume the vast ma- 
jority of the language categories that have 
been used to study verbal interaction in psychotherapy. 
In brief, content categories, such 
as mother or death anxiety, concern denota- 
tive or connotative semantic content. Inter- 
subjective categories, such as self-disclosure 
or question, concern syntactically implied 
and other relationships between the communi- 
cator and recipient. Extralinguistic categories, 
such as pauses or laughing, concern vocal 
noises, tonal qualities, and temporal patterning 
of speech, defined independently of semantic 
content and syntactic structures. 
Cutting across the three types are two distinct 
coding strategies, previously described 
by Berelson (1952) and by Marsden (1965, 
1971). In the classical strategy, categories 
describe characteristics of the text (or some 
other record of the communication), whereas 
in the pragmatic strategy, categories describe 
characteristics of the communicator such as 
his or her internal state, intentions, socio- 
economic class, and so on. For example, the 
category mother would be classical if it coded 
instances of maternal references in the text; 
the category death anxiety would be pragmatic
if it coded utterances judged to reflect
the communicator’s conscious or unconscious
concern about death. The classical strategy
requires two operational steps from the raw 
text to inferences about psychological pro- 
cesses: The coder identifies instances of cate- 
gories in the text, and the researcher later 
makes inferences based on category frequencies
(or indices derived from category frequencies).
The classical strategy thus makes
explicit the process of inference about the 
communicator's characteristics. The pragmatic 
strategy uses only one step; coders make in- 
ferences about psychological processes (or 
other characteristics of the communicator) in 
the process of coding. These inferences may 
or may not be based on specified behaviors
(e.g., behaviors that instantiate death anxiety
may or may not be exhaustively catalogued),
but in either case, the specific behaviors
are not recorded. Thus the pragmatic
strategy permits complex contextual judgments
that may be impossible to specify completely
(Labov & Fanshel, 1977; Russell, in
press), but it obscures the relationships between
behaviors and inferred characteristics
of the communicator (Marsden, 1971). 
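
The two operational steps of the classical strategy can be illustrated with a minimal sketch. The mini-dictionary, the function names, and the proportion index below are hypothetical and chosen only for illustration; they are not taken from any of the systems reviewed here.

```python
from collections import Counter
import re

# Hypothetical mini-dictionary: each classical category lists the words
# that count as instances of it, regardless of context.
CATEGORY_WORDS = {
    "mother": {"mother", "mom", "mama"},
    "begin": {"start", "born", "begin"},
}

def code_text(text):
    """Step 1 (coder): tally instances of each category in the text."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        for category, words in CATEGORY_WORDS.items():
            if word in words:
                counts[category] += 1
    return counts

def category_proportions(counts, total_words):
    """Step 2 (researcher): derive an index from the category frequencies."""
    return {category: n / total_words for category, n in counts.items()}

sample = "My mother said we will start soon. Mom was born in July."
counts = code_text(sample)
proportions = category_proportions(counts, total_words=12)
```

Any inference about the communicator would be drawn afterward from `counts` or `proportions`, so the recorded behaviors stay available for inspection. A pragmatic category would instead require a human judgment at the coding step and could not be reduced to a lookup of this kind.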

The classical-pragmatic distinction has 
been used for distinguishing whole systems of 
categories or approaches to psychotherapy 
process research (Berelson, 1952; Marsden, 
1965, 1971), but it is better applied to char- 
acterize individual categories: Single systems 
can (and do) contain both classical and prag- 
matic categories. Likewise, although the clas- 
sical-pragmatic distinction has been used pri- 
marily for systems that employ content cate- 
gories, it is useful for intersubjective and 
extralinguistic categories as well. 

Previous efforts at organizing language re- 
search in psychotherapy have classified studies 
or systems by criteria other than category 
types. Auld and Murray (1955) distinguished 
methodological studies, descriptive studies of 
cases, and theoretically guided studies of ther- 
apy. Kiesler (1973) distinguished “systems 
of direct psychotherapy process analysis,” 
which focus on therapist and/or patient be- 
havior, from “systems of indirect psycho- 
therapy analysis,” which include indirect pro- 
cess analysis, therapist’s conceptions, and 
patient preferences. Marsden (1965, 1971) 
partitioned studies into three models, the 
classical, the pragmatic, and the nonquantita- 
tive. As noted above, we have adopted the 
classical-pragmatic distinction to classify 
categories. Several other authors have made 
distinctions that parallel aspects of our typol- 
ogy. Mahl and Schulze (1964), Matarazzo 
and Wiens (1977), and Phillips, Matarazzo, 
Matarazzo, Saslow, and Kanfer (1961) have 
distinguished extralinguistic categories and 
category systems from all other category and 
category system types. Phillips et al. recog- 
nized two controlling frames of reference: one 
that “directs attention to the communication 
aspects of verbal behavior, that is, to some 
symbolic content of the words spoken, using 
content analysis to define and quantify its 
variables" and one that "focuses upon the 
quantitative temporal characteristics of in- 
terview interaction, utilizing measures such 
as number and duration of utterances, dura- 
tion of silence, etc." (p. 260). Mahl and 
Schulze (1964) reviewed just those studies 
concerned with the extralinguistic features of 
speech (e.g., pitch, pauses, rhythm, etc.). 
Similarly, several investigators (Dollard & 
Auld, 1959; Murray, 1956; Seeman, 1949; 
Snyder, 1945, 1963) have implicitly recog- 
nized a distinction between content categories 
and intersubjective categories by constructing 
systems of content categories for patient ver- 
bal behavior and separate systems of inter- 
subjective categories (called technique cate- 
gories by Dollard & Auld, 1959) for therapist 
verbal responses. 


Table 1 
Content Category Systems 


Murray (1956); patient content categories 
pragmatic strategy 


Disturbance of free association 

Agreement with therapist remarks [intersubjective] 

Disagreement with therapist remarks 
[intersubjective] 

Intellectual discussion 

General anxiety 


Sex anxiety 

Sex frustration 

Affection 

Affection anxiety 
Affection frustration 
Dependence 

Dependence anxiety 
Dependence frustration 
Independence and self-assertion 
Independence anxiety 
Independence frustration 
Unspecified anxiety 
Unspecified frustration 


Stone, Dunphy, Smith, & Ogilvie (1966); Harvard 
Third Psychosociological Dictionary 
(partial list) ; classical strategy 


Natural realm; psychological processes; emotions 
Arousal—states of emotional excitement 
Urge—drive states 
Affection—incidents of close . . . 
Pleasure—states of gratification 
Distress—states of despair, guilt, shame, etc. 
Anger—forms of aggressive expression 

Thought 
Sense—perceptions and awareness 
Think—cognitive processes 
If—conditional words 
Equal—words denoting similarity 
Not—words denoting negation 
Cause—words denoting a cause-effect relationship 

Evaluation 
Good—synonyms for good 
Bad—synonyms for bad 
Ought—words indicating a moral imperative 


relationships 


ROBERT L. RUSSELL AND WILLIAM B. STILES 


Our 3 X 2 typology is not an exhaustive 
classification of methods used to analyze ver- 
bal interaction in psychotherapy. The typol- 
ogy is not intended to cover ratings of verbal 
behavior, though ratings clearly cut across 
our typology. Thus, for example, ratings of 
client “experiencing” (Klein, Mathieu, Gend- 
lin, & Kiesler, 1970; Rogers, 1958, 1959) or 
the “immediacy” of therapist or patient re- 
sponses (Mehrabian, 1972) are not included. 
Also, the typology is not intended to cover 
systems that classify language units marked 


Dollard & Auld (1959); patient's signs; 
pragmatic strategy 


An internal response-produced stimulus; anxiety, 
apprehension, distress, tension, or fear 

Unconscious anxiety or unconscious sense of guilt 

Anxiety is perceptibly reduced 

Reduction of unconscious anxiety or guilt 

Confirmation 

Dependence 

Unconscious dependence 

Patient is aware of and frightened by his dependent 
motive 

Conscious dependent motive to which unconscious 
anxiety is attached 

Dependent motive is unconscious; anxiety is 
conscious 

Both dependent motive and anxiety are unconscious 

Anxiety component is reduced 

Unconscious anxiety attached to conscious depen- 
dence motive is reduced 

Dependence motive is unconscious; 
anxiety component is reduced 

Unconscious anxiety evoked by unconscious de- 
pendence motive is reduced 


conscious 


White, Fichtenbaum, & Dollard (1966b); patient's
evaluation of self categories; pragmatic strategy


Positive self-evaluation 

Anxiety about self, symptom, dissatisfaction 

Hostile feelings directed toward others; 
assertiveness 

Hostile feelings directed toward self; self-blame 

Nonsexual love and reduced anxiety about family 

Anxiety evoked by family members 

Sex motive, dating, marital relations 

Anxiety evoked by sex motive 

Involvement with academic and career motive; 
social mobility 

Anxiety evoked by academic/career motive or 
social mobility 

Kepecs (1977); focal conflict categories; 
pragmatic strategy 


Positive human relations 

Hostility out 

Mastery 

Assertion 

Reactions (reactive motive) 
Hostility in 

Masochism 

Helplessness 

Other defenses 

Defensive hostility out 

Defensive positive human relations 
Defensive mastery 

Adaptive activity ; relatively nonconflictual solutions 


Hall & Van de Castle (1966) ; dream contents 
(partial list); classical strategy 
Objects 
Architecture 
Household 
Food 
Implements 
Travel 
Streets 
Regions 
Nature 


Body parts 
Clothing 
Communication 
Money 
Miscellaneous 


Laffal (1968) ; Cognitive Conceptual Dictionary 
(partial list) ; classical strategy 
Absurd 
Agree 1 (sympathy) 
Agree 2 (agreement) 
Agree 3 (similarity) 
All 1 (whole) 
All 2 (much) 
All 3 (frequent) 
Animal 
Art 
Astronomy 1 (space) 
Astronomy 2 (weather) 
Back 
Bad 
Begin 
Big 


Gottschalk & Gleser (1969); Anxiety Content 
Analysis Scale; pragmatic strategy 


Death anxiety 

Mutilation anxiety 
Separation anxiety 

Guilt anxiety 

Shame anxiety 

Diffuse or nonspecific anxiety 


Thibaut & Coules (1952) ; overt aggression 
categories; pragmatic strategy 


Direct aggression 
Indirect aggression 
Affective neutrality 
Friendly statements 
Self-augmentation 
Self-reduction 
Self-neutral 


Note. Discrepant category types are listed in brackets. 


for some special syntactic feature such as 
case (Bieber, Patton, & Fuhriman, 1977; 
Patton, Fuhriman, & Bieber, 1977). The cate- 
gories we discuss are mainly nominal vari- 
ables, so indices of intensity or degree are 
based on frequencies. We think it best to 
concentrate on the category types that appear 
most productive and recur most frequently in 
psychotherapy process research rather than 
to attempt to include all possible categories. 
On the other hand, the typology we propose 
may also be useful for areas other than psy- 
chotherapy research (see Berelson, 1952; for 
an introduction to applications of language 
analysis categories to other fields in the social 
sciences, see Gerbner, Holsti, Krippendorf, 
Paisley, & Stone, 1969; Holsti, 1968; Pool, 
1959). 

The category systems shown in Tables 1, 
2, and 3 were classified as content, intersub- 
jective, or extralinguistic, respectively, on the 
basis of predominant category type in the 
system. Similarly, they were classified as prag- 
matic or classical on the basis of the pre- 
dominant coding strategy that was employed 
to score speech units to the constituent cate- 
gories. 


Content Categories 


Content categories describe the semantic 
content of words or word groups in the text. 
Table 2 
Intersubjective Category Systems 


Stiles (in press-a); verbal response modes 


Classical strategy 
Disclosure form 
Question form 
Edification form 
Acknowledgment form 
Advisement form 
Interpretation form 
Confirmation form 
Reflection form 

Pragmatic strategy 
Disclosure intent 
Question intent 
Edification intent 
Acknowledgment intent 
Advisement intent 
Interpretation intent 
Confirmation intent 
Reflection intent 


Murray (1956); therapist content; 
pragmatic strategy 


Instructions 

Labels 

Strong approvals 
Disapprovals 

Demands 

Directions 

Mild probes 

Mild approvals 

Mm [classical strategy] 
Not classifiable [residual] 


Bales (1970) ; interaction process analysis; 
pragmatic strategy 
Seems friendly [mixed] 
Dramatizes [mixed] 
Agrees 
Gives suggestion 
Gives opinion 
Gives information 
Asks for information 
Asks for opinion 
Asks for suggestion 
Disagrees 
Shows tension [extralinguistic] 
Seems unfriendly [mixed] 


Porter (1943); therapist checklist; 
pragmatic strategy 


Defining the interview situation [mixed content 
and intersubjective] 

Bringing out and developing the problem situation; 
leading 

Developing client’s insight and understanding; 
clarification, interpretation, and problem iden- 
tification 

Sponsoring client activity ; fostering decision making 


Snyder (1945) ; counselor categories ; 
pragmatic strategy 


Structuring 

Forcing client to choose and develop topic 
Directive questions 

Nondirective leads and questions 
Simple acceptance 

Restatement of content or problem 
Clarification or recognition of feeling 
Interpretation 

Approval and encouragement 
Giving information or explanation 
Proposing client activity 
Disapproval and criticism 


Lennard & Bernstein (1960) ; therapist informational 
specificity categories; pragmatic strategy 


Passive encouragement 

Active encouragement 

Limits to subjective matter area 
Limits to specific old proposition 
Interpretation 

Limits to specific answer 

Excludes discussion 

Introduces specific new proposition 


Strupp (1957); type of therapeutic activity ; 
pragmatic strategy 


Facilitating communication 

Exploratory operations 

Clarification 

Interpretive operations 

Structuring [content] 

Direct guidance 

Activity not clearly related to the task of therapy 
[content] 

Unclassifiable [residual] 


Bandura, Lipsher, & Miller (1960) ; therapist 
activity; pragmatic strategy 


Reflection 

Labeling 

Exploration 

Approval 

Ignoring 

Topical transition [content] 
Silence [classical extralinguistic] 
Mislabeling 


Goodman & Dooley (1976); listener response 
modes; pragmatic strategy 
Question [classical strategy] 
Advisement 
Interpretation 
Reflection 
Disclosure 
Silence [classical extralinguistic] 


Note. Discrepant category types are listed in brackets. 
Table 3 
Extralinguistic Category Systems 


Dibner (1956); speech characteristics; 
classical strategy 


Unfinished sentence 
Breaking in with a new thought, generally by 
breaking into another sentence [mixed] 
Interrupted sentence 
Repeating words or phrases 
Stuttering 
I don't know [content] 
Sighing 
Laughing 
Voice change 
Questioning the interviewer [intersubjective] 
Blocking 


Rice & Wagstaff (1967); voice quality; 
pragmatic strategy 


Emotional 
Focused 
Externalizing 
Limited 


Matarazzo, Wiens, Matarazzo, & Saslow (1968); 
interaction chronograph; classical strategy 


Mean speech duration 
Mean speech latency 
Percentage of interruption 


Mahl (1956) ; speech disturbance categories; 
classical strategy 
“Ah” 
Sentence correction 
Sentence incompletion 
Repetition 
Stutter 
Intruding incoherent sound 
Tongue slip 
Omission 


Eldred & Price (1958); voice categories; 
classical strategy 


Alterations of pitch: overhigh and overlow 
Alterations of volume: overloud and oversoft 
Alterations of rate: overfast and overslow 
Breakup 


Lasswell (1935); speech categories; 
pragmatic strategy 


Slower speech rate; increase of unconscious tension 
Faster speech rate; decrease of unconscious tension 


Fairbanks & Pronovost (1939) ; pitch categories; 
classical strategy 


Pitch level 
Pitch range 
Extent of pitch shifts 


Note. Discrepant category types are listed in brackets. 


(Examples of content category systems are
given in Table 1.) The categories describe
manifest or latent content: Denotative meaning
and connotative meaning constitute manifest
content; referential-contextual, symbolic,
or metaphorical meanings constitute latent
content. For example, Lasswell (quoted in
Kaplan, 1943) wrote, “‘I love my husband
more than anyone in the whole world’ may be
taken at its face value; or we may decide that
the wife ‘doth protest too much.’ In the first
instance we describe manifest content, and
in the second, we interpret according to latent
content” (p. 234). Manifest content is identified
by the classical coding strategy; latent content
is identified by the pragmatic coding strategy, in which
the coder makes inferences about the characteristics
of the communicator.

Classical content categories describe the 
manifest content of the text, that is, either 
the dictionary meanings of the words or word 
groups that occur in the text or the connotative
or constituent meanings normally ascribed
to them, regardless of the context in
which they occur. Thus classical content categories
serve “as a conceptual grid to be laid
upon a language sample in order to reveal the 
density of the various concepts in the sample” 
(Laffal, 1968, p. 280). Classical content cate- 
gories vary in abstractness. The following is 
an example of a concrete category (also see 
Table 1): 


Implements: Three subclasses of implements are 
scored. The first letter of the scoring symbol is I to 
which a second letter is attached to indicate the 
subclass. 


Tools (Scoring symbol: IT). This subclass includes 
tools, and machinery parts. Objects that are used in 
vocational activities are generally included here, al- 
though some such as typewriter are scored in the 
communication class. Examples of the IT subclass 
are hammer, nail, saw, screwdriver, wrench, pliers, 
shovel, rake, lawn mower, lathe, X-ray machine, 
jack, level, and starting button of a machine. House- 
hold appliances are scored in the household class and 
parts of conveyances are scored in the travel class. 
(Hall & Van de Castle, 1966, p. 46) 
At the other end of the range, words or word 
groups are coded according to their underly- 
ing associative or conceptual communality. 
For example, “We will start the project 
shortly” and “He was born on July second” 
have a common constituent meaning: begin- 
ning” (Laffal, 1968, p. 279). The coder in 
scoring the units start and born in the cate- 
gory begin (see Table 1) employs the classi- 
cal coding strategy—the semantic feature be- 
ginning is marked for words like start and 
born regardless of the context of their use. 
Scoring depends only on the capability of the 
coder to recognize semantically similar words 
or word groups. 

Pragmatic content categories describe some 
characteristic or condition of the communi- 
cator. Thus, instead of locating particular 
words or word groups with similar meanings,
the coder “now directly scores resistance, tension,
adjustment” (Dittes, 1959, p. 329). For
example, Murray’s (1956; see Table 1) prag- 
matic content category, generalized anxiety, 
was defined as follows: 


Generalized Anxiety: Included all psychological and 
somatic expressions of anxiety which are not related 
to a drive nor related to any specific person or ob- 
ject; general “free floating” anxiety and guilt. 

a. “I feel panicky about the thought of death.” 

b. “I tremble and then it would ease off, then I 
start again . . . in waves.” 

c. “I feel tense as if there's some force inside of me 
trying to get out." (p. 27) 


Similarly, White, Fichtenbaum, and Dol- 
lard's (1966a, 1966b) pragmatic content cate- 
gory positive self-evaluation (see Table 1) 
would be scored for each sentence judged to 
reflect, by virtue of its content, a favorable at- 
titude toward the speaker. Thus, “I’m getting 
a broader idea of myself as an entity and full 
person” (White et al., 1966a, p. 108) is 
scored to the category positive self-evaluation. 
This scoring procedure typifies the pragmatic 
scoring strategy employed by those pragmatic 
content category systems enumerated in Table 
1. As the examples above illustrate, many of 
the pragmatic content categories of interest 
to researchers of psychotherapy are used as 
theoretical constructs by clinicians. 

A review of published research suggests 
that content categories have been most successfully
employed to investigate internal psychodynamic
processes, motives, drive conditions,
characterological traits, and changes
in these client characteristics in therapy. The 
focus on internal psychodynamic processes 
was made explicit by Murray (1954): “We 
propose to study the content of verbal be- 
havior in psychotherapy with respect to un- 
derlying motives and defenses” (p. 305). 
Murray (1956) also wrote, “The categories 
of the content analyses were defined in terms 
of motivation and conflict, influenced by psy- 
choanalytic and learning theories, and for- 
mulated in such a way as to be most relevant 
to an eventual understanding of the under- 
lying processes of psychotherapy" (p. 23). 
Auld and Dollard (1966) enumerated the 
key psychological phenomena that they felt 
were amenable to investigation with content 
categories: resistance, transference, uncon- 
scious motive, inhibition, dependence, hos- 
tility, and interpretation. Dollard and Mowrer 
(1947), Raimy (1948), Kauffman and Raimy 
(1949), Murray, Auld, and White (1954), 
Leary and Gill (1959), Freedman et al. 
(1951-1952), Lennard and Bernstein (1960, 
1969), Auld and White (1959), White et al. 
(1966a, 1966b), Hall and Van de Castle 
(1966), and Thibaut and Coules (1952) have 
carried out investigations that embrace simi- 
lar assumptions. 

As an example of the use of content cate- 
gories to investigate psychodynamic processes, 
Murray (1954) found that the frequency of 
hostility statements in psychotherapy increased
as the frequency of defensive statements
decreased; he interpreted this as evidence
that the client’s anxiety had decreased
as therapy progressed. Thibaut and Coules 
(1952), focusing on the relative frequency 
of hostility statements, reported that residual 
hostility lessened for subjects who directly 
communicated their hostile feelings after be- 
ing provoked. White et al. (1966a) were able 
to determine that “the therapist focused more 
than the patient did during the active period 
of therapy on the areas of sex and evaluation 
of self" (p. 47) by the high proportion of 
statements scored to these content categories. 
By considering sex and evaluation of self as 
the therapist's target area, the authors were 
able to report that the therapist had apparently
been successful: The scored content
of the patient's talk about these areas increased
from the first to last quarter of therapy
and was judged to be adaptive. Lennard
and Bernstein (1960) reported that in the
data collected from four therapists and two
patients who interacted in a total of 500 sessions,
“the therapists as a group led the patients
slightly in the proportion of communication
dealing with affect” (p. 85). In comparing
the early with the later sessions, they
found that “the proportion of both therapist
and patient references devoted to feelings increased—almost
doubled—for the sample as
a whole” (p. 85).

Content categories also reflect personality 
variables, as well as trends or focal points in 
the communicated content. Sarason, Ganzer, 
and Singer (1972) found that high- and low- 
defensive subjects used different content cate- 
gories to describe themselves, after having 
listened to models that differed in self-dis- 
closure style. Kepecs (1977) used content 
categories to locate the focal conflict of the 
client. Auld and White (1959) found that experienced
therapists are more likely to intervene
than are apprentice therapists following
a patient's utterance scored as resistance.
They also reported that the patient's talk 
seemed to hang together: Categories on a 
specific topic were more likely to follow one 
another than to be followed by a category on 
a different topic. Although content categories 
have been used most consistently to investigate
internal psychodynamics and characterological
traits, one group of researchers (Gottschalk
& Frank, 1967; Gottschalk & Gleser,
1969; Gottschalk, Winget, & Gleser, 1969)
have concluded that “the relative magnitude
of an affect can be validly estimated from
the typescript of the speech of an individual
using solely content variables and not including
paralanguage variables” (Gottschalk &
Gleser, 1969, p. 96). A number of studies
have been carried out that directly or indi- 
rectly address this issue (Cook, 1969; Gotts- 
chalk & Frank, 1967; Hart & Brown, 1974; 
Mahl, 1956, 1959; Markel, Meisels, & Houck, 
1964; Markel & Roblin, 1965; Mehrabian & 
Ferris, 1967). Though the empirical findings 
are somewhat equivocal, Mahl's (1956) theoretical
assessment that “the most valid measures
[of transitory states] will be based on
the expressive [i.e., extralinguistic] aspects of 
speech rather than on the manifest content 
measures” (p. 13) is still most compelling, 
empirically and theoretically (see Mahl, 1959, 
for a theoretical discussion of this issue; see 
Cook, 1969; Markel & Roblin, 1965, for some 
empirical evidence). 


Intersubjective Categories 


Intersubjective categories are descriptive of 
syntactically implied and other relationships 
between the communicator and recipient. For 
example, self-disclosure implies that the com- 
municator reveals something to the recipient, 
question implies that the communicator seeks 
information from the recipient, and so forth. 
(Examples of intersubjective category sys- 
tems are given in Table 2.) In contrast to con- 
tent categories, intersubjective categories can 
typically be defined without reference to the 
semantic content (or extralinguistic features) 
of the communication. 

Many intersubjective categories can be de- 
fined by syntactic features alone (Goodman 
& Dooley, 1976; Stiles, 1978, in press-b). For 
example, question has a well-attested set of 
syntactic features associated with it; self- 
disclosure can be defined as a first-person 
declarative sentence. Since these syntactic 
features are characteristics of the text, speech 
units are scored to such categories by means 
of the classical coding strategy. For example, 
in Stiles’s (1978, in press-a, in press-b) clas- 
sical intersubjective system (see Table 2), 
sentences such as “I’d really like to talk about 
my feelings of being an experimental sub- 
ject” and “I can’t stand needles” would be 
scored to the category disclosure form because 
of their first-person subjects. 
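
A rough sketch of how such form categories might be assigned from surface syntax alone follows. The two rules (a final question mark, a leading first-person "I") are deliberately crude assumptions made for illustration and are not the scoring rules of Stiles's system; the function name is hypothetical.

```python
import re

def classify_form(sentence):
    """Assign a form category from surface features of the text only."""
    s = sentence.strip()
    # Assumed rule: a final question mark marks question form.
    if s.endswith("?"):
        return "question form"
    # Assumed rule: a sentence opening with "I" (including "I'd", "I'm")
    # is treated as a first-person declarative, i.e., disclosure form.
    match = re.match(r"[A-Za-z]+", s)
    first_word = match.group(0).lower() if match else ""
    if first_word == "i":
        return "disclosure form"
    return "unclassified"

examples = [
    "I can’t stand needles",
    "I’d really like to talk about my feelings of being an experimental subject",
    "Don't you think you should lock the door?",
]
forms = [classify_form(e) for e in examples]
```

Because the rules look only at the text, the third example is scored as question form even though a listener might hear it as a suggestion; capturing that reading requires a judgment about the speaker's intent.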

The intersubjective categories used in psy- 
chotherapy research are more often descrip- 
tive of interpersonal intentions, which may 
or may not be expressed using the correspond- 
ing syntactic form. For example, an utterance 
identified as a question by syntactic criteria 
may express the interpersonal intention “asks 
for information,” but it may also express the 
interpersonal intention “gives suggestion,” as
in “Don't you think you should lock the
door?”

In Goodman and Dooley's (1976) system, 
such utterances as “I had the same problem 
and solved it with . . . ” or “Do you think it 
would work better if you tried . . . ” (p. 109)
would be scored to the pragmatic intersubjective
category advisement (see Table 2). To judge
whether a communicator is giving a sugges- 
tion or is asking for information, the coder 
must infer the communicator’s intent and 
thus employs the pragmatic coding strategy. 
Pragmatic intersubjective categories are used 
frequently in interaction research, though they 
require the coder to make more complicated 
inferences than their classical counterparts. 
For instance, confrontation is defined by 
Barnabei, Cormier, and Nye (1974) as “a 
response indicating some sort of discrepancy 
in the client's message . . . a ‘you said but 
look' condition" (p. 356). Or, similarly, di- 
rect guidance (see Table 2) is defined by 
Strupp and Wallach (1965) as "suggestions 
for activity either within or outside of the 
therapeutic framework; giving information, 
stating an opinion, answering direct questions,
speaking as an authority" (p. 118). 

In published research, intersubjective cate- 
gories have been most consistently and suc- 
cessfully employed to measure psychothera- 
peutic technique, interpersonal roles and re- 
lationships in therapy, and therapy process. 
Intersubjective categories have long been 
used to describe and teach psychotherapeutic 
technique. Freud (1912/1958) explicitly ar- 
gued in favor of using interpretation while 
he condemned suggestion and disclosure. 
Rogers (1942, 1951, 1957) advocated reflec- 
tion (or restatement or clarification) as a 
technique. More recently, comprehensive sys- 
tems of intersubjective categories have been 
developed to aid in training counselors and 
therapists (e.g., Goodman & Dooley, 1976; 
Ivey, 1971). Research evaluating these sys- 
tems has demonstrated their efficacy in train- 
ing new professionals (e.g., Moreland, Ivey, 
& Phillips, 1973). 

In view of the different technical prescrip- 
tions of the various schools of therapy, it is 
not surprising that intersubjective categories 
consistently differentiate the therapeutic interventions
made by practitioners of those
schools (Auerbach, 1963; Cartwright, 1966;
Staples, Sloane, & Whipple, 1976; Strupp, 
1955, 1957; Stiles, in press-b). In addition, 
several studies have used intersubjective cate- 
gories to describe a single type of psycho- 
therapy (Porter, 1943; Seeman, 1949; Snyder, 
1945; Strupp, 1958). In these studies, client- 
centered, psychoanalytic, existential, gestalt, 
and behavior therapists have been shown to 
use characteristic but different profiles of intersubjective
categories. Similarly, therapists’
style of participation, as well as that of
clients, has been related to their choice of in- 
tersubjective categories (Rice, 1965, 1973; 
Rice & Wagstaff, 1967; Segal, 1970). Inter- 
subjective categories have also been employed 
to characterize the verbal interaction of schiz- 
ophrenic families. Lennard and Bernstein 
(1969) found that in a schizophrenic family 
the child's presentations of self are discon- 
firmed by the mother and “her presentations 
(of him) are disconfirmed by him" (p. 125). 
Bandura, Lipsher, and Miller (1960), Frank 
and Sweetland (1962), Murray (1956), Rottschafer
and Renzaglia (1962), and Winder,
Farrukh, Bandura, and Rau (1962) showed
that differential utilization of certain intersubjective
categories resulted in alteration of
the content of the client's speech. 


Extralinguistic Categories 


Extralinguistic categories are descriptive of 
vocal noises that do not have the structure of
language; modifications such as pitch, resonance,
amplitude, and so on of language and
other vocal noises; and temporal patternings
associated with language behavior. (Examples 
of extralinguistic category systems are given 
in Table 3.) Extralinguistic categories are de- 
fined without reference to either semantic con- 
tent or syntactic structure. 

Vocal noises that do not have the structure 
of language have been termed vocalizations 
(Trager, 1958). The following six extralin- 
guistic categories are examples of vocalizations 
(1-3 from Mahl, 1956; 4-6 from Dibner, 
1956, 1958; see Table 3). 
1. Whenever the definite ah sound occurs, 
it is scored. 

2. An intruding incoherent sound is a sound 
that is absolutely incoherent as a word to the 
listener. 

3. Stutter. 

4. Sighing (or deep breath). 

5. Laughing includes any kind of laugh or 
chuckle. 

6. Blocking occurs when there is groping 
for the proper expression, indicating unusual
hesitation.

Modifications of language and other vocal 
noises have been termed qualifiers (Trager, 
1958). Pitch, rhythm, resonance, loudness, in- 
tonation, and so on are typical modifications 
of the noises people emit. 

The following three categories (Fairbanks 
& Pronovost, 1939) illustrate how one specific
modification (i.e., pitch) might be delineated
for use in analyzing patterns of
speech (see Table 3).

1. Pitch level is median frequency in cycles 
per second. 

2. Pitch range is the highest minus the low- 
est pitch in cycles per second. 

3. Extent of pitch shifts is the change in
pitch between the last pitch measured in a
given phonation and the first pitch measured
in the phonation that follows.
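
The three pitch categories above can be illustrated with a minimal sketch. It assumes each phonation is available as a time-ordered list of pitch samples in cycles per second; the function name, the sample values, and the choice to report shifts as signed differences are illustrative assumptions, not details taken from Fairbanks and Pronovost (1939).

```python
from statistics import median


def pitch_measures(phonations):
    """Return (pitch level, pitch range, pitch shifts) for a speech sample.

    `phonations` is a hypothetical input format: a list of phonations, each a
    time-ordered list of pitch samples in cycles per second.
    """
    all_samples = [f for phonation in phonations for f in phonation]
    # 1. Pitch level: median frequency over the whole sample.
    level = median(all_samples)
    # 2. Pitch range: highest minus lowest pitch.
    pitch_range = max(all_samples) - min(all_samples)
    # 3. Pitch shifts: first pitch of the next phonation minus the last pitch
    #    of the current one (signed here; an assumption of this sketch).
    shifts = [nxt[0] - cur[-1] for cur, nxt in zip(phonations, phonations[1:])]
    return level, pitch_range, shifts


# Made-up pitch samples for three successive phonations.
example = [[110.0, 120.0, 115.0], [130.0, 125.0], [100.0, 105.0, 95.0]]
level, pitch_range, shifts = pitch_measures(example)
```

A summary such as the mean extent of pitch shifts could then be taken over the absolute values of `shifts`.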

Utterance duration, utterance latency, rate 
of speech production, and so on are observ- 
able characteristics of speech production and 
can be termed temporal patterning in speech. 
An example is Matarazzo, Wiens, Matarazzo, 
and Saslow’s (1968) unit of latency silence 
(see Table 3), defined as “the duration of 
time from the moment one person in the dyad 
terminates an utterance until the second per- 
son begins his next comment” (p. 355). 

Most extralinguistic categories have used
the classical strategy; indeed, investigators
who use extralinguistic systems have prided
themselves on the objectivity of their systems
(e.g., Mahl, 1959; Matarazzo et al., 1968;
Phillips et al., 1961; Saslow & Matarazzo,
1959). However, there is no inherent barrier
to coding communicator characteristics directly
from extralinguistic cues, that is, to using
the pragmatic strategy for extralinguistic
categories. An example of a pragmatic extralinguistic
category is Bales’s (1950, 1970) category
shows tension (see Table 2):


Several varieties of acts are scored in this category, 
not all of which may seem similar on a superficial 
level. Laughter, in particular, may seem quite differ- 
ent from signs of anxious emotionality. Signs of 
anxious emotionality indicate a conflict between 
acting and withholding action. Minor outbreaks of 
reactive anxiety may first be mentioned, such as ap- 
pearing startled, disconcerted, alarmed, dismayed, per- 
turbed, or concerned. Hesitation, speechlessness, 
flurry, fluster, confusion, trembling, blushing, flush- 
ing, stammering, sweating, blocking-up, gulping, 
swallowing, or wetting the lips persistently may 
also be included. (Bales, 1970, p. 124) 


Judging from published research, extralin- 
guistic categories have been used most suc- 
cessfully to investigate transitory motiva- 
tional and emotional states. The association of 
speech disturbances with states of anxiety and 
tension has been long established by investi- 
gators using a variety of speech disturbance 
categories and indices of anxiety (Dibner, 
1956, 1958; Eldred & Price, 1958; Lasswell, 
1935; Mahl, 1956, 1959; Panek & Martin, 
1959). Dibner (1956), for example, found 
that situational anxiety produced by the use 
of ambiguous techniques by the interviewer 
was associated with speech disturbances in 
the patient. Kasl and Mahl (1965), Mahl 
(1959), and Cook (1969) have shown that 
speech disturbance categories are more sensi- 
tive to momentary states of anxiety than to 
trait anxiety. For example, Cook found that 
two measures of trait anxiety were not re- 
lated to the non-ah speech disturbance cate- 
gories (the measures of anxiety were the 
Taylor Manifest Anxiety Scale and the Mc- 
Reynolds Assimilation Scale), while transient 
anxiety was related. 

Extralinguistic categories have also been 
found to be sensitive to other transitory emo- 
tional states besides anxiety (Boomer, 1965; 
Eldred & Price, 1958; Hargreaves, Stark- 
weather, & Blacker, 1965). In fact, the as- 
sociation of affective states with extralin- 
guistic cues is apparently well enough ap- 
preciated by most people that the cues can be 
used to communicate specific emotions. For 
instance, Fairbanks and Pronovost (1939) 
have shown that the communication of five 
different emotions (i.e., contempt, anger, fear,
grief, and indifference) can be reliably dis- 
tinguished by measures of mean pitch levels, 
mean pitch ranges, and the mean extent of 
pitch shifts and that these specific emotions 
can be reliably identified by listeners. Simi- 
larly, Beier and Zautra (1972) have shown 
that the affective information communicated 
extralinguistically can even be understood, at 
least in part, by people of different cultures. 

In more recent years, research with ex- 
tralinguistic categories has identified more 
subtle—though equally important—transitory 
states. Boomer and Dittman (1963) sug- 
gested that filled pauses might serve as an in- 
dex of self-monitoring by clients in psycho- 
therapy, and the same function for unfilled 
pauses has been suggested (Rochester, 1973). 
Manaugh, Wiens, and Matarazzo (1970) 
found that subjects instructed to lie to their in- 
terviewer showed significant differences in 
their mean duration of utterance as compared 
with subjects who were not told to lie. An- 
other group of researchers (Butler, Rice, & 
Wagstaff, 1962; Duncan, Rice & Butler, 1968; 
Rice, 1965; Rice & Wagstaff, 1967; Wexler & 
Butler, 1976) have tentatively identified sev- 
eral constellations of extralinguistic features 
that have differential associations with therapy 
outcome, counselor and client participation, 
and the “good” hour. 

Extralinguistic features of speech have also 
been used to investigate personality traits or 
characterological makeup. However, reviewers 
of this field of inquiry have been skeptical of 
the appropriateness of using extralinguistic 
categories as a basis for making inferences 
about personality types. For instance, Stark- 
weather (1961) maintained that “despite the 
frequent reports of success in identifying per- 
sonality traits from vocal cues, the numerous 
[reported] failures . . . leave the writer pessi- 
mistic concerning the utility of inferring such 
traits from non-verbal [but vocal] stimuli”
(p. 65). Although recent work in this area has 
shown some success, parallel studies frequently 
do not confirm the findings. For example, the 
personality dimensions of assertiveness/domi- 
nance and extraversion/sociability have been 
identified with high interjudge agreement from 
extralinguistic features (Markel et al., 1964; 
Markel & Roblin, 1965; Sherer, 1972). However,
in related studies, patterns inconsistent
with the above results were obtained (e.g.,
Hart & Brown, 1974).


Recommendations 


Criteria for constructing and evaluating category
systems have been proposed by others
(Butler et al., 1962; Goodman & Dooley,
1976; Heyns & Zander, 1953; Holsti, 1968,
1969; Lazarsfeld & Barton, 1951; Weick,
1968). These criteria fall into two classes,
those that deal with the practicability of systems
and those that deal with the internal consistency
of categories and category systems.
Although our typology can serve as a useful
guide in identifying or constructing categories
and category systems that conform to the criteria
of both classes, we limit our discussion to
(a) suggesting means by which a researcher
can attain internally consistent categories and
category systems and (b) pointing out some
common methodological problems. (See Goodman
& Dooley, 1976, for a recent set of practicability
criteria.)


System Consistency Criteria 


1. The categories within the system should 
be mutually exclusive, that is, “there should be 
one and only one place to put an item within 
a given classification system" (Lazarsfeld & 
Barton, 1951, p. 151; see also Butler et al., 
1962; Holsti, 1969). 

2. The categories within the system should 
be exhaustive: All relevant items in the sam- 
ple must be capable of being placed into a 
category (Holsti, 1969). 

3. The categories within the system should
be derivable from a single classification principle;
that is, "conceptually different levels of
analysis must be kept separate" (Holsti, 1969,
p. 100). “When an object is classified at the
same time from more than one aspect, each
aspect must have its own separate set of categories”
(Lazarsfeld & Barton, 1951, p. 157);
that is, if one is interested in classifying items
in terms of a number of different aspects
simultaneously, a fully multidimensional
classification must be set up.


Our typology suggests two recommenda- 
tions for meeting the system consistency cri- 
teria: (a) Categories should be pure types, 
that is, content, intersubjective, or extra- 
linguistic combined with either the classical or 
the pragmatic strategy, rather than conjunc- 
tive or disjunctive mixtures, and (b) cate- 
gories within a system (or subsystem) should 
be of the same type. 

Thus, we identify two types of problematic
coding schemes: categories that are conjunctive
or disjunctive mixtures and systems that
mix together category types. For example, the
Strupp (1957) category structuring (see Table 2)
is scored when the therapist is judged
to be structuring the therapeutic situation
(i.e., an intersubjective category type) or
when the therapist is discussing theory (i.e.,
a content category type). A mixed system is
one in which the categories are of more than
one type. For example, Heyns and Zander
(1953, p. 391) pointed out that Bales's
(1950) interaction process analysis system
(see Table 2) contains more than one type of
category: “Category 3, Shows Solidarity and
Category 2, Shows Tension Release seem to
be descriptions of interaction along affective
... dimensions” (and are defined with reference
to extralinguistic features); “Category 5,
Gives Opinion, Category 6, Gives Orientation
and Category 4, Gives Suggestion refer to
intellectual problem-solving activity of the
group” (and are defined with reference to
intersubjective features).

Following the recommendations suggested
by the typology does not constrain the researcher
from employing a variety of category
and category system construction strategies,
but does help facilitate the systematic
organization of such strategies. For instance,
the sentence “Did m-mother really leave?”
can be coded to any of the three category
types, using either the classical or the pragmatic
strategy. It might be scored to the
classical content category mother, the pragmatic
content category separation anxiety,
the classical intersubjective category question,
the pragmatic intersubjective category
seeking reassurance, the classical extralinguistic
category stutter, and the pragmatic
extralinguistic category nervousness. Insisting
that a system contain only one type
of category (i.e., recommendation b) and that
categories should be pure types (i.e., recommendation a)
does not guarantee that categories
within a system will be mutually exclusive
(i.e., system consistency criterion 1)
or that each of the categories will be derivable
from a single classification principle (i.e.,
system consistency criterion 3), but permitting
a system to contain more than one category
type or mixed categories virtually guarantees
that the system will not be mutually
exclusive or derivable from a single principle.

Application of the typology makes category 
identification explicit and reasonably simple, 
permitting researchers to separate concep- 
tually and empirically different levels of 
analysis that heretofore have too often been 
lumped together. Thus, logically sound cate- 
gory systems, consisting of pure category 
types, can be constructed with reference to 
the three sets of features, content, intersub- 
jective, and extralinguistic. By keeping these 
dimensions separate, researchers interested in 
more than one set of features can build truly 
multidimensional systems. 

Our view suggests that the content, inter- 
subjective, and extralinguistic categories cor- 
respond to distinct channels of verbal com- 
munication, which convey different types of 
psychological information. Information con- 
cerning the speaker's personality structure 
and dynamics is carried primarily in the con- 
tent channel; information concerning the na- 
ture of the speaker's current relationship to 
the other person is carried primarily in the 
intersubjective channel; and information con- 
cerning the speaker's transitory emotional 
state is carried primarily in the extralinguistic 
channel. Although these associations are not 
exclusive, they are strong enough to recom- 
mend that investigators interested in one 
type of information would be wise to select a 
system that codes the corresponding channel. 

The division of the study of language be- 
havior in psychotherapy into three areas 
reflects not only trends in the empirical stud- 
ies reviewed and theoretical and methodo- 
logical considerations but also historical and 
philosophical views at large. For example, 
Johnson (1976), influenced by Kuhn's (1970) 
analysis of the structure of scientific
enterprises, argued “that two of the major underlying
paradigms in present day psychology
are the behavioristic paradigm and the 
Freudian paradigm" (p. 4). He pointed out 
that those psychotherapy process researchers 
“who have been influenced by Freud have 
stressed the importance of the content of the 
interview" (p. 4), while those who have been 
heavily influenced by behaviorism (or more 
specifically, positivism) have “focused on 
clearly denotable subject behaviors” (p. 5), 
as are measured by the Interaction Chrono- 
graph (Matarazzo et al, 1968). A third 
paradigm in psychotherapy process research 
has grown out of the search for variables with 
"systematic interpersonal reference" (Freed- 
man et al., 1951-1952, p. 143). Researchers 
influenced by this interpersonal orientation 
describe relationships in terms of the sorts of 
information conveyed through intersubjective
categories.

If from one perspective process studies of 
psychotherapy have appeared chaotic, repeti- 
tive, and so forth, we have found that the 
typology provides a descriptive framework 
within which many of the unresolved and 
seemingly unrelated problems tend to cluster 
and come into clearer focus. We hope that 
this framework will provide some of the 
impetus needed to begin to move psycho- 
therapy process research out of its first stage. 


Intraclass Correlations: Uses in Assessing
Rater Reliability


Reliability coefficients often take the form of intraclass correlation coefficients.
In this article, guidelines are given for choosing among six different forms of
the intraclass correlation for reliability studies in which n targets are rated by k
judges. Relevant to the choice of the coefficient are the appropriate statistical
model for the reliability study and the applications to be made of the reliability
results. Confidence intervals for each of the forms are reviewed.


Most measurements in the behavioral
sciences involve measurement error, but
judgments made by humans are especially
plagued by this problem. Since measurement
error can seriously affect statistical analysis
and interpretation, it is important to assess
the amount of such error by calculating a
reliability index. Many of the reliability
indices available can be viewed as versions of
the intraclass correlation, typically a ratio
of the variance of interest over the sum of the
variance of interest plus error (Bartko, 1966;
Ebel, 1951; Haggard, 1958).

There are numerous versions of the intraclass
correlation coefficient (ICC) that can give
quite different results when applied to the
same data. Unfortunately, many researchers
are not aware of the differences between the
forms, and those who are often fail to report
which form they used. Each form is appropriate
for specific situations defined by the experimental
design and the conceptual intent of
the study. Unfortunately, most textbooks
(e.g., Hayes, 1973; Snedecor & Cochran, 1967;
Winer, 1971) describe only one or two forms
of the several possible. Making the plight of
the researchers worse, some of the older
references (e.g., Haggard, 1958) contain
mistakes that have been corrected in a variety
of forums (Bartko, 1966; Feldt, 1965).

In this article, we attempt to give a set of
guidelines for researchers who have use for
intraclass correlations. Six forms of the ICC
are discussed here. We discuss these forms in
the context of a reliability study of the ratings
of several judges. This context is a special case
of the one-facet generalizability study (G
study) discussed by Cronbach, Gleser, Nanda,
and Rajaratnam (1972). The results we
present are applicable to other one-facet
studies, but we find the case of judges most
compelling.

The guidelines for choosing the appropriate
form of the ICC call for three decisions: (a)
Is a one-way or two-way analysis of variance
(ANOVA) appropriate for the analysis of the
reliability study? (b) Are differences between
the judges' mean ratings relevant to the
reliability of interest? (c) Is the unit of analysis
an individual rating or the mean of several
ratings? The first and second decisions pertain
to the appropriate statistical model for the
reliability study, and the second and the third
to the potential use of its results.
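
The three decisions can be read as a small lookup, sketched below. The function name and its boolean arguments are hypothetical conveniences introduced only for illustration; the labels returned follow the ICC(case, unit) naming used in this article, and the mapping is a simplification of the fuller discussion in the sections on each decision.

```python
def choose_icc_form(same_judges_rate_all_targets: bool,
                    judge_mean_differences_matter: bool,
                    unit_is_mean_of_k: bool,
                    k: int) -> str:
    """Map the three decisions onto one of the six ICC labels (a sketch)."""
    # Decision (a): one-way ANOVA (Case 1) versus two-way ANOVA (Case 2 or 3).
    if not same_judges_rate_all_targets:
        case = 1
    # Decision (b): are differences between judges' mean ratings relevant?
    elif judge_mean_differences_matter:
        case = 2  # judges treated as random effects
    else:
        case = 3  # judges treated as fixed effects
    # Decision (c): a single rating or the mean of k ratings.
    unit = k if unit_is_mean_of_k else 1
    return f"ICC({case}, {unit})"


# Example: the same four judges rate every target, judge differences are
# not of interest, and the mean of the four ratings is the unit of analysis.
label = choose_icc_form(True, False, True, 4)
```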


Models for Reliability Studies 


In a typical interrater reliability study, each
of a random sample of n targets is rated
independently by k judges. Three different
cases of this kind of study can be defined:

1. Each target is rated by a different set of
k judges, randomly selected from a larger
population of judges.

2. A random sample of k judges is selected
from a larger population, and each judge rates
each target, that is, each judge rates n targets
altogether.

3. Each target is rated by each of the same k
judges, who are the only judges of interest.


Table 1
Analysis of Variance and Mean Square Expectations for One- and Two-Way Random Effects

                                          Expected mean squares
Source of                                 One-way random               Two-way random                            Two-way mixed
variation        df              MS       effects for Case 1           effects for Case 2                        model for Case 3 (a)
Between targets  n - 1           BMS      $k\sigma_T^2 + \sigma_W^2$   $k\sigma_T^2 + \sigma_I^2 + \sigma_E^2$   $k\sigma_T^2 + \sigma_E^2$
Within target    n(k - 1)        WMS      $\sigma_W^2$                 $\sigma_J^2 + \sigma_I^2 + \sigma_E^2$    $\theta_J^2 + f\sigma_I^2 + \sigma_E^2$
Between judges   (k - 1)         JMS      (none)                       $n\sigma_J^2 + \sigma_I^2 + \sigma_E^2$   $n\theta_J^2 + f\sigma_I^2 + \sigma_E^2$
Residual         (n - 1)(k - 1)  EMS      (none)                       $\sigma_I^2 + \sigma_E^2$                 $f\sigma_I^2 + \sigma_E^2$

(a) $f = k/(k - 1)$ for the last three entries in this column.

Each kind of study requires a separately
specified mathematical model to describe its
results. The models each specify the decomposition
of a rating made by the ith judge on the
jth target in terms of various effects. Among
the possible effects are those for the ith
judge, for the jth target, for the interaction
between judge and target, for the constant
level of ratings, and for a random error component.
Depending on the way the study is
designed, different ones of these effects are
estimable, different assumptions must be made
about the estimable effects, and the structure
of the corresponding ANOVA will be different.
The various models that result from the above
cases correspond to the standard ANOVA
models, as discussed in a text such as Hayes
(1973). We review these models briefly below.

Under Case 1, the effects due to judges, to
the interaction between judge and target,
and to random error are not separable. Let $x_{ij}$
denote the ith rating ($i = 1, \ldots, k$) on the
jth target ($j = 1, \ldots, n$). For Case 1, we
assume the following linear model for $x_{ij}$:

$$x_{ij} = \mu + b_j + w_{ij}. \tag{1}$$

In this equation, the component $\mu$ is the
overall population mean of the ratings; $b_j$
is the difference from $\mu$ of the jth target's
so-called true score (i.e., the mean across
many repeated ratings on the jth target);
and $w_{ij}$ is a residual component equal to the
sum of the inseparable effects of the judge,
the Judge × Target interaction, and the error
term. The component $b_j$ is assumed to vary
normally with a mean of zero and a variance
of $\sigma_T^2$ and to be independent of all other
components in the model. It is also assumed that
the $w_{ij}$ terms are distributed independently
and normally with a mean of zero and a
variance of $\sigma_W^2$. The expected mean squares
in the ANOVA table appropriate to this kind of
study (technically a one-way random effects
layout) appear under Case 1 in Table 1.

The models for Case 2 and Case 3 differ
from the model for Case 1 in that the components
of $w_{ij}$ are further specified. Since the
same k judges rate all n targets, the component
representing the ith judge's effect may be
estimated. The equation

$$x_{ij} = \mu + a_i + b_j + (ab)_{ij} + e_{ij} \tag{2}$$

is appropriate for both Case 2 and Case 3.
In Equation 2, the terms $x_{ij}$, $\mu$, and $b_j$ are
defined as in Equation 1; $a_i$ is the difference
from $\mu$ of the mean of the ith judge's ratings;
$(ab)_{ij}$ is the degree to which the ith judge
departs from his or her usual rating tendencies
when confronted by the jth target; and $e_{ij}$ is
the random error in the ith judge’s scoring of
the jth target. In both Cases 2 and 3 the target
component $b_j$ is assumed to vary normally
with a mean of zero and variance $\sigma_T^2$ (as in
Case 1), and the error terms $e_{ij}$ are assumed
to be independently and normally distributed
with a mean of zero and variance $\sigma_E^2$.

Case 2 differs from Case 3, however, with
regard to the assumptions made concerning $a_i$
and $(ab)_{ij}$ in Equation 2. Under Case 2, $a_i$ is a
random variable that is assumed to be normally
distributed with a mean of zero and variance
$\sigma_J^2$; under Case 3, it is a fixed effect subject
to the constraint $\sum a_i = 0$. The parameter
corresponding to $\sigma_J^2$ is $\theta_J^2 = \sum a_i^2/(k - 1)$.

In the absence of repeated ratings by each
judge on each target, the components $(ab)_{ij}$
and $e_{ij}$ cannot be estimated separately.
Nevertheless, they must be kept separate in Equation
2 because the properties of the interaction are
different in the two cases being considered.
Under Case 2, all the components $(ab)_{ij}$, where
$i = 1, \ldots, k$; $j = 1, \ldots, n$, can be assumed
to be mutually independent with a mean of
zero and variance $\sigma_I^2$. Under Case 3, however,
independence can only be assumed for interaction
components that involve different
targets. For the same target, say the jth, the
components are assumed to satisfy the
constraint

$$\sum_{i=1}^{k} (ab)_{ij} = 0.$$

A consequence of this constraint is that
any two interaction components for the same
target, say $(ab)_{ij}$ and $(ab)_{i'j}$, are negatively
correlated (see, e.g., Scheffé, 1959, section 8.1).
The reason is that because of the above
constraint,

$$0 = \operatorname{var}\Big[\sum_{i=1}^{k} (ab)_{ij}\Big] = k \operatorname{var}[(ab)_{ij}] + k(k - 1) \operatorname{cov}[(ab)_{ij}, (ab)_{i'j}] = k\sigma_I^2 + k(k - 1)c,$$

say, where c is the common covariance between
interaction effects on the same target. Thus

$$c = \frac{-\sigma_I^2}{k - 1}. \tag{3}$$

The expected mean squares in the ANOVA
for Case 2 (technically a two-way random
effects layout) and Case 3 (technically a two-way
mixed effects layout) are shown in the
final two columns of Table 1. The differences
are that the component of variance due to the
interaction ($\sigma_I^2$) contributes additively to each
expectation under Case 2, whereas under
Case 3, it does not contribute to the expected
mean square between targets, and it contributes
additively to the other expectations
after multiplication by the factor $f = k/(k - 1)$.

In the remainder of this article, various
intraclass correlation coefficients are defined
and estimated. A rigorous definition is adopted
for the ICC, namely, that the ICC is the
correlation between one measurement (either
a single rating or a mean of several ratings)
on a target and another measurement obtained
on that target. The ICC is thus a
bona fide correlation coefficient that, as is
shown below, is often but not necessarily
identical to the component of variance due
to targets divided by the sum of it and other
variance components. In fact, under Case 3,
it is possible for the population value of the
ICC to be negative (a phenomenon pointed
out some years ago by Sitgreaves [1960]).


Decision 1: A One- or Two-Way 
Analysis of Variance 


In selecting the appropriate form of the ICC,
the first step is the specification of the appropriate
statistical model for the reliability
study (or G study). Whether one analyzes the
data using a one-way or a two-way ANOVA
depends on whether the study is designed
according to Case 1, as described earlier, or
according to Case 2 or 3. Under Case 1, the
one-way ANOVA yields a between-targets mean
square (BMS) and a within-target mean
square (WMS).

From the expectations of the mean squares
shown for Case 1 in Table 1, one can see that
WMS is an unbiased estimate of $\sigma_W^2$; in
addition, it is possible to get an unbiased
estimate of the target variance $\sigma_T^2$ by subtracting
WMS from BMS and dividing the
difference by the number of judges per target.
Since the $w_{ij}$ terms in the model for Case 1
(see Equation 1) are assumed to be independent,
one can see that $\sigma_T^2$ is equal to the
covariance between two ratings on a target.
Using this information, one can write a
formula to estimate $\rho$, the population value
of the ICC for Case 1. Because the covariance
of the ratings is a variance term, the index
in this case takes the form of a variance ratio:

$$\rho = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_W^2}.$$

The estimate, then, takes the form

$$\mathrm{ICC}(1, 1) = \frac{\mathrm{BMS} - \mathrm{WMS}}{\mathrm{BMS} + (k - 1)\mathrm{WMS}},$$

where k is the number of judges rating each
target. It should be borne in mind that while
ICC(1, 1) is a consistent estimate of $\rho$, it is
biased (cf. Olkin & Pratt, 1958).
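
As a worked illustration of the one-way computation, the following minimal sketch applies the ICC(1, 1) formula to the ratings of Table 2. The function and variable names are hypothetical, and the code is only one plain way of carrying out the arithmetic; it is not a procedure prescribed by the article.

```python
# Ratings from Table 2: rows are the six targets, columns the four ratings.
RATINGS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def one_way_mean_squares(ratings):
    """Return (BMS, WMS) from a one-way ANOVA with targets as the only factor."""
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand_mean) ** 2 for m in target_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, target_means)
                    for x in row)
    bms = ss_between / (n - 1)        # between-targets mean square
    wms = ss_within / (n * (k - 1))   # within-target mean square
    return bms, wms


def icc_1_1(bms, wms, k):
    """ICC(1, 1) = (BMS - WMS) / (BMS + (k - 1) WMS)."""
    return (bms - wms) / (bms + (k - 1) * wms)


bms, wms = one_way_mean_squares(RATINGS)
estimate = icc_1_1(bms, wms, k=4)
```

Rounded to two decimals, these quantities agree with the BMS and WMS entries of Table 3 and with the ICC(1, 1) entry of Table 4.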

If the reliability study has the design of
Case 2 or 3, a Target × Judges two-way
ANOVA is the appropriate mode of analysis.
This analysis partitions the within-target sum
of squares into a between-judges sum of
squares and a residual sum of squares. The
corresponding mean squares in Table 1 are
denoted JMS and EMS.

It is crucial to note that the expectation
of BMS under Cases 2 and 3 is different from
that under Case 1, even though the computation
of this term is the same. Because the
effect of judges is the same for all targets under
Cases 2 and 3, interjudge variability does not
affect the expectation of BMS. An important
practical implication is that for a given
population of targets, the observed value of
BMS in a Case 1 design tends to be larger than
that in a Case 2 or Case 3 design.

There are important differences between
the models for Case 2 and Case 3. Consider
Case 2 first. From Table 1 one can see that an
estimate of the target variance $\sigma_T^2$ can be
obtained by subtracting EMS from BMS and
dividing the difference by k. Under the assumptions
of Case 2 that judges are randomly
sampled, the covariance between two ratings
on a target is again $\sigma_T^2$, and the expression for
the parameter $\rho$ is again a variance ratio:

$$\rho = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_J^2 + \sigma_I^2 + \sigma_E^2}.$$

It is estimated by

$$\mathrm{ICC}(2, 1) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k - 1)\mathrm{EMS} + k(\mathrm{JMS} - \mathrm{EMS})/n},$$

where n is the number of targets. To our
knowledge, Rajaratnam (1960) and Bartko
(1966) were the first to give this form. Like
ICC(1, 1), ICC(2, 1) is a biased but consistent
estimator of $\rho$.


Table 2
Four Ratings on Six Targets

                 Judge
Target     1     2     3     4
1          9     2     5     8
2          6     1     3     2
3          8     4     6     8
4          7     1     2     6
5         10     5     6     9
6          6     2     4     7


Table 3
Analysis of Variance for Ratings

Source of variance    df     MS
Between targets        5    11.24
Within target         18     6.26
Between judges         3    32.49
Residual              15     1.02
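
The two-way partition and the ICC(2, 1) estimate can be checked on the Table 2 ratings with a short sketch. The function names are hypothetical, and the residual sum of squares is obtained by subtraction, which assumes exactly one rating per judge per target, as in Table 2.

```python
# Ratings from Table 2: rows are targets, columns are judges.
RATINGS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def two_way_mean_squares(ratings):
    """Return (BMS, JMS, EMS) from a Target x Judges two-way ANOVA."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    judge_means = [sum(row[i] for row in ratings) / n for i in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in target_means)
    ss_judges = n * sum((m - grand) ** 2 for m in judge_means)
    ss_residual = ss_total - ss_targets - ss_judges
    bms = ss_targets / (n - 1)
    jms = ss_judges / (k - 1)
    ems = ss_residual / ((n - 1) * (k - 1))
    return bms, jms, ems


def icc_2_1(bms, jms, ems, n, k):
    """ICC(2, 1) = (BMS - EMS) / (BMS + (k - 1) EMS + k (JMS - EMS) / n)."""
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


bms, jms, ems = two_way_mean_squares(RATINGS)
estimate = icc_2_1(bms, jms, ems, n=6, k=4)
```

Rounded to two decimals, the mean squares match Table 3 and the estimate matches the ICC(2, 1) entry of Table 4.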

As we have discussed, the statistical model
for Case 3 differs from Case 2 because of the
assumption that judges are fixed. As the reader
can verify from Table 1, one implication of this
is that no unbiased estimator of $\sigma_T^2$ is available
when $\sigma_I^2 > 0$. On the other hand, under Case
3, $\sigma_T^2$ is no longer equal to the covariance
between ratings on a target, because of the
correlated interaction terms in Equation 2.
Because the interaction terms on the same
target are correlated, as shown in Equation 3,
the actual covariance is equal to
$\sigma_T^2 - \sigma_I^2/(k - 1)$. Another implication of the Case 3
assumption is that the total variance is equal
to $\sigma_T^2 + \sigma_I^2 + \sigma_E^2$, and thus the correlation is

$$\rho = \frac{\sigma_T^2 - \sigma_I^2/(k - 1)}{\sigma_T^2 + \sigma_I^2 + \sigma_E^2}.$$

This is estimated consistently but with bias by

$$\mathrm{ICC}(3, 1) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k - 1)\mathrm{EMS}}.$$

As is discussed in the next section, the interpretation
of ICC(3, 1) is quite different from that
of ICC(2, 1).
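
To make the contrast between the two estimators concrete, the sketch below evaluates both from the rounded mean squares reported in Table 3 (six targets, four judges). The constant and function names are hypothetical; only the two formulas come from the text.

```python
# Rounded mean squares from Table 3.
BMS, JMS, EMS = 11.24, 32.49, 1.02
N_TARGETS, K_JUDGES = 6, 4


def icc_3_1(bms, ems, k):
    """ICC(3, 1) = (BMS - EMS) / (BMS + (k - 1) EMS); judges treated as fixed."""
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_2_1(bms, jms, ems, n, k):
    """ICC(2, 1) also charges between-judge variance to the denominator."""
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


fixed_judges = icc_3_1(BMS, EMS, K_JUDGES)
random_judges = icc_2_1(BMS, JMS, EMS, N_TARGETS, K_JUDGES)
```

The two forms share a numerator. With these data the large between-judges mean square enters only the ICC(2, 1) denominator, which is why that estimate is so much smaller.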

It is not likely that ICC(2, 1) or ICC(3, 1)
will ever be erroneously used in a Case 1 study,
since the appropriate mean squares would not
be available. The misuse of ICC(1, 1) on data
from Case 2 or Case 3 studies is more likely.
A consequence of this mistake is the underestimation
of the true correlation $\rho$. For the
same set of data, ICC(1, 1) will, on the average,
give smaller values than ICC(2, 1) or ICC(3, 1).

To help the reader appreciate the differences
among these coefficients and also among the
two coefficients to be discussed later, we apply
the various forms to an example. Table 2 gives
four ratings on six targets, Table 3 shows the
ANOVA table, and Table 4 gives the calculated
correlation estimates for various cases.


Table 4
Correlation Estimates From Six Intraclass
Correlation Forms

Form          Estimate
ICC(1, 1)       .17
ICC(2, 1)       .29
ICC(3, 1)       .71
ICC(1, 4)       .44
ICC(2, 4)       .62
ICC(3, 4)       .91

Given the choice of the appropriate index,
tests of the null hypothesis, that $\rho = 0$, can
be made, and confidence intervals around the
parameter can be computed. When using
ICC(1, 1), the test that $\rho$ is different from zero
is provided by calculating $F_0 = \mathrm{BMS}/\mathrm{WMS}$
and testing it on $(n - 1)$ and $n(k - 1)$
degrees of freedom. A confidence interval for $\rho$
can be computed as follows: Let $F_{1-p}(i, j)$
denote the $(1 - p) \cdot 100$th percentile of the F
distribution with i and j degrees of freedom
and define

$$F_U = F_0 \cdot F_{1-\alpha/2}[n(k - 1), (n - 1)] \tag{4}$$

and

$$F_L = F_0 / F_{1-\alpha/2}[(n - 1), n(k - 1)]. \tag{5}$$

Then

$$\frac{F_L - 1}{F_L + (k - 1)} < \rho < \frac{F_U - 1}{F_U + (k - 1)} \tag{6}$$

is a $(1 - \alpha) \cdot 100\%$ confidence interval for $\rho$.

When ICC(2, 1) is appropriate, the significance
test is again an F test, using
$F_0 = \mathrm{BMS}/\mathrm{EMS}$ on $(n - 1)$ and $(k - 1)(n - 1)$
degrees of freedom. The confidence interval
for ICC(2, 1) is more complicated than that for
ICC(1, 1), since the index is a function of
three independent mean squares. Following
Satterthwaite (1946), Fleiss and Shrout (1978)
have derived an approximate confidence
interval. Let

$$v = \frac{(k - 1)(n - 1)\{k\hat{\rho}F_J + n[1 + (k - 1)\hat{\rho}] - k\hat{\rho}\}^2}{(n - 1)k^2\hat{\rho}^2F_J^2 + \{n[1 + (k - 1)\hat{\rho}] - k\hat{\rho}\}^2},$$

where $F_J = \mathrm{JMS}/\mathrm{EMS}$ and $\hat{\rho} = \mathrm{ICC}(2, 1)$.
If we define $F^* = F_{1-\alpha/2}[(n - 1), v]$ and
$F_* = F_{1-\alpha/2}[v, (n - 1)]$, then

$$\frac{n(\mathrm{BMS} - F^*\mathrm{EMS})}{F^*[k\mathrm{JMS} + (kn - k - n)\mathrm{EMS}] + n\mathrm{BMS}} < \rho < \frac{n(F_*\mathrm{BMS} - \mathrm{EMS})}{k\mathrm{JMS} + (kn - k - n)\mathrm{EMS} + nF_*\mathrm{BMS}} \tag{7}$$

gives an approximate $(1 - \alpha) \cdot 100\%$ confidence
interval around $\rho$.

Finally, when appropriate, ICC(3, 1) is
tested with $F_0 = \mathrm{BMS}/\mathrm{EMS}$ on $(n - 1)$ and
$(n - 1)(k - 1)$ degrees of freedom. If we
define

$$F_L = F_0 / F_{1-\alpha/2}[(n - 1), (n - 1)(k - 1)], \tag{8}$$

$$F_U = F_0 \cdot F_{1-\alpha/2}[(n - 1)(k - 1), (n - 1)], \tag{9}$$

then

$$\frac{F_L - 1}{F_L + (k - 1)} < \rho < \frac{F_U - 1}{F_U + (k - 1)} \tag{10}$$

is a $(1 - \alpha) \cdot 100\%$ confidence interval for $\rho$.
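
The intervals for ICC(1, 1) and ICC(3, 1) share one algebraic form and differ only in the mean squares behind the F ratio and in the degrees of freedom of the F percentiles. A minimal sketch of that shared step follows. The function name is hypothetical, and the two percentiles are passed in as arguments because they must be looked up in an F table or obtained from statistical software; the numbers in the usage line are made up to show the arithmetic and are not real critical values.

```python
def icc_interval(f0, f_crit_for_lower, f_crit_for_upper, k):
    """Confidence limits of the form (F - 1) / (F + (k - 1)).

    f0               -- observed F ratio (BMS/WMS for ICC(1, 1),
                        BMS/EMS for ICC(3, 1))
    f_crit_for_lower -- the F percentile that divides f0 to give F_L
    f_crit_for_upper -- the F percentile that multiplies f0 to give F_U
    k                -- number of judges rating each target
    """
    f_lower = f0 / f_crit_for_lower
    f_upper = f0 * f_crit_for_upper
    lower = (f_lower - 1) / (f_lower + (k - 1))
    upper = (f_upper - 1) / (f_upper + (k - 1))
    return lower, upper


# Illustrative arithmetic only: 2.0 and 2.0 are placeholder percentiles.
lower, upper = icc_interval(f0=4.0, f_crit_for_lower=2.0,
                            f_crit_for_upper=2.0, k=4)
```

Note that the lower limit is negative whenever the observed F ratio is smaller than the percentile that divides it.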


Decision 2: Can Effects Due to Judges Be 
Ignored in the Reliability Index? 


In the previous section we stressed the
importance of distinguishing Case 1 from
Cases 2 and 3. In this section we discuss the
choice between Cases 2 and 3. Most simply,
the choice is whether the raters are considered
random effects (Case 2) or fixed effects (Case
3). Thus, under Case 2 we wish to generalize
to other raters within some population,
whereas under Case 3 we are interested only
in a single rater or a fixed set of k raters. Of
course, once the appropriate case is identified,
the choice of indices is between ICC(2, 1) and
ICC(3, 1), as discussed before.

Most often, investigators would like to say 
that their rating scale can be effectively used 
by a variety of judges (Case 2), but there are 
some instances in which Case 3 is appropriate. 
Suppose that the reliability study (the G 
study) precedes a substantive study (the 
decision study in Cronbach et al.’s terms)
in which each of the k judges is responsible
for rating his or her own separate random
sample of targets. If all the data in the final 
study are to be combined for analysis, the 
judges’ effects will contribute to the variability 
of the ratings, and the random model with 
its associated ICC (2, 1) is appropriate. If, on 
the other hand, each judge’s ratings are 
analyzed separately, and the separate results 
pooled, then interjudge variability will not 
have any effect on the final results, and the 
model of fixed judge effects with its associated
ICC(3, 1) is appropriate.
Suppose that the substantive study involves 
a correlation between some reliable variable 
available for each target and the variable 
derived from the judges’ ratings. One may 
either determine the correlation for the entire 
study sample or determine it separately for 
each judge’s subsample and then pool the
correlations using Fisher’s z transformation.
The variability of the judges’ effects must be 
taken into account in the former case, but 
can be ignored in the latter. 
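The second strategy, pooling per-judge correlations through Fisher's z transformation, can be sketched as follows. The text does not say how the subsample correlations are weighted, so this sketch assumes the common choice of weighting each z by its subsample size minus 3; the function name and the example values are hypothetical.

```python
import math


def pool_correlations(correlations, sample_sizes):
    """Pool correlations from separate judges' subsamples via Fisher's z.

    Each r is transformed with atanh, the z values are averaged with
    weights n_i - 3 (an assumed, conventional weighting), and the mean z
    is transformed back with tanh.
    """
    weights = [n - 3 for n in sample_sizes]
    z_mean = sum(w * math.atanh(r) for r, w in zip(correlations, weights)) / sum(weights)
    return math.tanh(z_mean)


# Hypothetical correlations from three judges' subsamples.
pooled = pool_correlations([0.42, 0.55, 0.48], [25, 30, 28])
print(round(pooled, 3))
```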
Another example is a comparative study 
in which each judge rates a sample of targets 
from each of several groups. One may either
compare the groups by combining the data
from the k judges (in which case the component
of variance due to judges contributes to
variability, and the random effects model
holds) or compare the groups separately for
each judge and then pool the differences (in
which case differences between the judges'
mean levels of rating do not contribute to
variability, and the model of fixed judge
effects holds).

When the judge variance is ignored, the
correlation index can be interpreted in terms
of rater consistency rather than rater agreement.
Researchers of the rating process may
choose between ICC(3, 1) and ICC(2, 1) on
the basis of which of these concepts they wish
to measure. If, for example, two judges are
used to rate the same n targets, the consistency
of the two ratings is measured by ICC(3, 1),
treating the judges as fixed effects. To measure
the agreement of these judges, ICC(2, 1) is
used, and the judges are considered random
effects; in this instance the question being
asked is whether the judges are interchangeable.

Bartko (1976) advised that consistency is 
never an appropriate reliability concept for 
raters; he preferred to limit the meaning of 
rater reliability to agreement. Algina (1978) 
objected to Bartko's restriction, pointing out 
that generalizability theory encompasses the 
case of raters as fixed effects. Without directly 
addressing Algina's criticisms, Bartko (1978) 
reiterated his earlier position. The following 
example illustrates that Bartko's blanket 
restriction is not only unwarranted but can 
also be misleading. 

Consider a correlation study in which one 
judge does all the ratings or one set of judges 
does all the ratings and their mean is taken. 
In these cases, judges are appropriately con- 
sidered fixed effects. If the investigator is 
interested in how much the correlations might 
be attenuated by lack of reliability in the 
ratings, the proper reliability index is ICC(3, 1),
since the correlations are not affected by
judge mean differences in this case. In most
cases the use of ICC(2, 1) will result in a lower
value than when ICC(3, 1) is used. This
relationship is illustrated in Tables 2, 3, and 4.
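A small numerical sketch may make the consistency versus agreement distinction concrete. It computes the two-way mean squares from a targets-by-judges table and then the usual single-rater mean-square forms of ICC(2, 1) and ICC(3, 1). The ratings are invented for illustration: the second judge always scores 2 points higher than the first, so the judges are perfectly consistent but do not agree. All function names are hypothetical.

```python
def two_way_mean_squares(ratings):
    """Return (BMS, JMS, EMS) for an n-targets by k-judges table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    judge_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    bms = k * sum((t - grand) ** 2 for t in target_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in judge_means) / (k - 1)
    # Residual sum of squares after removing target and judge effects.
    sse = sum(
        (ratings[i][j] - target_means[i] - judge_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = sse / ((n - 1) * (k - 1))
    return bms, jms, ems


def icc_2_1(bms, jms, ems, n, k):
    """Agreement index: judge variance stays in the denominator."""
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def icc_3_1(bms, ems, k):
    """Consistency index: judge mean differences are ignored."""
    return (bms - ems) / (bms + (k - 1) * ems)


# Invented data: 4 targets, 2 judges, judge 2 = judge 1 + 2.
ratings = [[1, 3], [2, 4], [3, 5], [4, 6]]
bms, jms, ems = two_way_mean_squares(ratings)
consistency = icc_3_1(bms, ems, k=2)
agreement = icc_2_1(bms, jms, ems, n=4, k=2)
print(round(consistency, 3), round(agreement, 3))
```

Because the only disagreement here is a constant offset between judges, the consistency index reaches its maximum while the agreement index is pulled down by the judge mean square.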

Although we have discussed the justification
of using ICC(3, 1) with reference to the final
analysis of a substantive study, in many cases 
the final analytic strategy may rest on the 
reliability study itself. Consider, for example, 
the case discussed above in which each judge 
rates a different subsample of targets. In this 
instance the investigator can either calculate 
correlations across the total sample or calculate 
them within subsamples and pool them. If 
the reliability study indicates a large discrepancy
between ICC(2, 1) and ICC(3, 1),
the investigator may be forced to consider 
the latter analytic strategy, even though it 
involves a loss of degrees of freedom and a 
loss of computational simplicity. 
Decision 3: What Is the Unit of Reliability?


The ICC indices discussed so far give the 
expected reliability of a single judge's ratings. 
In the substantive study (D study), often it is 
not the individual ratings that are used, but 
rather the mean of m ratings, where m need
not be equal to k, the number of judges in the
reliability study (G study). In such a case
the reliability of the mean rating is of interest;
this reliability will always be greater in 
magnitude than the reliability of the individual 
ratings, provided the latter is positive (cf. 
Lord & Novick, 1968). 

Only occasionally is the choice of a mean 
rating as the unit of analysis based on sub- 
stantive grounds. An example of a substantive 
choice is the investigation of the decisions 
(ratings) of a team of physicians, as they are 
found in a hospital setting. More typically, 
an investigator decides to use a mean as a 
unit of analysis because the individual rating 
is too unreliable. In this case, the number of 
observations (say, m) used to form the mean 
should be determined by a reliability study
in pilot research, for example, as follows. Given
the lower bound, $\rho_L$, on $\rho$ from Inequality 6 or
Inequality 7, whichever is appropriate, and
given a value, say $\rho^*$, for the minimum acceptable
value for the reliability coefficient (e.g.,
$\rho^* = .75$ or $.80$), it is possible to determine m
as the smallest integer greater than or equal to

$$\frac{\rho^*(1 - \rho_L)}{\rho_L(1 - \rho^*)}.$$


Once m is determined, either by a reliability 
study or by a choice made on substantive 
grounds, the reliability of the ratings averaged 
over m judges can be estimated using the 
Spearman-Brown formula and the appropriate 
ICC index described earlier. When data from 
m judges are actually collected (e.g., in the D 
study following the G study used to determine
m), they can be used to estimate the reliabilities
of the mean ratings in one step, using the
formulas below. In these applications, k = m.
The formulas correspond to ICC(1, 1),
ICC(2, 1), and ICC(3, 1), and the significance
test for each is the same as for their correspond- 
ing single-rater reliability index. 
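The two calculations described in this section, choosing m from a lower confidence bound and then stepping a single-rater reliability up to the reliability of a mean of m ratings with the Spearman-Brown formula, can be sketched as follows. The function names and the example values (.40 and .75) are hypothetical; the small tolerance subtracted before rounding up is an implementation detail added here to guard against floating-point noise.

```python
import math


def spearman_brown(rho_single, m):
    """Reliability of the mean of m ratings, given single-rating reliability."""
    return m * rho_single / (1.0 + (m - 1) * rho_single)


def judges_needed(rho_lower, rho_target):
    """Smallest integer m >= rho*(1 - rho_L) / (rho_L (1 - rho*))."""
    raw = rho_target * (1.0 - rho_lower) / (rho_lower * (1.0 - rho_target))
    # Tolerance keeps an exact integer from being rounded up by float error.
    return max(1, math.ceil(raw - 1e-9))


# Hypothetical pilot result: lower bound .40, minimum acceptable value .75.
m = judges_needed(rho_lower=0.40, rho_target=0.75)
print(m, round(spearman_brown(0.40, m), 3))
```

With these illustrative inputs the required ratio is 4.5, so five judges are needed; four would leave the stepped-up reliability just short of the target.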
The index corresponding to ICC(1, 1) is
ICC(1, k) = (BMS - WMS)/BMS. Letting
$F_L$ and $F_U$ be defined as in Equations 4 and 5,

$$1 - \frac{1}{F_L} < \rho < 1 - \frac{1}{F_U}$$

is a $(1 - \alpha) \cdot 100\%$ confidence interval for the
population value of this intraclass correlation.
The index corresponding to ICC(2, 1) is

$$ICC(2, k) = \frac{BMS - EMS}{BMS + (JMS - EMS)/n}.$$

The confidence interval for this index is most
easily obtained by using the confidence bounds
obtained for ICC(2, 1) in the Spearman-Brown
formula. For example, the lower bound for
ICC(2, k) is

$$\rho_L^{*} = \frac{k\rho_L^{**}}{1 + (k - 1)\rho_L^{**}},$$

where $\rho_L^{**}$ is the lower bound obtained for
ICC(2, 1).

For ICC(3, 1), the index of consistency for
the mixed model case, the generalization from
a single rating to a mean rating reliability is
not quite as straightforward. Although the
covariance between two ratings is $\sigma_T^2 - \sigma_I^2/(k - 1)$,
the covariance between two means
based on k judges is $\sigma_T^2$. As we pointed out
before, under Case 3 no estimator exists for
this term.

If, however, the Judge × Target interaction
can be assumed to be absent, then the appropriate
index is

ICC(3, k) = (BMS - EMS)/BMS.

Letting $F_L$ and $F_U$ be defined as in Equations
8 and 9,

$$1 - \frac{1}{F_L} < \rho < 1 - \frac{1}{F_U}$$

is a $(1 - \alpha) \cdot 100\%$ confidence interval for the
population value of this intraclass correlation.
ICC(3, k) is equivalent to Cronbach's (1951)
alpha; when the ratings of observers are
dichotomous, it is equivalent to the Kuder-Richardson
(1937) Formula 20.
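The stated equivalence between ICC(3, k) and Cronbach's alpha can be checked numerically. The sketch below computes ICC(3, k) = (BMS - EMS)/BMS from the two-way mean squares and alpha from the judges' variances and the variance of the target totals, treating judges as the "items". The ratings matrix and the function names are invented for illustration.

```python
from statistics import variance


def icc_3_k(ratings):
    """(BMS - EMS) / BMS for an n-targets by k-judges table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    judge_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    bms = k * sum((t - grand) ** 2 for t in target_means) / (n - 1)
    sse = sum(
        (ratings[i][j] - target_means[i] - judge_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = sse / ((n - 1) * (k - 1))
    return (bms - ems) / bms


def cronbach_alpha(ratings):
    """k/(k-1) * (1 - sum of judge variances / variance of target totals)."""
    k = len(ratings[0])
    judge_vars = [variance([float(row[j]) for row in ratings]) for j in range(k)]
    total_var = variance([float(sum(row)) for row in ratings])
    return k / (k - 1) * (1.0 - sum(judge_vars) / total_var)


# Invented ratings: 5 targets rated by 3 judges.
ratings = [[4, 5, 3], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 4, 2]]
print(round(icc_3_k(ratings), 4), round(cronbach_alpha(ratings), 4))
```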


Sometimes the choice of a unit of analysis 
causes a conflict between reliability considera- 
tions and substantive interpretations. A mean 
of k ratings might be needed for reliability, 
but the generalization of interest might be 
individuals. 

For example, Bayes (1972) desired to relate 
ratings of interpersonal warmth to nonverbal 
communication variables. She reported the 
reliability of the warmth ratings based on the
judgments of 30 observers on 15 targets.


Because the rating variable that she related 
to the other variables was the mean rating 
over all 30 observers, she correctly reported 
the reliability of the mean ratings. With this 
index, she found that her mean ratings were 
reliable to .90. When she interpreted her 
findings, however, she generalized to single 
observers, not to other groups of 30 observers. 
This generalization may be problematic, since 
the reliability of the individual ratings was 
less than .30—a value the investigator did not 
report. In such a situation in which the unit 
of analysis is not the same as the unit general- 
ized to, it is a good idea to report the relia- 
bilities of both units. 


Conclusion 


It is important to assess the reliability of 
judgments made by observers in order to 
know the extent that measurements are 
measuring anything. Unreliable measurements 
cannot be expected to relate to any other 
variables, and their use in analyses frequently 
violates statistical assumptions. Intraclass 
correlation coefficients provide measures of 
reliability, but many forms exist and each is 
appropriate only in limited circumstances. 

This article has discussed six forms of the
intraclass correlation and guidelines for choosing
among them. Important issues in the
choice of an appropriate index include whether
the ANOVA design should be one way or two
way, whether raters are considered fixed or
random effects, and whether the unit of
analysis is a single rater or the mean of several
raters. The discussion has been limited to a
relatively pure data analysis case, k observers
rating n targets with no missing data (i.e.,
each of the n targets is rated by exactly k
observers). Although we have implicitly limited
the discussion to continuous rating scales, 
Feldt (1965) has reported that for ICC(3, k) 
at least, the use of dichotomous dummy 
variables gives acceptable results. Readers 
interested in agreement indices for discrete 
data, however, should consult the Fleiss 
(1975) review of a dozen coefficients or the 
detailed review of coefficient kappa by Hubert 
(1977). 
Equity Theory and the Cognitive Ability of Children
J. G. Hook and Thomas D. Cook 


Northwestern University 


A number of studies indicate that preadolescents allocate more rewards to those 
who have done more work. Adams's equity theory is most often used to explain 
this finding. One assumption of equity theory is that persons compute ratios 
and compare them for proportionality. However, research on logico-mathemat- 
ical development indicates that children do not solve problems of proportional- 
ity until they are 11-15 years old. This suggests that equity theory may not be 
an adequate explanation of how children allocate rewards in experiments on 
equity. Children's allocation behaviors do change with age, from the possibly 
self-interested or equal allocations of children under 6 years, to the descriptive 
ordinal equity allocations of 6- to 12-year-olds, to the possibly proportional al- 
locations of persons 13 years and older. This sequence is consistent with the 
normal sequence of logico-mathematical development, suggesting that observed 
allocation behaviors may be a function of cognitive ability as well as manipu- 
lated situational variables. 


Adams's (1965) equity theory is a cognitive
social comparison theory. It is concerned
with the cognitive activity of the individual
who is confronted with a problem of distributive
justice. We call this individual the
allocator. Adams assumed that the allocator
constantly compares persons, including the
allocator, on two dimensions: outcomes and
inputs. Inputs are contributions (e.g., labor)
or attributes (e.g., being well educated) that
justify claims on outcomes. Outcomes are
rewards or desirable things, which can be
either tangible (e.g., pay) or intangible (e.g.,
love). Adams assumed that the allocator first
forms ratios of outcomes to inputs for each
comparison person. Next, the allocator compares
the ratios. If they are equivalent, or
proportional, equity obtains. If they are not
proportional for any two comparison persons,
inequity is assumed to result. Thus, inequity
obtains when

$$\frac{\text{Outcomes for Person X}}{\text{Inputs by Person X}} \neq \frac{\text{Outcomes for Person Y}}{\text{Inputs by Person Y}}.$$

If inequity obtains, Adams assumed that the
allocator will recognize it and feel a sort of
discomfort. This discomfort is then assumed
to motivate the allocator to restore equity,
either directly (e.g., by redistributing rewards)
or by cognitive distortion.

… author was partially supported by the
University Social Sciences Program of Northwestern …
Requests for reprints should … Cook, De…

Adams’s equity theory incorporates the
… relationship of proportionality between inputs
and outcomes, between work and reward; 
that is, rewards should be allocated according 
to merit. Each allocator, however, has sub- 
stantial latitude in determining the content of 
equity, because the allocator alone deter- 
mines which particular factors to include as 
inputs and outcomes and how each should 
be weighted. Adams's theory is not an ethi- 
cal theory because he did not say equitable 
allocations were good. Rather, he predicted 
that people would behave as if they believed 
equitable allocations were good. According 
to Adams, people behave that way because 
they are responsive to a social norm that 
prescribes equitable allocations and pro- 
scribes other allocations. The equity norm is 
presumably the cause of the allocator's dis- 
comfort when the allocator recognizes in- 
equity. 

Few equity theorists believe that the 
equity norm is always applicable. Rather, 
the norm is presumed to guide behavior in 
certain situations and not in others. When 
the equity norm is not situationally appro- 
priate, other norms may be relevant. The 
equality norm (Sampson, 1969), for example, 
commands that persons receive the same out- 
comes, no matter what their inputs. Self- 
interest, which is perhaps a genetically in- 
fluenced norm (Campbell, 1972), commands 
that persons maximize their own share. A 
norm for altruism (Bryan, 1972) commands 
the opposite. Other hypothesized norms, less 
relevant to this article, include reciprocity 
(Schopler, 1970; Staub, 1972) and “to each 
according to his need" (Berkowitz, 1972; 
Leventhal, Weiss, & Buttrick, 1973). 

Most allocation research has been designed 
to identify the personal and situational vari- 
ables that influence an allocator's choice of 
norms. For example, research indicates that 
under some conditions females make equal 
allocations of reward regardless of relative 
work (Uesugi & Vinacke, 1963), whereas 
males make equitable allocations (Vinacke 
& Stanley, Note 1). Under different condi- 
tions the reverse can be true (Kidder, Bellet- 
tirie, & Cohn, 1977). Other empirical work 
suggests that allocators make equal alloca- 
tions when group harmony is a salient concern 
(Smith & Cook, 1973), but make equitable
allocations when productivity is a salient
concern (Leventhal, Michaels, & Sanford, 
1972). 

In the first study in which equity theory 
was used to predict children's allocation be- 
haviors, Leventhal and Anderson (1970) led 
individual children to believe that they were 
members of dyads doing work for the experi- 
menters. One independent variable was the 
sex of the child. The second involved a ma- 
nipulation of the amount of work a child 
did relative to the amount he was led to be- 
lieve that his (unseen) partner had per- 
formed. This relative-work information was 
the only information each child was given 
about his partner. Each child was assigned 
to one of three work conditions: superior 
work (three times as much work as the part- 
ner in a 15:5 ratio), equal work (same 
amount of work in a 15:15 or 5:5 ratio), 
and inferior work (one third as much work 
in a 5:15 ratio). The dependent measure 
was the number of rewards, out of a total 
of 20, that each child kept for himself when 
he was asked to distribute the rewards anon- 
ymously between himself and the other child 
in any way he desired. Equity theory was 
used to predict that the children would keep 
the same proportion of rewards as the pro- 
portion of work they had done. For example, 
the inferior-work child's equity cognition 
appeared first as the dilemma

My work = 5        Other’s work = 15
My reward = ?      Other’s reward = ?

to be resolved as 5/5 = 15/15. Thus, equity
theory, which assumes that the allocator
forms ratios in his head and compares them
for proportional equity, predicts that superior-,
equal-, and inferior-work subjects
should keep 15, 10, and 5 rewards, respectively.
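The proportional prediction for the three work conditions is simple arithmetic, sketched here under the assumption that the child keeps the same fraction of the 20 rewards as his fraction of the total work. The function name is hypothetical.

```python
def proportional_share(own_work, partner_work, total_rewards=20):
    """Rewards kept if the share of rewards equals the share of work done."""
    return total_rewards * own_work / (own_work + partner_work)


# Superior (15:5), equal (15:15 or 5:5), and inferior (5:15) work conditions.
for own, partner in [(15, 5), (15, 15), (5, 5), (5, 15)]:
    print(own, partner, proportional_share(own, partner))
```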

Table 1 shows the results of the Leventhal
and Anderson study. Recall that the score
could range from 0 to 20, with 10 indicating
that the child divided rewards equally between
himself and the other child. The means
for females do not differ significantly from
each other or from 10. This suggests that
under the conditions of this experiment
female allocators were influenced by an equality
norm. Males in the superior-work (15
units) condition kept significantly more for
themselves than did males in the two equal-work
conditions. This result seems to support
an equity norm interpretation. However, it
does not support an equity interpretation in
the strictest sense of Adams's theory. Rather,
it is an ordinal equity, in which rank order is
preserved from the work dimension to the
reward dimension. The mean value of 12.7
is halfway between the proportional equity
predicted by equity theory (i.e., 15) and the
equality norm (10). We henceforth assume
that such behavior reflects ordinal equity.
It is crucial to understand that even if
the superior workers kept larger rewards and
the inferior workers smaller rewards (although
the latter did not occur in this
study), such results would not necessarily be
consistent with equity theory. This is because
the theory requires that workers keep
the same proportion of rewards as their proportion
of work. Thus, results like those
found with the superior-work males of Leventhal
and Anderson cannot readily be interpreted
in terms of equity theory alone. One
would have to assume that proportional
equity is operating together with other forces
influencing allocation, or that some alternative
(e.g., ordinal equity) is involved.


Table 1
Mean Number of Rewards Kept by Subjects
in Leventhal and Anderson (1970)

                                        Partner's work unit
Subject's work unit                         5         15

Females
  5    Theoretical no. rewards             10          5
       Actual M no. rewards               10.7       10.1
  15   Theoretical no. rewards             15         10
       Actual M no. rewards               10.3       10.2
Males
  5    Theoretical no. rewards             10          5
       Actual M no. rewards               10.6       11.1
  15   Theoretical no. rewards             15         10
       Actual M no. rewards               12.7       10.3
Before we move to the literature review,
it is important that we briefly review two key
aspects of equity theory. In the Leventhal 
and Anderson (1970) study, equity was in- 
ferred from the reward allocations of children 
from different input groups, and there was 
no explicit comparison with the precise scores 
predicted by the proportional equity theory. 
In other words, the hypotheses and inde- 
pendent variables were derived from a theory 
that assumes interval scales for the input 
and outcome variables, but inferences about 
equity were made merely on the basis of 
ordinal group differences on the dependent 
variable. It is as though equity researchers are 
content to infer equity if the mean differences 
are in the expected direction, irrespective of 
whether they differ from the theoretically 
specified numerical differences. The justifica- 
tions for such slippage are presumably (a) 
that experiments cannot control all inputs 
and outcomes and that some of the non- 
manipulated variables that subjects make 
relevant to a particular equity ratio may 
codetermine their allocative behavior; (b) 
that equity may be aroused simultaneously 
with other norms that codetermine alloca- 
tion; and (c) that equity theory should be 
understood as a general metaphor for study- 
ing allocative behavior rather than as a 
formal theory pretending to completeness 
and specificity. Unfortunately, equity theo- 
rists are rarely explicit about the reasons for 
inferring equity from ordinal patterns of data 
that are in line with, but different from, the 
more exact interval-scale predictions that the 
theory appears to make. 

Second, equity theory makes somewhat ex- 
treme assumptions about the cognitive ac- 
tivity of the allocator. Recall that the theory 
states that an allocator feels discomfort when 
equity ratios are not proportional. Almost all 
tests of equity leave this discomfort unmea- 
sured and thus assumed. Logically prior to 
this feeling is the assumption that the allo- 
cator sets up and compares ratios. Still prior 
is the assumption that the allocator is com- 
petent to set up and compare ratios. Then, 
of course, there is the large assumption that 
the allocator is able to conceptualize hetero- 
geneous outcomes (e.g., satisfaction or dol- 
Table 2
Allocation Literature

Study   Age in years   Class   Result   Equity quotient^a
Masters (1968) 3-5 P Self-interest or equality 
Nelson & Dweck (1977) 4 P Equality 18 
Peterson, Peterson, & McDonald (1975) 4 3rd Equality Ц 
Peterson, Peterson, & McDonald (1975) 4 3rd Equality i 
Antone & Hendricks 3-7 P Self-interest (3-6) 
(Note 3) Equality (7) 
Lane & Coon (1972) 4 P Self-interest 4 
5 P Equality 6 
Handlon & Gross (1959) 4-5 P Self-interest 
Lerner (1974) 5 Р Equality E. 
Lerner (1974) 5 3rd Ordinal equity 40^ 
Equality 225 
Hook (1978) 5 P Self-interest 22 
Leventhal & Anderson (1970) 5 P Equality 16 (males)
1 (females)
Leventhal, Popp, & Sawyer (1973) 5 3rd Equality 20 
Ordinal equity 40* 
Leventhal, Popp, & Sawyer (1973) 5 3rd Ordinal equity E 
б 
Coon, Lane, & Lichtman (1974) 5 3rd Ordinal equity 46
Larsen & Kellogg (1974) 4-8 B Equality 
Olejnik (1976) 5 3rd Ordinal equity 
Lerner (1974) 6 P Ordinal equity 
Equality 
Olejnik (1976) 6 3rd Ordinal equity 
Libby & Garrett (1974) 6 P Ordinal equity 
Streater & Chertkoff (1976) 6 P Ordinal equity 
Streater & Chertkoff (1976) 6 3rd Ordinal equity 
Anderson & Butzin (1978) 6 3rd Ordinal equity 
Hook, Brockett, & Smith (Note 4) 6-7 3rd Ordinal equity 
Olejnik (1976) 7 3rd Ordinal equity 
Coon, Lane, & Lichtman (1974) 7 3rd Ordinal equity 
Streater & Chertkoff (1976) 8 3rd Ordinal equity 
Streater & Chertkoff (1976) 8 3rd Ordinal equity 
Leventhal, Popp, & Sawyer (1973) 8 3rd Ordinal equity 
Leventhal, Popp, & Sawyer (1973) 8 3rd Ordinal equity 
Olejnik (1976) 8 3rd Ordinal equity 
Tompkins & Olejnik (1978) 7-9 3rd Ordinal equity 
Equality 
Anderson & Butzin (1978) 8 3rd Ordinal equity 
Cohen & Sampson (Note 5) 3-12 3rd 
Hook, Brockett, & Smith (Note 4) 8-9 3rd Ordinal equity 
Coon, Lane, & Lichtman (1974) 9 3rd Ordinal equity 
Hook (1978) 9 P Ordinal equity 
Handlon & Gross (1959) 9-11 Р Equality 
Libby & Garrett (1974) 10 P Ordinal equity 
Lerner (1974) 10 i Ordinal equity 
Lerner (1974) 10 3rd Ordinal equity 
Anderson & Butzin (1978) 10 3rd Ordinal equity 
Benton (1971) 1-12 P Ordinal equity and equality 
Table 2 (continued)

Study                                          Age in years  Class  Result               Equity quotient^a

Hook, Brockett, & Smith (Note 4)               10-11         3rd    Ordinal equity       48
Morgan & Sawyer (1967)                         10-12         P      Ordinal equity       58
Coon, Lane, & Lichtman (1974)                  11            3rd    Ordinal equity       68
Streater & Chertkoff (1976)                    12            3rd    Ordinal equity
Hook, Brockett, & Smith (Note 4)               12-13         3rd    Ordinal equity       68
Hook (1978)                                    13            P      Proportional equity  94
Garrett & Libby (1973)                         14            P      Proportional equity  96
Anderson (1976)                                Adult         3rd    Proportional equity
Leventhal & Michaels (1969)                    Adult         P      Ordinal equity       43, 40
Leventhal, Weiss, & Long (1969)                Adult         P      Proportional equity  109
Leventhal & Lane (1970)                        Adult         P      Proportional equity  79^b, 79^c
Lane, Messe, & Phillips (1971)                 Adult         P      Proportional equity
Kahn (1972)                                    Adult         P      Ordinal equity       61^b, 48^c
Cohen (1974)                                   Adult         P      Proportional equity  183
Shapiro (1975)                                 Adult         P      Proportional equity  80
Von Grumbkow, Deen, Steensma, & Wilke (1976)   Adult         3rd    Proportional equity  78
Reis & Gruzen (1976)                           Adult         P      Ordinal equity       48
Kidder, Bellettirie, & Cohn (1977)             Adult         P      Ordinal equity

Note. P = studies in which subjects asked to allocate rewards were also participants in the work and
potential reward recipients; 3rd = studies in which subjects asked to allocate rewards were third party
observers of others’ work.
a Decimal points omitted.
b Subjects were male.
c Subjects were female.
d Correspondent problem.
e Ratio problem.


lars) and heterogeneous inputs (the quantity
and quality of work or qualifications) and
is able to weld them into a single, unidimensional
scale of outcomes or inputs. It may
or may not be true that adults go through
such a cognitive process, of which they may
well be capable under some conditions. This
article is concerned with whether children
are capable of such cognitive activities
and whether they actually perform them.


Allocation Literature: Children and Adults

Framework for the Literature Review


Table 2 is a summary of the allocation
literature. Two types of studies are included.
The Leventhal and Anderson (1970) study
is an example of one type, in which subjects
are allowed to compare the work inputs of
persons and then are asked to distribute rewards
(outcomes) among the persons who
have performed work. The Libby and Garrett
(1974) study is an example of the second
type, in which the allocator is asked to distribute
rewards between persons who did
equal work throughout but were unequally
rewarded in a previous allocation.

Two types of allocation studies are ex- 
cluded because they do not involve work 
comparisons and reward allocations among 
the same persons. First, in “helping” studies, 
allocators are asked to donate some of their 
rewards to a third party who did not work 
(Long & Lerner, 1974; Miller & Smith, 1977; 
Staub, Note 2) or are asked to share an un- 
earned gift with a friend or stranger (Ugu- 
rel-Semin, 1952; Wright, 1942a, 1942b). 
Second, in some equity studies, all the work 
and outcome values are told to subjects, mak- 
ing the computation of an equity formula a 
fait accompli. The dependent measures are 
such factors as subjects' ratings of the allocator’s
attractiveness or fairness (Brickman
& Bryan, 1975, 1976) or the quality of
the subject’s continued performance under 
some preestablished allocation rule (Lawler, 
1968). 

The explicit condition from Nelson and 
Dweck (1977) is excluded from Table 2. 
This is because children were instructed to 
give out rewards “so that you get the right 
amount for doing this much work” (p. 194), 
and then had a chance to physically copy in 
their reward allocation what the work dif- 
ferences had been. In the research reported 
in Table 2, children were instructed to allo- 
cate rewards in a fair manner or in whatever 
manner they wanted, and copying was not 
explicitly requested. 

The studies in Table 2 are arranged ac- 
cording to the ages of the subjects. Studies 
of multiple age groups are reported multiple 
times in appropriate age positions. The col- 
umn labeled Class indicates whether the sub- 
jects who were asked to allocate rewards 
were participants in the work and potential 
reward recipients (P studies) or were dis- 
interested third party observers of others’ 
work (3rd studies). 

The column labeled Result classifies each 
study with respect to the allocation principle 
the mean scores seem to indicate the sub- 
jects followed. The four classes of results are 
self-interest, equality, ordinal equity, and 
proportional equity. Self-interest could only 
occur in P studies, which means the subjects 
kept more for themselves than they allocated 
to others, no matter how much work each 
had done. Equality, ordinal equity, and proportional
equity are defined with respect to
the Equity Quotient column. The equity quotient
is a numerical index of the extent to
which the study results are consistent with
Adams’s proportional equity. The quotient is
the ratio of the percentage point differences
between experimental conditions in the observed
allocation behaviors to the differences
between experimental conditions predicted by
proportional equity theory. For example, in
Leventhal and Anderson (1970) the inferior- 
work males should have kept 25% of the 
reward because they did 25% of the work. 
The superior-work males should have kept
75% of the reward because they did 75%
of the work. Thus, the gap between the two
groups predicted by proportional equity theory
is 50 percentage points (75 - 25 = 50).
The actual difference between the two experimental
conditions, in terms of their actual
allocation behaviors, was 8 percentage points.
This is because the inferior-work group kept
an average of 55.5% of the reward, whereas
the superior-work group kept an average of
63.5%. The ratio of the observed gap (8)
to the predicted gap (50) is therefore .16
(8/50), which is the quotient reported in
Table 2 for the males from Leventhal and
Anderson. (Actually, all values in the table
have been multiplied by 100 to avoid decimal
points.)

The ratio in Leventhal and Anderson can
be expressed visually as

Observed:    55.5 ... 63.5 = 8
Predicted:   25.0 ............ 75.0 = 50.

A quotient of zero means that subjects did
not differentiate their reward allocations:

Observed:    50 . 50 = 0
Predicted:   25.0 ............ 75.0 = 50.

A quotient of 100 means that subjects made
reward allocations precisely as predicted by
proportional equity theory:

Observed:    25 ............ 75 = 50
Predicted:   25 ............ 75 = 50.

A quotient of 50 is the prototypical ordinal
equity relationship, with observed scores falling
between the equality and the proportional
equity predictions:

Observed:    37.5 ...... 62.5 = 25
Predicted:   25 ............ 75 = 50.

In Table 2, then, scores between 0 and 25
are labeled equality or self-interest. Scores
between 26 and 75 are called ordinal equity,
and scores greater than 75 are labeled proportional
equity.
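The equity quotient and the labeling rule just described can be sketched in a few lines. The function names are hypothetical, and rounding the quotient to a whole number before labeling is an assumption made here so that the 0 to 25, 26 to 75, and over 75 bands leave no gaps.

```python
def equity_quotient(observed_low, observed_high, predicted_low, predicted_high):
    """Observed gap divided by predicted gap, multiplied by 100."""
    observed_gap = observed_high - observed_low
    predicted_gap = predicted_high - predicted_low
    return 100.0 * observed_gap / predicted_gap


def classify(quotient):
    """Label a quotient using the bands given for Table 2."""
    q = round(quotient)  # assumed: quotients are reported as whole numbers
    if q <= 25:
        return "equality or self-interest"
    if q <= 75:
        return "ordinal equity"
    return "proportional equity"


# Males in Leventhal and Anderson (1970): 55.5% and 63.5% kept,
# against predicted shares of 25% and 75%.
q = equity_quotient(55.5, 63.5, 25.0, 75.0)
print(q, classify(q))
```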


1 If a study included only one work condition
(Larsen & Kellogg, 1974) or more than two …
in the reward allocations (Streater & Chertkoff …


Relationship Between Age and Equity


The studies in Table 2 and other alloca- 
tion studies suggest that a number of inde- 
pendent variables influence children's reward 
allocations. Children's allocation behaviors 
may be influenced by whether the allocator 
is a participant or a third party, whether the 
participants are in a team or nonteam rela- 
tionship (Lerner, 1974), and whether they 
are cooperating or competing (Crockenberg, 
Bryant, & Wilce, 1976), expect future inter- 
actions with each other (Dreman, 1976), are 
male or female (Leventhal & Anderson, 
1970), believe the experimenter will or will 
not evaluate their allocations (Leventhal, 
Popp, & Sawyer, 1973), and have insufficient, 
sufficient, or oversufficient total reward to 
allocate (Coon, Lane, & Lichtman, 1974). 

However, the independent variable of spe- 
cial interest for this article is age, especially 
as it mediates logical development. The most 
striking age-linked feature of Table 2 is that 
the proportional equity responses predicted 
by equity theory are entirely absent in studies
of subjects under 13 years of age. On the
other hand, the results provide strong sup- 
port for an alternative, ordinal equity inter- 
pretation with children. Three sets of studies 
are especially interesting in this regard because
the same investigators employed the
same designs with children 12 years of age
or under and with children or adults of 13
years and over. The equity quotient scores
for 5.9-year-old children in Leventhal and 
Anderson (1970) were 16 for males and 1 for 
females, whereas adult quotients in Leventhal
and Lane (1970) were 79 for both males and 
females. Libby and Garrett’s (1974) 6- and
8-year-olds gave quotients of 60 and 53, respectively,
whereas Garrett and Libby’s
(1973) 14-year-olds gave a quotient of 96. 
Hook’s (1978) 5-, 9-, and 13-year-olds gave 
quotients of 22, 56, and 94, respectively. 


no quotient could be calculated. Also, in two
studies (Anderson, 1976; Anderson & Butzin, 1978)
independent manipulations were on an ordinal
scale (e.g., one child gave little effort and the other
[...] effort) rather than on a ratio scale (e.g.,
15/5), which did not allow quotient calculation.
These few studies are classified on logical grounds.
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The average equity quotient in studies of 
children under 13 years was 36.9, whereas 
the average quotient for 13-year-olds and 
above was 79.9. Proportional equity, there- 
fore, appears to be restricted to the teen 
years and beyond. 

How can one explain the ordinal equity of 
childhood? Five possibilities are discussed 
below. 

Confounded inputs interpretation. One 
interpretation is that children use other in- 
puts in addition to work in their proportional 
equity equations. For example, suppose most 
children felt that need should be weighed as 
heavily as work. Since most studies provided 
no information about the other’s need, the 
average subject might assume that he and the 
other were equal on need. If so, the superior- 
work subject would have total inputs some- 
what greater than the other, but not so much 
greater than if he had used only work in 
forming inputs. Thus, in allocating rewards 
proportional to inputs, the superior workers 
would make what, to the researcher who was 
not aware of the hidden need input, looked 
like an ordinal equity allocation. Yet the 
allocation might be based on proportional 
equity. (As with inputs, a subject could also 
add hidden outcomes or rewards to those 
manipulated by an experimenter.) The un- 
controlled-inputs-and-outcomes problem is a 
major weakness of equity theory, making 
falsification extremely difficult. No matter 
what reward allocation a subject made, it 
could be argued that he was behaving equit- 
ably on the basis of some uncontrolled in- 
truding input or outcome. 
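
A minimal numerical sketch of this argument follows, assuming (hypothetically) that need is weighted equally with work and that both parties are credited with equal need of 50 units. The function name and the specific need values are illustrative only.

```python
def proportional_share(own_inputs, other_inputs):
    """Percentage of the reward that is proportional to summed inputs."""
    own_total = sum(own_inputs)
    return 100.0 * own_total / (own_total + sum(other_inputs))


# Work is the manipulated input (75 vs. 25). Need is a hypothetical
# hidden input that the subject assumes to be equal (50 vs. 50).
work_only = proportional_share([75], [25])
work_plus_need = proportional_share([75, 50], [25, 50])
print(work_only)       # 75.0
print(work_plus_need)  # 62.5
```

Under these assumed values the superior worker keeps 62.5% and, by the same rule, the inferior worker keeps 37.5%, so the observed gap is 25 points and the quotient is 50. A strictly proportional rule applied to hidden inputs would therefore be scored as ordinal equity.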

Nonisomorphism of physical and psycho- 
logical scales. A second interpretation fol- 
lows from the possibility that preadolescents 
transform the physical work scales into psy- 
chological scales of a different form. If so, 
one can imagine circumstances in which de- 
scriptive ordinal allocation data can be in- 
terpreted as consequences of proportional 
thinking of the type suggested by Adams 
(1965). But it is proportional thinking based 
on psychological scales that are not linear 
functions of the corresponding physical scales. 
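
To make this possibility concrete, the sketch below assumes a power-function transformation from physical work to psychological work. The power form and the exponent of 0.5 are purely hypothetical choices for illustration; nothing in the allocation literature reviewed here establishes them.

```python
def share_on_psychological_scale(own_work, other_work, exponent):
    """Proportional share (percent) after a power transformation of work.

    The power function is an illustrative assumption, not an
    empirically established psychophysical law for work inputs.
    """
    own = own_work ** exponent
    other = other_work ** exponent
    return 100.0 * own / (own + other)


linear = share_on_psychological_scale(75, 25, 1.0)
compressed = share_on_psychological_scale(75, 25, 0.5)
print(round(linear, 1))      # 75.0
print(round(compressed, 1))  # 63.4
```

With a compressive scale the allocation falls between the equality share (50%) and the physical-scale proportional share (75%), even though the allocator is, by assumption, reasoning proportionally.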

Intra-allocator norm combination. A third
interpretation of the ordinal equity data of 
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younger children is that they are simultane- 
ously aware of the allocations appropriate to 
equality and proportional equity norms but 
resolve any dilemmas associated with the 
different norms by taking the average. Such 
a compromise yields an ordinal equity allo- 
cation. This interpretation holds that older 
children have clear preferences for propor- 
tional equity over equality norms, which is 
why their data reflect proportional rather 
than ordinal equity. No data exist of which 
we are aware that probe whether children 
under about 13 years of age consider both 
equality and proportional equity norms be- 
fore deciding on a compromise between the 
two. 
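
The averaging compromise can be illustrated as follows. The function name and the `weight_on_proportional` parameter are hypothetical; a weight of 0.5 corresponds to the simple average described above.

```python
def compromise_share(work_share, weight_on_proportional=0.5):
    """Blend the equality share (50%) with the proportional share.

    work_share is the allocator's percentage of the total work.
    """
    equality_share = 50.0
    return ((1 - weight_on_proportional) * equality_share
            + weight_on_proportional * work_share)


print(compromise_share(75.0))  # 62.5
print(compromise_share(25.0))  # 37.5
```

A weight of 1.0 reproduces the proportional allocation, which is one way to express the suggestion that older children give clear priority to the proportional norm.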

Interallocator norm combination. A fourth 
interpretation is that some younger children 
make equality allocations, whereas others 
make proportional equity allocations. The 
average scores in this case would look like 
ordinal equity allocations. However, such an 
interpretation requires a bimodal distribu- 
tion of allocations. Hook’s (1978) data are 
not bimodally distributed, and no published 
study mentions a bimodal distribution or re- 
ports group differences in variances for the 
allocation measure. Consequently, the inter- 
pretation based on the interallocator norm 
combination does not seem likely. 
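
The following sketch uses an invented sample to show why this interpretation implies bimodality. The data are hypothetical and serve only to illustrate the logic; they are not taken from any study.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample of superior-work children: half follow an
# equality norm (keep 50%), half follow proportional equity (keep 75%).
allocations = [50.0] * 10 + [75.0] * 10

print(mean(allocations))     # 62.5
print(Counter(allocations))  # counts pile up at 50.0 and 75.0 only
```

The group mean looks like an ordinal equity allocation, yet no individual child in this invented sample actually allocated 62.5%. Inspecting the distribution, rather than the mean alone, is what distinguishes this account from the others.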

Inability interpretation. A fifth inter- 
pretation, the primary concern of this ar- 
ticle, is that children under the age of 13 
are typically incapable of solving problems 
of ratio proportionality. If this were true, it 
would rule out the first three interpretations, 
since all three assume that half or more 
of the children are engaged in the cognitive, 
ratio-proportionality activity posited by 
equity theory. The inability interpretation 
implies that children do not weight several 
norms to decide which to apply in a given 
situation because it postulates that children 
are incapable of recognizing proportional 
equity. However, the inability explanation 
does not rule out the possibility that children
under 13 years of age may be capable
of acquiring and expressing work-reward 
relationships in an ordinal equity sense, as- 
suming that ordinal equity is an earlier ac- 
quisition than is proportional equity. 
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To decide whether children under the age 
of 13 can solve problems of proportional or 
ordinal equity, we turn to the literature on 
logico-mathematical development: the inter- 
view work of Piaget’s school and the more 
statistical, normative work of other investigators.


Development of Proportional Thought 


Table 3 is a list of Piaget's studies of
the development of logico-mathematical proportion.
The column labeled Task refers to
the particular problem used by Piaget to
explore proportional thought. According to
his theory, proportional thought is a cognitive
structure that is central to, or embedded
in, all kinds of mathematical, physical, and
social problems. Piaget has not studied proportional
thought directly. Rather, he has
studied it in the context of his research on
such diverse yet proportion-dependent areas
as geometry, chance, time, and functions.
Piaget and his colleagues usually conduct
clinical interviews with children of different
ages and classify their responses into age-related
stages. Each stage is supposed to represent
a qualitatively different approach, and
the stages are thought to unfold in an invariant
sequence, which culminates in mature
proportionality, the stage at which all logical
components have been integrated into a coherent
structure. The column labeled Attainment
Age in Years refers to the approximate
age at which Piaget's subjects attained the stage
of proportional thought. Piaget was not very
explicit about how the attainment age was
defined, and it is obvious that the age depended
on certain task factors, for example,
perceptual and memory requirements. Therefore,
the attainment ages should be considered
estimates.

To illustrate Piaget’s approach, we consider
two of his studies. In studies of the
development of the concept of speed, Piaget
(1970) asked children to trace and time
movements of two objects, A and B, that
ran in succession. The children were then
asked which moved faster. The youngest children
did not transfer the ordinal rank of the
objects from distance traveled to speed. In
other words, if A moved farther than B


Table 3
Piaget's Research on the Development of Proportional Thought

Study                         Task                                            Age studied   Attainment age
                                                                              (years)       (years)
Piaget & Inhelder (1956)      Constructing similar triangles and rectangles   6-14          11-12
Piaget (1957)                 Constructing similar triangles                  6-14          11-12
Inhelder & Piaget (1958)      Guessing the size of shadows cast by rings      6-14          12
                              Equilibrium creation on a balance arm with
                                different weights                             6-14          13
Piaget, Inhelder, &           Building equal-volume block structures on
  Szeminska (1960)              unequal-area bases                            6-14          11-12
Piaget, Grize, Szeminska, &   Making different-size fishes equally well fed   6-14          10-11
  Vinh Bang (1968)            Covarying different-size circles with
                                different line positions                      7-13          12-13
                              Relative movements of wheels of different
                                diameters                                     6-13          11-13
                              Relative movements of objects pulled by
                                different-diameter pulleys                    6-14          12-14
                              Equilibrium on a balance arm                    6-14          12-13
Piaget (1970)                 Drawing lines to symbolize relative speeds
                                of objects                                    5-14          12-13
Piaget (1974)                 Stretching elastic bands and guessing
                                relative lengths of segments                  5-14          13-14
                              Maintaining angles formed by strings held
                                fast with different angles                    5-14          13-14
                              Guessing the number of small unit beakers
                                necessary to fill larger beakers              5-14          13-14
Piaget & Inhelder (1975)      Probabilities of lottery drawing outcomes
                                from beakers containing various ratios of
                                elements                                      5-14          11-12

in the same amount of time, these children did
not necessarily say that A moved faster. The
7- to 9-year-olds did make the transformation,
but only if either distance traveled or
time was held constant. If time and distance
were varied simultaneously, the 7- to 9-year-olds
were unable to set up the necessary
ratios of distance to time for comparison:


\[
\frac{\text{Distance for A}}{\text{Time for A}} = \frac{\text{Distance for B}}{\text{Time for B}}.
\]


This problem is of course highly analogous
to the equity theory problem, if one substitutes
outcomes for distance and inputs for
time.
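
The two-variable comparison that the 7- to 9-year-olds could not perform can be sketched as follows. The function name and the example values are illustrative assumptions; exact fractions are used so that equal ratios compare as equal, which requires integer inputs.

```python
from fractions import Fraction


def faster_object(distance_a, time_a, distance_b, time_b):
    """Return 'A', 'B', or 'same' by comparing distance/time ratios."""
    speed_a = Fraction(distance_a, time_a)
    speed_b = Fraction(distance_b, time_b)
    if speed_a == speed_b:
        return "same"
    return "A" if speed_a > speed_b else "B"


# A travels farther but also takes longer, so ranking on distance
# alone gives the wrong answer.
print(faster_object(6, 3, 5, 2))  # B
```

Substituting outcomes for distance and inputs for time turns the same comparison into the proportional equity judgment.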

The 10- and 11-year-olds could often solve
the two-variable problem for simple 2:1
ratios. However, their solutions may have
been intuitive instead of formal. They did
not seem to be able to explain why they
guessed as they did. Only 12- to 14-year-olds
could solve the problems for more complex
ratios and state the principle that distance
divided by time equals speed.

Piaget (1974) obtained analogous results 
by asking children to estimate the lengths of 
elastic band segments. The bands were di- 
vided into Segments A and B, distinguished 
by color. Segment A was longer than Segment 
B by various ratios. For example, in a 2:1
ratio problem, the elastic band at rest was 
3 cm long, with segments of 2 cm (A) and 
1 cm (B). The band was then stretched to 
lengths that were multiples of its rest length, 
for example, doubled to 6 cm. Before the 
children could examine the segments of the 
expanded band, they were asked how long 
each segment should be. The youngest chil- 
dren (4-5 years) failed to preserve ordinality 
through the transformation, not realizing 
that A should be longer than B. Somewhat 
older children (6-9 years) noted that A 
should be longer, but did not preserve the 
ratio relationships between A and B. Rather, 
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Table 4 


Anglo-American Research on the Development of Proportional Thought


Study

Lovell (1961)
Lunzer (1965)
Lovell & Butterworth (1966)
Bruner & Kenney (1966)
Steffe & Parr (Note 6)
Fischbein, Pampu, & Manzat (1970)
Brainerd & Allen (1971)
Lee (1971)
Tomlinson-Keasey & Keasey (1974)
Webb (1974)
Chapman (1975)


Task

Equilibrium on a balance arm

Guessing the shadows cast by 
rings 

Number series and analogies 

Equilibrium on a balance arm 

Guessing the shadows cast by 
rings 

Number analogies 

Calculating the relation between
polygon angles and the number
of sides

Calculating the areas of similar 
triangles 

Filling similar and dissimilar 
beakers to same and different 
proportions full 

Pictorial and symbolic ratio 
forms 

Finding equal ratios of beads 
in two beakers 


Density conversion 
Equilibrium on a balance arm 
Guessing shadows cast by rings 
Equilibrium on a balance arm 
Equilibrium on a balance arm 
Picking the container with 


higher proportions of 
certain-color beads 


Age studied in years    Attainment age in years
11-15 13-15
11-15 13-15 
9-17 13-15 
13-15 
11-15 13-15 
11-15 13-15 
11-15 13-15 
11-15 13-15 
5-7,9, 11 1 
13-15 
5-6 12-13 
9-10 
12-13 
10-11 80% failed 
proportionality 
problem 
5-17 13 
5-17 13 
11-12 Only in the 
18-20 18-20 group 
5-11 None 
(All IQs 
> 160) 
6 Only in college 
8 group 
10 
College 


they preserved the additive relationship. If 
Segment A was 1 cm longer than Segment B 
before the stretching (2 − 1 = 1), it should
have been 1 cm longer afterwards (4 − 3 = 1).
The oldest children discovered the ratio
solution first on an intuitive level and with 
small (2:1) ratios and then, at about 13 
years of age, on a formal basis with complex 
ratios. 
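
The contrast between the additive and the ratio strategies can be sketched as follows, using the 2-cm and 1-cm segments from the example. The function names are hypothetical, and the additive rule (adding the same amount to each segment) is one reading of the children's responses that reproduces the 4 − 3 = 1 example.

```python
def additive_prediction(segment_a, segment_b, increment):
    """Preserve the difference: add the same amount to each segment."""
    return segment_a + increment, segment_b + increment


def ratio_prediction(segment_a, segment_b, stretch_factor):
    """Preserve the ratio: multiply each segment by the stretch factor."""
    return segment_a * stretch_factor, segment_b * stretch_factor


print(additive_prediction(2, 1, 2))  # (4, 3)
print(ratio_prediction(2, 1, 2))     # (4, 2)
```

Only the ratio prediction sums to the 6-cm length of the doubled band; the additive prediction keeps the 1-cm difference but loses both the 2:1 ratio and the total length.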

Piaget's other studies yielded results quite 
analogous to the two examples. The develop- 
mental stages seem to unfold in the same 
sequence no matter what task is used. The 
age of attainment varies somewhat depending 
on the task, but in all cases it is between 10
and 15 years of age, averaging about 13
years of age.

Piaget’s methodology has been criticized
as being too clinical. Table 4 is a summary
of research carried out by non-Piagetian
psychologists. Many of these studies employ
the large samples, statistical analyses, and
standardized problem presentation and response
modes that are missing in the Piagetian
research. The attainment ages mentioned
in the table refer to the age at which
half or more of the subjects had demonstrated
proportional thought, if it was possible
to glean this information from the published
reports. In general, the studies in
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Table 4 are consistent with Piaget's results. 
They document the same acquisition stages 
and the fact that mature proportional thought 
is not developed until adolescence. However, 
these attainment ages are a year or two later 
than those reported by Piaget. 


Implications of the Literature Review 


Our review of the literature on logico-mathematical
development indicates that the average
child under 13 years of age does not
solve problems of ratio proportionality. There 
is no evidence with physical and mathemati- 
cal problems that preteens cognitively con- 
struct and compare ratios, just as there is no 
evidence for similar cognitive activity in the 
allocation literature. Our review offers no 
support for the equity theory assumptions 
that preteens form cognitive ratios and feel 
discomfort in the absence of proportionality. 
The major methodological and theoretical 
implications of the review are now spelled 
out. 


Methodological Implications 


Our negative conclusions about the rele- 
vance of Adams’s (1965) proportional equity 
theory for children under 13 years have to 
be tentative. They must be interpreted in 
the light of several issues. The research summarized
in Tables 3 and 4 does not necessarily
demonstrate that children under 13 years are
incapable of proportional solutions to logico-mathematical
problems. It may only show
that they do not perform behavior commensurate
with such thinking. The studies cited
in these tables were open ended, and children
were asked to respond in whatever manner
they wished. Generally, they were not
told that there was a right answer that they
must find; no incentives were provided to
induce children to perform in ways indicative
of proportional thinking; and the studies
did not follow the training or enrichment
paradigm whereby a child is told a correct
response and is reinforced for replicating it.
Brainerd and Allen (1971) claimed that corrective
feedback over a series of trials on
conservation of density induced this capacity
in 10- and 11-year-olds who at first failed
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to conserve and did not seem to have the 
capacity. The trained children not only con- 
served density at a higher rate than a control 
group but also apparently generalized their 
learning to problems in conservation of vol- 
ume. Varying training, instructional sets, and 
incentives to reveal capacities that are not 
spontaneously manifested may provide evi- 
dence of proportional capacity in younger 
children. We do not know, since the appropriate
studies have not yet been performed.²
Until they are, it will not be clear whether 
the age norms for both allocative behaviors 
and logico-mathematical skills are due to the 
biological constraints of maturation or to 
environmental factors that lead children not 
to perform behavior of which they are indeed 
capable under certain conditions. (See Brain- 
erd, 1978, for a discussion of this problem.) 

Even if one assumes that preteens are in- 
capable of proportional responses to logico- 
mathematical problems, does this mean they 
are incapable of such responses to allocation 
dilemmas? Our response must be, “We are 
not yet sure,” largely because of the possibil- 
ity of confounded inputs and nonisomorphic 
psychological and physical scales. But on 
the other hand, our review does suggest that 
behaviors in the allocative and logico-mathe- 
matical domains are highly congruent. Also, 
a number of studies (Damon, 1975; Hook, 
1978; Tompkins & Olejnik, 1978) document 
correlations between responses to the two 
types of problem. However, such relation- 
ships do not show which type, if either, is 
prior. Future research is needed to test the 
idea, implicit in our review, that the logico- 
mathematical proportionality concept de- 
velops either prior to, or simultaneous with, 
the allocational proportionality concept. 

Some experimental features may permit 
children to make allocations that are en- 
tirely consistent with proportional equity 
but are not the result of engaging in propor- 
tional cognitions. The first such feature in- 


² The experiment by Brainerd and Allen (1971)
did not involve feedback on actual numerical dis- 
tributions, and their subjects could have given 
proportionality responses without ratio cognitions. 
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volves the recognition of equity as a de- 
pendent behavior rather than the creation of 
equity. Brickman and Bryan (1976), for 
example, showed children videotapes in which 
characters transferred rewards from one person
to another and either created or eliminated
proportional equity. The children then
rated the transfer agent. Agents who created 
equity were viewed more favorably than 
those who eliminated it. Thus, the children 
did not create an equitable allocation, but 
recognized one. 

A second feature involves the use of cor- 
respondent allocation problems. Using the 
Leventhal and Anderson (1970) paradigm, 
suppose that two persons contribute 15 and 
5 inputs, respectively, and then must dis- 
tribute 20 outcome rewards. A proportional 
equity allocation could be created by setting 
outcomes in exact correspondence to inputs 
for each person: 15 for 15 and 5 for 5. This 
allocation requires no knowledge of ratios. 
Nelson and Dweck (1977), for example, 
asked 4-year-olds who had done either nine 
or three units of work to divide 12 rewards. 
In the conditions in which they specifically 
instructed children to make allocations con- 
sistent with work, all of the children made 
proportional allocations. Although Nelson and 
Dweck argued that this demonstrated the 
capacity of even 4-year-olds to allocate in 
an equitable manner, Anderson and Butzin 
(1978) noted that such proportional alloca- 
tions could have been due to correspondence 
procedures involving no social comparison or 
proportional cognition at all. Also, Nelson and 
Dweck observed proportional allocations only 
when they explicitly requested work-reward 
correspondence. In other conditions, Nelson 
and Dweck and other researchers using cor- 
respondence designs (Streater & Chertkoff, 
1976; Tompkins & Olejnik, 1978) did not 
observe proportional allocations. 

What if, however, two persons contributed 
three and one inputs, respectively, and were 
asked to allocate 20 rewards? Children could 
give 3 rewards to one worker and 1 to the 
other and then repeat the process until the 
20 rewards were exhausted. This problem 
demands iteration but not ratio cognitions. 

It would be more plausible to assume ratio 
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cognitions if the two workers contributed 6 
and 2 units of work before allocating 20 
rewards. 
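
The iteration procedure described above can be sketched as follows. The function name is hypothetical, and leaving any rewards that do not fill a complete round unallocated is an assumption made for illustration.

```python
def deal_by_iteration(inputs_a, inputs_b, total_rewards):
    """Hand out rewards in rounds of (inputs_a, inputs_b).

    Only counting and repetition are used; no ratio is ever computed.
    Rewards left over when a full round no longer fits are returned
    as the third element.
    """
    reward_a = reward_b = 0
    remaining = total_rewards
    round_size = inputs_a + inputs_b
    while round_size > 0 and remaining >= round_size:
        reward_a += inputs_a
        reward_b += inputs_b
        remaining -= round_size
    return reward_a, reward_b, remaining


print(deal_by_iteration(3, 1, 20))  # (15, 5, 0)
print(deal_by_iteration(6, 2, 20))  # (12, 4, 4)
```

With inputs of 3 and 1 the procedure exhausts the 20 rewards and yields a proportional split without any ratio cognition. With inputs of 6 and 2 it leaves 4 rewards over, so a fully proportional allocation of all 20 would require something beyond simple dealing.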
A third feature involves the ordinal scaling 
of the independent variable. Anderson and 
Butzin (1978) used adjectives to describe 
the work efforts of two persons. Persons X 
and Y were said to have tried “a little bit,” 
“kind of hard,” and “very hard.” Children 
aged 6, 8, and 10 years were asked to
allocate 20 candy rewards to X and Y under
all nine possible combinations of relative
work. Equity theory was employed to predict
the allocations. For example, if X and
Y both tried “very hard,” they should have
received equal numbers of rewards (10 each).
If X tried “very hard” and Y tried “kind
of hard,” then X deserved more rewards.
This equity prediction, however, is ordinal
only, not proportional. If X receives more
rewards than Y, no matter how many more,
ordinal equity is supported. The input adjectives
cannot be placed in a ratio.

The only way to differentiate ordinal and
proportional equity predictions with such ordinally
scaled independent variables is to
assign numbers to the adjectives, such as
a little bit = 1, kind of hard = 2, and so on.
Assuming such an underlying interval scale,
the nine data points in the Anderson and
Butzin design that reflect allocation behavior
would be spaced differently if the subjects
employed an ordinal equity rule than if they
employed a proportional equity rule. Granting
such a scaling assumption, Anderson
and Butzin’s data with children are consistent
with ordinal equity, whereas Anderson's
(1976) earlier data with adults appear
to be more consistent with proportional
equity.
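
Under the scaling assumption just described, the proportional predictions for the nine cells can be generated as follows. The numeric codes, and in particular the value of 3 for “very hard,” are assumptions that extend the 1, 2, ... coding; the function name is hypothetical.

```python
from itertools import product

# Hypothetical numeric codes for the effort adjectives; equal spacing
# between adjacent adjectives is an assumption.
EFFORT_CODES = {"a little bit": 1, "kind of hard": 2, "very hard": 3}


def proportional_rewards(effort_x, effort_y, total_rewards=20):
    """Rewards for X and Y if split in proportion to coded effort."""
    code_x = EFFORT_CODES[effort_x]
    code_y = EFFORT_CODES[effort_y]
    reward_x = total_rewards * code_x / (code_x + code_y)
    return reward_x, total_rewards - reward_x


for effort_x, effort_y in product(EFFORT_CODES, repeat=2):
    print(effort_x, "|", effort_y, "->",
          proportional_rewards(effort_x, effort_y))
```

An ordinal equity rule predicts only the direction of each difference, so any pattern in which the harder worker receives more would satisfy it; the proportional rule predicts the specific spacing printed by this sketch, and that spacing depends entirely on the assumed codes.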

Finally, in interpreting the studies we have
reviewed it is worth bearing in mind that we
have stressed children’s probable incapacity
to generate the logical preconditions for calculating
equity ratios. We have thereby assumed
that it is more useful to consider
equity theory in the proportional sense explicit
in Adams’s theoretical formulations
than in the less strict sense implied by the
belief of equity researchers that the theory is
corroborated whenever the outcome data
show ordinal relationships that are congruent
with the theory, even though the data do
not attain the exact numerical values that the
formal theory suggests. It is difficult to judge
whether one should hold equity theory ac- 
countable in terms of the standards of the 
theory or in terms of the (more relaxed) 
standards of tests of the theory. We have 
opted for the former over the latter. But it 
is noteworthy that had we opted for the 
standard of past empirical tests, then the 
ordinal equity responses of children between 
7 and 13 years of age would be interpreted 
as supporting equity theory rather than in- 
terpreted—as we have done—as not directly 
supportive. 

Future tests of equity in young children
would be improved if (a) they were explicit
about the degree to which, as a theoretical
construct, equity requires cognitive interval
scaling and proportional thinking; (b) they
incorporated experimental procedures that
"unconfound" the capacity to think in ratio
terms and the willingness to think this way
or to perform behaviors indicative of such
thinking; (c) they examined the ways in
which the development of allocative skills
and the development of logico-mathematical
skills are temporally related to each other;
and (d) they made sure that behaviors indicative
of proportional thinking were not the
results of correspondence procedures that do
not necessarily require proportional thinking.
Equity is a slippery construct that requires
explicit formulation and the careful choice
of experimental procedures that are
tailored to the preferred theoretical formulation.
In these respects equity is like other
justice constructs. (For a discussion of the
construct validity of relative deprivation and
the requirements of validation tests, see Cook,
Crosby, & Hennigan, 1977.)


Theoretical Implications


Up to now we have made the point that
equity theory in the proportional form proposed
by Adams (1965) is probably inappropriate
when applied to children under the
age of 13. We have made this point (a) because
on logico-mathematical tasks structurally
analogous to equity tasks children do
not behave in a manner consistent with the
equity assumption that they form cognitive
ratios and (b) because the allocative behavior
of children in equity experiments is
not entirely consistent with a proportional
interpretation. However, we have also made
the point that future research with better
designs and procedures may produce results
commensurate with proportional equity theory,
though we are skeptical.

Table 5
Results of Allocation Research in Three Age Groups

                            Age in years
Result                    3-5    6-12    13+
Self-interest               5       0      0
Equality                   11       4      0
Ordinal equity              5      29      4
Proportional equity         0       0      9

The research reported in Tables 3 and 4 
suggests a three-step development of pro- 
portional thought, as evidenced by children's 
responses to logico-mathematical problems. 
Children under approximately 6 years of age 
do not preserve an ordinal relationship from 
one dimension (e.g., distance traveled) to
another (e.g., speed). Middle school children 
from about 6 to 12 years old preserve such 
ordinal relationships by setting up corre- 
sponding series, but do not relate two di- 
mensions in terms of fractions or ratios. 
Finally, adolescents and adults above 13 
years of age can and do compare ratios for 
proportionality. 

Table 5 reports the research results in 
Table 2 that are consistent with each of the 
four dominant allocation norms—self-inter- 
est, equality, ordinal equity, and proportional 
equity. The studies are divided into columns 
according to the ages of the subjects: 3- to 
5-year-olds make what appear to be self-inter- 
est and equality allocations; 6- to 12-year- 
olds make what seem to be ordinal equity al- 
locations; and 13-year-olds and older seem 
to prefer proportional equity allocations. 
Thus, allocation behaviors seem to follow 
a sequence of stages similar to those of 
thought in general and are roughly corre- 
spondent to Piaget’s sequential stages of 
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Table 6 
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Three Steps in the Development of Allocation Behavior 

                  Comparison                         Probable onset
Step              Person A            Person B       age in years
Unidimensional    Outcome A   <-->   Outcome B            3
                  Input A     <-->   Input B
Ordinal           Outcome A   <-->   Outcome B
                            (compared with)               6
                  Input A     <-->   Input B
Proportional      Outcome A           Outcome B
                  ---------   <-->   ---------           13
                  Input A             Input B



intellectual development: preoperational, con- 
crete operational, and formal operational 
thought. 

Unidimensional allocative comparisons of 
children under 6 years of age. Although 
self-interest and equality seem to be very 
different, they share one attribute not pos- 
sessed by other allocation principles: Both 
require comparisons on one dimension only— 
the reward dimension. In the Leventhal and 
Anderson (1970) design, the child allocator 
does not have to recall or consider relative 
work at all to make a self-interest or 
equality allocation. Self-interest and equality
are simple, unidimensional allocation prin- 
ciples within the logical capacity of the pre- 
school child. 

A glance at Table 6 should clarify this 
unidimensional interpretation. The arrows in 
this table link the objects or concepts to be 
compared by the reward allocator. The litera- 
ture suggests that a 4-year-old, for example, 
can compare the outcomes of Persons A and 
B to determine whether they are equal or 
which is larger. Also, the 4-year-old can 
compare the inputs of Person A and Person 
B. Thus, the allocation cognitions of the 
unidimensional child involve only the ob- 
servation that outcomes or inputs are the 
same or different for different persons. 

Ordinal equity theory in children under 
13 years of age. The middle school child 
is usually capable of preserving ordinal re- 
lationships on two or more dimensions for 
comparison. In other words, the ordinal
equity child is able to compare the comparisons.
In terms of Table 6, the child notes
first that Person A’s outcomes are greater
than Person B’s outcomes and that Person
A's inputs are greater than Person B's inputs.
But then the child goes on to compare the
relationships to see whether or not they are
consistent. If A’s inputs are greater than B’s,
but B’s outcomes are greater than A’s, then
equity is violated. In essence, the child is
comparing ranks between people on two dimensions,
a comparison of abstractions. (The
capacity to hold relationships in mind for
comparison is an attribute of Piaget's concrete
operational thought. Piaget [1964] discussed
this capacity as an element of the
ability to seriate, or rank, objects on some
dimension such as size or number. To seriate
sticks, for example, Piaget argued that it is
necessary to simultaneously realize that A <
B and B < C.)
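
The comparison of comparisons can be sketched as a check that two rank orders agree. The function names are hypothetical, and treating a tie on one dimension paired with a difference on the other as a violation is an assumption.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def ordinal_equity_holds(input_a, input_b, outcome_a, outcome_b):
    """True when the rank order of outcomes matches that of inputs."""
    return sign(input_a - input_b) == sign(outcome_a - outcome_b)


print(ordinal_equity_holds(15, 5, 11, 9))  # True
print(ordinal_equity_holds(15, 5, 9, 11))  # False
```

The check uses only the direction of each difference, never its size, so an 11 to 9 split satisfies it just as well as a 15 to 5 split.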

Ordinal equity, in the allocation context,
is nothing new. Indeed, Homans's (1961) 
distributive justice concept, which he applies 
to adults, is an ordinal allocation rule: 


As a practical matter, distributive justice is realized
when each of the various features of his investments
and his activities, put into rank order in
comparison with those of other men, falls in the
same place in all the different rank orders. (p. 264)


As far as ordinal equity with children is
concerned, it may have two substages. The
first is ordinal equity, as just described, in
which two ordinal comparisons are compared
ordinally to see if the ranks differ. The second,
perhaps beginning around 9 years of
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age, could be called impending interval 
equity. At this point, the child may come to 
believe that inequity exists, even though Per- 
son A has greater outcomes and inputs than 
Person B. This feeling of inequity would 
occur when A’s inputs were much larger than 
B's but his outcomes were only a little larger. 
Piaget, Grize, Szeminska, and Vinh Bang 
(1968) have explored this onset of measure- 
ment in the physical domain; but it has not 
been explored in allocation research. 


Proportional Equity Theory in Adolescents 


Formal operational thinkers (normally
persons of adolescent age or older) are
capable of the proportional thought implied 
by Adams (1965). Adams's theory implies 
that the ratios that are compared for pro- 
portionality consist of the inputs (or work) 
and the outcomes (or rewards) of a single 
person: 


$$\frac{\text{Reward A}}{\text{Work A}} = \frac{\text{Reward B}}{\text{Work B}}.$$


This formulation causes certain cognitive difficulties
because the numerator and denominator
are expressed in different units. However,
the formally equivalent proportion, in
which a dimension rather than a person is 
expressed in the ratio, is cognitively simpler: 


$$\frac{\text{Work A}}{\text{Work B}} = \frac{\text{Reward A}}{\text{Reward B}}.$$


It is probably comprehended earlier by the
adolescent. When more than two persons are
being compared, the simpler formulation can
be modified to take the form


$$\frac{\text{Work A}}{\text{Work A} + \text{Work B} + \text{Work C}} = \frac{\text{Reward A}}{\text{Reward A} + \text{Reward B} + \text{Reward C}}.$$


We suggest that subjects cognitively assess
each person in terms of whether his work,
as a proportion of all the work, is equivalent
to his reward, as a proportion of all the
reward. Hook (1978) found this to be the
way 13-year-old proportional thinkers made
allocations; and other than for Homans, it
is the formulation adopted by the allocation
theorists who preceded Adams (see Patchen, 
1961; Sayles, 1958). It is also consistent 
with work by integration theorists (Ander- 
son & Farkas, 1975) on the relative accuracy 
of prediction of the three equity equations 
just mentioned. 
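
The multi-person formulation can be illustrated with a short Python sketch. It is not part of the original article: the function name `proportional_allocation` and the example numbers are hypothetical, and the code simply divides a total reward so that each person's share of the reward equals that person's share of the work.

```python
def proportional_allocation(work, total_reward):
    """Split total_reward so that reward_i / sum(reward) == work_i / sum(work)."""
    total_work = sum(work)
    if total_work <= 0:
        raise ValueError("total work must be positive")
    return [total_reward * w / total_work for w in work]


# Hypothetical example: Persons A, B, and C contribute 1, 2, and 1 units of work,
# and 8 units of reward are to be divided among them.
rewards = proportional_allocation([1, 2, 1], 8)
```

In this example Person B did half of the work and therefore receives half of the reward.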
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and an Analysis of Distribution Statistics 
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A method of obtaining an average reaction time distribution for a group of 
subjects is described. The method is particularly useful for cases in which data 
from many subjects are available but there are only 10-20 reaction time ob- 
servations per subject cell. Essentially, reaction times for each subject are 
organized in ascending order, and quantiles are calculated. The quantiles are 
then averaged over subjects to give group quantiles (cf. Vincent learning 
curves). From the group quantiles, a group reaction time distribution can be 
constructed. It is shown that this method of averaging is exact for certain 
distributions (i.e., the resulting distribution belongs to the same family as the 
individual distributions). Furthermore, Monte Carlo studies and application of 
the method to the combined data from three large experiments provide evidence 
that properties derived from the group reaction time distribution are much the 
same as average properties derived from the data of individual subjects. This 
article also examines how to quantitatively describe the shape of reaction time 
distributions. The use of moments and cumulants as sources of information 
about distribution shape is evaluated and rejected because of extreme depen- 
dence on long, outlier reaction times. As an alternative, the use of explicit 
distribution functions as approximations to reaction time distributions is considered.


Despite the recent popularity of reaction 
time research, the use of reaction time dis- 
tributions for both model testing and model 
development has been largely ignored. This 
is surprising in view of the fact that proper- 
ties of distributions can prove decisive in 
discriminating among models (Sternberg, 
Note 1) and can falsify models that quite 
adequately describe the behavior of mean re- 
action time (Ratcliff & Murdock, 1976). 

Two methods have been used to obtain 
distributional or shape information. One 
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method, advocated by Sternberg (1969; 
Sternberg, Note 2), is to use moments and 
cumulants to describe distribution shape 
without assuming any particular reaction 
time distribution function. A second method, 
used by Ratcliff and Murdock, is to assume 
an explicit distribution function and use the 
parameters of this distribution to provide 
information about shape. Both these methods 
are unattractive because they require 5 to 10
times the number of observations usually
collected in an experiment. For example, to
fit an explicit function such as a gamma distribution
to experimentally obtained reaction
times, a minimum of about 100 observations
per subject per condition are required for reliable
convergence of fitting procedures and
stability of parameter estimates. Similarly,
to obtain stable estimates of higher moments,
several thousand observations per condition
are typically required. The necessity for a
large number of observations becomes a particular
problem in experimental endeavors in
which the test materials used require a great
deal of time and effort for construction (e.g., 
paragraphs; Kintsch, 1974). For such re- 
search programs, it would take years to con- 
struct enough materials to allow application 
of either of the two distributional methods. 

In the first part of this article, I present 
a method for combining data from individual 
subjects to produce group reaction time dis- 
tributions based on as few as 10 observations 
per subject cell. To form group distributions, 
reaction times for each subject are organized 
in ascending order, and quantiles are calcu- 
lated. The quantiles are then averaged over 
subjects to give group quantiles (Vincent 
averaging; Vincent, 1912). From the group 
quantiles, a group reaction time distribution 
can be constructed. This group distribution 
method averages over individual subjects’ 
data in a way that retains shape information, 
and this is demonstrated in three ways: 
First, it is shown that for certain distribu- 
tional forms (exponential, Weibull, and logis- 
tic), Vincent averaging of individual distri- 
butions of a particular form with different 
parameters results in a group distribution of 
the same functional form. Second, a distribution
that has been used to describe reaction
time data (Ratcliff & Murdock, 1976) was
used in Monte Carlo studies to generate reaction
times that were then combined according
to Vincent’s method. With 20 reaction
times per pseudosubject, the group distribu- 
tions generated by this method have the 
same form as the distribution used to generate
the data. Third, the method was applied
to the combined data from three large recognition
memory experiments that used the
study-test procedure (Ratcliff & Murdock,
1976), with about 120,000 observations in
total. It is shown that parameters derived
from fitting a distribution function (used by
Ratcliff & Murdock, 1976) to the group distribution
are the same as averages of the
parameters derived from fitting the function
to individual distributions.

In the second part of the article, I criti- 
cally examine the use of moments and cumu- 
lants for describing distribution shape. The 
stability of moment and cumulant estimates
is examined first by calculating sampling
standard deviations and second by observing
the stability of estimates when outlier re- 
action times are trimmed from the distribu- 
tion. In addition, the use of empirical dis- 
tribution functions to provide information 
about distribution shape is examined. 

The notion of shape can be defined in dif- 
ferent ways. Mosteller and Tukey (1977, 
chap. 1) defined shape as what is left when 
location (position of the distribution on 
the abscissa) and scale (the scale on the 
abscissa) are given up. They showed that 
shape cannot be defined in terms of the 
mathematical form of the distribution func- 
tion. For example, the family of beta density 
functions have the same functional form, 
but differ widely in shape (Mosteller & 
Tukey, 1977, p. 9). However, one of the 
most striking properties of reaction time dis- 
tributions is that in the main they all have 
roughly the same shape, being skewed to 
the right. (Occasionally normality is claimed 
for simple reaction time distributions, but 
this is probably not true [Mosteller & Tukey,
1977, p. 11].) The group distribution
method is concerned with averaging over sub- 
jects while preserving distribution shape, 
which for distribution functions shaped like 
reaction time distributions often turns out 
to be much the same as preserving the func- 
tional form, as is shown later. 

Reaction time distributions have been ex- 
amined in some detail with respect to specific 
mathematical models. McGill (1963) provided
an excellent summary of work prior to
1963 and presented formal theory for a num- 
ber of latency models. Green and Luce 
(1971) have used transform techniques in 
conjunction with a specific decision model 
to decompose reaction time distributions into 
component distributions, and this method of 
decomposition has been used in testing a 
neural timing theory (Luce & Green, 1972). 
Hohle (1965), Snodgrass (1969), and Snod- 
grass, Luce, and Galanter (1967) have fitted 
various empirical distributions to choice and 
simple reaction time data. None of this 
work, however, provides a general approach 
to obtaining distributional or shape informa- 
tion. 
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Before proceeding to a discussion of meth- 
ods, I briefly illustrate potential uses of dis- 
tributional information by listing predictions 
made by four models about distribution 
shape. First, serial scanning models of item 
recognition that assume independent and 
identically distributed comparison stages pre- 
dict (by the central limit theorem) that as 
the number of comparison stages increases, 
the skewness of the reaction time distribu- 
tion will decrease, and so the distribution will 
become more normal in shape. Second, the 
Atkinson and Juola (1973) model of item 
recognition predicts bimodal reaction time 
distributions. Third, the multiple observa- 
tions model for signal detection (Pike, 1973) 
predicts that when the count criteria in- 
crease, mean latency will increase and skew- 
ness will decrease. Fourth, the random walk 
model for item recognition (Ratcliff, 1978) 
predicts that as the relatedness between probe 
item and memory item decreases, the mode 
and mean of the reaction time distribution 
will diverge. These examples are meant only 
to indicate the kinds of predictions that 
models produce and thus the kinds of tests 
for which distributional analyses prove useful. 


Group Reaction Time Distributions 


In experimental psychology it is usual to 
generalize findings across subjects. Often this 
is done by averaging data over subjects and 
making inferences based on the group data. 
Unfortunately, if raw reaction times from 
several subjects were simply combined to 
obtain distributional information, then the 
group distribution would not reflect the shape 
of the individual distributions. As an illus- 
tration, consider two subjects’ unimodal re- 
action time distributions with respective 
means of 500 msec and 900 msec, each with 
100-msec standard deviations. Simply com- 
bining the data would give a bimodal distri- 
bution, and this would not reflect the uni- 
modal, individual distributions. 
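
The two-subject illustration can be checked numerically. The following Python sketch is not from the original article; it assumes, purely for illustration, that each subject's distribution is normal (the text says only "unimodal") and that both subjects contribute equal numbers of trials. The names `pooled_pdf` and `vincent_quantile` are hypothetical.

```python
from statistics import NormalDist

# Assumed individual distributions (msec): means 500 and 900, SD 100 each.
subject_a = NormalDist(mu=500, sigma=100)
subject_b = NormalDist(mu=900, sigma=100)


def pooled_pdf(t):
    """Density of the raw reaction times pooled over both subjects."""
    return 0.5 * subject_a.pdf(t) + 0.5 * subject_b.pdf(t)


def vincent_quantile(p):
    """Mean of the two subjects' reaction times at cumulative probability p."""
    return 0.5 * (subject_a.inv_cdf(p) + subject_b.inv_cdf(p))


# The pooled density dips between the two individual means (it is bimodal),
# whereas the quantile-averaged curve has its median midway between them.
pooled_is_bimodal = pooled_pdf(700) < min(pooled_pdf(500), pooled_pdf(900))
vincent_median = vincent_quantile(0.5)
```

Averaging quantiles rather than pooling raw latencies is the idea behind the group distribution method.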

If there are enough observations per sub- 
ject cell, then the best way to obtain distri- 
bution information is to derive distributional 
or shape estimates for each subject cell and 
then average these estimates over subjects. 
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Figure 1. An example of Vincent (1912) averaging 
applied to cumulative distribution functions; F_a(t)
is the Vincent average curve of the curves F_1(t)
and F_2(t), and t is the average of t_1 and t_2.


For example, Ratcliff and Murdock (1976)
have used the function arising from the convolution
of the normal [N(μ, σ)] and exponential
[g(t) = (1/τ)e^(−t/τ)] distribution
functions, f(t), as an empirical summary of
the shape of individual subjects' reaction
time distributions (see also Hohle, 1965).
Generalization was then accomplished by
finding the average of each convolution parameter
(μ, σ, and τ) across subjects. The
expression for the convolution is


$$f(t) = \frac{e^{-[(t-\mu)/\tau] + \sigma^{2}/(2\tau^{2})}}{\tau(2\pi)^{1/2}} \int_{-\infty}^{[(t-\mu)/\sigma] - \sigma/\tau} e^{-y^{2}/2}\,dy. \tag{1}$$
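
Because the integral in Equation 1 is (2π)^(1/2) times the standard normal cumulative distribution function evaluated at the upper limit, the density can be computed with the error function. The Python sketch below is an illustration under that identity, not code from the original article; the function name `convolution_pdf` is hypothetical, and the example parameter values are borrowed from the Monte Carlo inputs used later in the article.

```python
import math


def convolution_pdf(t, mu, sigma, tau):
    """Density of the convolution of Normal(mu, sigma) and Exponential(mean tau)."""
    # Upper limit of the integral in Equation 1.
    z = (t - mu) / sigma - sigma / tau
    # Standard normal CDF at z; equals the integral divided by sqrt(2 * pi).
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(-(t - mu) / tau + sigma ** 2 / (2.0 * tau ** 2)) * phi / tau


# Example evaluation in seconds: mu = .50, sigma = .04, tau = .15.
density_at_600_msec = convolution_pdf(0.60, 0.50, 0.04, 0.15)
```

The mean of this distribution is μ + τ and its variance is σ² + τ², which gives a quick check on any implementation.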


In a similar vein, Sternberg (Note 2)
found four cumulants of distributions for
each subject cell and generalized across subjects
by averaging each cumulant across subjects
for each cell. However, the usual experiment
does not provide the number of
observations required for these methods. Distribution
information can still be obtained by
using the group distribution method. 


Group Distribution Method


The method is very similar to the technique
devised by Vincent (1912) for plotting
learning curves. In Vincent's procedure,
each individual's learning curve is divided
into equal fractions (number of trials to
10%, 20%, . . .), and performance of subjects
at each fraction is summed and then
averaged. Figure 1 shows an example of this
"Vincentizing" procedure applied to two cumulative
reaction time distributions to produce
the average cumulative distribution. In
essence, reaction times at a fixed probability
level (quantile) from the two distributions
are averaged to give the mean quantile reaction
time.
The procedure for estimating the sample
quantiles is carried out as follows: Each subject’s
reaction times T_1, . . . , T_n are arranged
in ascending order of magnitude: T_(1), T_(2),
. . . , T_(n), where T_(i) is the ith order statistic
(David, 1970; Sarhan & Greenberg,
1962). From these ordered reaction times, q
sample quantiles are estimated for each subject's
data (generally with q < n). Each
quantile is then averaged across subjects to
give a mean m% sample quantile. In detail,
suppose there are n observations for the first
subject and one wishes to obtain q quantiles
(q < n). Then for each subject, each ordered
latency T_(i) is replaced by q equal latencies,
T_(i), thereby forming a list that is the length
of the product of q and n: T_(1), T_(1), . . . ,
T_(n). To calculate the first quantile, the first n latencies
are summed and divided by n; the
second quantile is given by the sum of the
next n latencies divided by n, and so on.
This procedure is equivalent to simple linear
interpolation. For example, if there were 14
responses and deciles were to be calculated,
the first decile would be given by (10/14)T_(1)
+ (4/14)T_(2), the second by (6/14)T_(2)
+ (8/14)T_(3), the third by (2/14)T_(3)
+ (10/14)T_(4) + (2/14)T_(5), and so on.
When the quantiles have been calculated for 
each subject, each quantile is averaged across 
subjects to give group quantiles. 
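
The replicate-and-block procedure just described can be sketched in Python as follows. This is a minimal illustration, not code from the original article; the function names `subject_quantiles` and `group_quantiles` and the example data are hypothetical.

```python
def subject_quantiles(rts, q):
    """Estimate q sample quantiles from one subject's reaction times."""
    ordered = sorted(rts)
    n = len(ordered)
    # Replace each ordered latency by q equal latencies (list of length q * n).
    expanded = [t for t in ordered for _ in range(q)]
    # The kth quantile is the mean of the kth consecutive block of n latencies.
    return [sum(expanded[k * n:(k + 1) * n]) / n for k in range(q)]


def group_quantiles(subjects, q):
    """Average each quantile across subjects to give group quantiles."""
    per_subject = [subject_quantiles(rts, q) for rts in subjects]
    return [sum(column) / len(column) for column in zip(*per_subject)]


# Hypothetical data: 14 ordered responses with T(i) = i, reduced to deciles.
deciles = subject_quantiles(range(1, 15), 10)
```

With 14 responses and q = 10, the first three values equal (10/14)T(1) + (4/14)T(2), (6/14)T(2) + (8/14)T(3), and (2/14)T(3) + (10/14)T(4) + (2/14)T(5), matching the worked example in the text.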

Group distribution histograms can be con- 
structed by plotting quantiles on the abscissa 
and then constructing rectangles between ad- 
jacent quantiles such that all the rectangles 
have equal areas, as in Figure 2. 
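
A hedged sketch of the histogram construction: each rectangle spans two adjacent group quantiles, and its height is chosen so that every rectangle has the same area. The function name `equal_area_bars`, the choice of .1 as the area per rectangle (appropriate for 10% quantiles), and the quantile values are assumptions made for illustration only.

```python
def equal_area_bars(quantiles, area_per_bar):
    """Return (left, right, height) for rectangles between adjacent quantiles."""
    bars = []
    for left, right in zip(quantiles, quantiles[1:]):
        width = right - left
        if width <= 0:
            raise ValueError("quantiles must be strictly increasing")
        # Equal areas: a wide interval gets a low bar, a narrow one a tall bar.
        bars.append((left, right, area_per_bar / width))
    return bars


# Hypothetical group quantiles in msec.
bars = equal_area_bars([400.0, 450.0, 550.0], area_per_bar=0.1)
```

Closely spaced quantiles therefore mark the high-density region of the distribution, and widely spaced quantiles mark the tail.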

Several points about this method need 
discussion. First, the group distribution 
should be thought of as representing the dis- 
tribution of the average subject, just as aver- 
age reaction time represents the reaction time 
of the average subject. Second, order statis- 
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Figure 2, Two sample group reaction time distributions for 10% quantiles. (Data are from the 
three experiments reported later and represent correct rejections in Output Blocks 1 and 4.)
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(David, 1970, chap. 4). However, if there 
are roughly the same number of observations 
for each subject cell and if the individual 
distributions for each subject cell have ap- 
proximately the same shape, then the group 
distribution will reflect the same bias as the 
individual distributions. Third, it is a gen- 
eral problem that the shape of a group curve 
may not reflect the shape of individual 
curves. This problem was considered in great
detail in the mid-1950s (Bakan, 1954; Estes, 
1956; Hayes, 1953; Sidman, 1952; Spence, 
1956). The general conclusion reached was 
that group curves often do not reflect the 
form of individual curves but that if care is 
taken group curves can be used to test hy- 
potheses about individual curves. The next 
two sections examine this problem with re- 
spect to the Vincent averaging procedure, 
and later sections examine the problem with 
respect to Vincent averaging of reaction time 
distributions. 


Some Exact Results for Vincentized Curves 


Estes (1956) considered the problem of 
averaging learning curves and classified some 
simple functions into cases in which the 
functional form is not changed by averaging 
and cases in which the functional form is 
changed. Some similar results can be obtained
for the Vincentizing procedure. (Note
that distributions that differ only by a translation,
that is, a shift along the time axis,
have the same form under Vincent averaging.)
For Vincent averaging, it is necessary
to obtain the following relationship: time as 
a function of cumulative probability (see 
Figure 1). Consider the exponential distribu- 
tion. The cumulative probability distribution 
is given by 


$$F(t) = 1 - e^{-t/\tau}; \qquad t = -\tau \ln[1 - F(t)]. \tag{2}$$


Consider the average of two exponential distributions
with parameters τ_1 and τ_2:


$$t = \frac{t_1 + t_2}{2} = -\frac{\tau_1 + \tau_2}{2} \ln[1 - F(t)]. \tag{3}$$


Thus the "Vincentized" average of n exponential
distributions is exponential with parameter
Σ_{i=1}^{n} τ_i/n. For the Weibull distribution,


$$F(t) = 1 - e^{-(t/\tau)^{\gamma}},$$


with fixed parameter γ, the Vincentized distribution
is also a Weibull distribution, with
parameter Σ_{i=1}^{n} τ_i/n. Similarly, for the
logistic distribution,


$$F(t) = 1/[1 + e^{-(t-\alpha)/\beta}],$$


the Vincentized distribution is also logistic,
with parameters α = Σ_{i=1}^{n} α_i/n and β =
Σ_{i=1}^{n} β_i/n. Although normal distributions
will not give exactly a normal distribution 
when Vincentized, the logistic distribution is 
a very good approximation to the normal dis- 
tribution, so that any differences are prob- 
ably very small. It should be noted that the 
exponential, Weibull (y 1), and logistic 
distributions have been postulated (or are 
very similar to distributions that have been 
postulated) to represent the distributions of 
processing stages in various models. 
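
These closure results can be checked numerically by averaging quantile functions. The Python sketch below is an illustration, not part of the original article. It assumes the parameterizations written in the comments (in particular, the Weibull form F(t) = 1 − exp[−(t/τ)^γ]), and all function names and parameter values are hypothetical.

```python
import math


def exponential_q(p, tau):
    return -tau * math.log(1.0 - p)                     # inverse of Equation 2


def weibull_q(p, tau, gamma):
    return tau * (-math.log(1.0 - p)) ** (1.0 / gamma)  # F = 1 - exp(-(t/tau)**gamma)


def logistic_q(p, alpha, beta):
    return alpha + beta * math.log(p / (1.0 - p))       # F = 1/(1 + exp(-(t-alpha)/beta))


def vincent_average(quantile_fns, p):
    """Mean across subjects of the reaction time at cumulative probability p."""
    return sum(q(p) for q in quantile_fns) / len(quantile_fns)


probs = [0.1, 0.3, 0.5, 0.7, 0.9]
taus = [100.0, 300.0]                       # hypothetical subject parameters (msec)
alphas, betas = [500.0, 700.0], [40.0, 80.0]

# Largest discrepancy between the Vincent average and the same-family
# distribution whose parameters are the subject means.
exp_gap = max(abs(vincent_average([lambda p, t=t: exponential_q(p, t) for t in taus], p)
                  - exponential_q(p, 200.0)) for p in probs)
weibull_gap = max(abs(vincent_average([lambda p, t=t: weibull_q(p, t, 2.0) for t in taus], p)
                      - weibull_q(p, 200.0, 2.0)) for p in probs)
logistic_gap = max(abs(vincent_average([lambda p, a=a, b=b: logistic_q(p, a, b)
                                        for a, b in zip(alphas, betas)], p)
                       - logistic_q(p, 600.0, 60.0)) for p in probs)
```

In each case the quantile function is linear in the averaged parameters, which is why the three discrepancies vanish up to rounding error.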


Vincentizing the Gamma Distribution 


The gamma distribution has often been
used to model reaction time distributions
(McGill, 1963). Consider the gamma distribution
with parameter 2 (i.e., the convolution
of two exponential distributions):


$$F(t) = 1 - e^{-t/\tau}(1 + t/\tau). \tag{4}$$


By following an analysis similar to that presented
above (Equations 2 and 3), it can be
shown that Vincentized gamma distributions
are not members of the gamma family. However,
for all practical purposes the difference
is negligible. For example, Vincentizing
two gamma distributions (Equation 4) with
parameters τ = 100 msec and τ = 300 msec
gives 165, 276, 405, and 599 msec for the
20th, 40th, 60th, and 80th percentile points,
respectively. The gamma distribution with
parameter τ = 200 msec has the corresponding
points 165, 275, 404, and 599 msec. Thus
Vincentizing gamma distributions produces a
distribution that is very similar in shape to 
another gamma distribution. 


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The examples so far have all considered 
the Vincentizing of combinations of distribu- 
tions that differ from one another only in 
the parameters that have dimensions of time, 
that is, parameters that represent the dura- 
tion of some processing stage. There are other 
parameters that do not represent durations, 
for example, the number of convolved ex- 
ponential distributions in the gamma distri- 
bution and y in the Weibull distribution. 
Vincentizing distributions that vary in these 
parameters may not produce a distribution 
that is anything like the average subject’s 
distribution. An extreme example of a dis- 
tribution with this problem is the beta dis- 
tribution. Mosteller and Tukey (1977, p. 9), 
in considering the problems involved in deal- 
ing with distribution shape, presented a figure 
showing the family of beta distributions to 
illustrate that even distributions belonging to 
the same family can differ widely in shape. 
From the graphs presented in Mosteller and 
Tukey, it can be seen that very serious prob- 
lems may be involved in averaging across 
distributions of widely differing shape. To 
decide whether Vincent averaging will work 
in cases in which distribution shape varies 
widely among individual distributions, it is 
probably best to test the method as above or 
to perform some Monte Carlo tests as de- 
scribed in the next section. 


Some Monte Carlo Studies Using 
the Convolution Model 


The distribution that is the convolution of 
the normal and the exponential distributions 
(Equation 1) has been used as an empirical
model of reaction time distributions (Ratcliff,
1978; Ratcliff & Murdock, 1976). The
fits of the convolution to the data are good
enough to make it reasonable to use the
convolution in Monte Carlo studies testing
the Vincentizing procedure. The Monte Carlo
studies are presented to illustrate the use of
the Vincentizing procedure under optimal
conditions in which the form of the individual
distributions is known.

To use the Monte Carlo method it is 
necessary to generate a random number from 
the convolution of normal and exponential 
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distributions. This can be accomplished by 
simply adding a random number generated 
from the normal distribution and a random 
number generated from the exponential dis- 
tribution. Most computer systems have a 
random number generator that will produce 
random numbers between 0 and 1 from a rectangular
distribution. Equation 2 can be used
to produce exponentially distributed random
numbers (with parameter τ) by substituting
rectangularly distributed random numbers
(RND) for F(t). Normally distributed random
numbers with mean μ and standard deviation
σ can be obtained using the method
proposed by Box and Muller (1958), as
shown in Equation 5:


$$t = [-2 \ln(\mathrm{RND})]^{1/2} \cos(2\pi\,\mathrm{RND})\,\sigma + \mu. \tag{5}$$
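
The generation scheme can be sketched in Python as follows. This is an illustrative sketch rather than the original program: the function names are hypothetical, the two occurrences of RND in Equation 5 are treated as separate, independent uniform draws (as the Box and Muller method requires), and one draw is shifted into (0, 1] to avoid taking the logarithm of zero.

```python
import math
import random


def exponential_rnd(tau, rng):
    """Equation 2 with RND substituted for F(t): t = -tau * ln(1 - RND)."""
    return -tau * math.log(1.0 - rng.random())


def normal_rnd(mu, sigma, rng):
    """Equation 5 (Box and Muller) using two independent uniform draws."""
    u1 = 1.0 - rng.random()   # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2) * sigma + mu


def convolution_rnd(mu, sigma, tau, rng):
    """One pseudo reaction time: a normal deviate plus an exponential deviate."""
    return normal_rnd(mu, sigma, rng) + exponential_rnd(tau, rng)


rng = random.Random(0)  # seeded only so that the example is reproducible
sample = [convolution_rnd(0.50, 0.04, 0.15, rng) for _ in range(20)]
```

The sample mean of many such deviates should approach μ + τ and the sample variance σ² + τ².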


Each Monte Carlo study consisted of sev- 
eral experiments (typically 50 to 100). In 
each experiment, 20 reaction times were 
obtained from each of 40 pseudosubjects. 
The 20 reaction times were arranged in as- 
cending order and then averaged across sub- 
jects to give group 5% quantiles. The con- 
volution model was then fitted to the set of 
5% quantiles (5%, 10%, 1576, . . .) using 
the maximum likelihood method described 
in Ratcliff and Murdock (1976). Note that 
the quantile reaction times are derived from 
random variables; and so, strictly speaking, 
the parameter estimates do not have the nice 
properties of maximum likelihood estimators. 
However, estimating parameters this way is 
no worse than estimating parameters by, say, 
the least squares method, because the quan- 
tile reaction times are not independent and 
the expression being fitted is nonlinear. Re- 
sults are shown in Table 1. 
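
One simulated experiment of this kind can be sketched as below. This is a hedged illustration only: it uses Python's built-in `random.gauss` and `random.expovariate` generators in place of Equations 2 and 5, it stops at the group quantiles and omits the maximum likelihood fit, and the function name `simulate_group_quantiles` is hypothetical.

```python
import random


def simulate_group_quantiles(mu, sigma, tau, n_subjects=40, n_rts=20, seed=0):
    """Group 5% quantiles for one simulated experiment (times in seconds)."""
    rng = random.Random(seed)
    per_subject = []
    for _ in range(n_subjects):
        # Normal plus exponential deviate; expovariate takes the rate 1 / tau.
        rts = sorted(rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)
                     for _ in range(n_rts))
        per_subject.append(rts)
    # With 20 ordered reaction times per pseudosubject, averaging the ith
    # ordered time across subjects gives the ith group 5% quantile.
    return [sum(column) / n_subjects for column in zip(*per_subject)]


group = simulate_group_quantiles(0.50, 0.04, 0.15)
```

Repeating this for 50 to 100 seeds and fitting the convolution to each set of group quantiles would reproduce the structure of the studies summarized in Table 1.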

In general, the parameters μ and τ derived
from fits to the Vincentized distribution are
very close to the input values (used to generate
the pseudodata). However, as τ increases
(from .05 to .30), the value of σ (input
value = .04) becomes more and more
underestimated. This suggests that in any
practical use, the value of σ is likely to be
underestimated and less reliable than the
values of μ and τ. It is interesting to note
that the values of s_μ, s_σ, and s_τ are very close
to the asymptotic variance estimates for the
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Table 1 
Monte Carlo Studies for the Convolution Model

Input parameter            Fitted parameter and standard error estimate          No. of
 μ     σ     τ              μ       s_μ     σ       s_σ     τ       s_τ          experiments
.50   .04   .05            .5005   .0004   .0371   .0003   .0486   .0005         65
.50   .04   .15            .4996   .0004   .0325   .0004   .1498   .0006         101
.50   .04   .30            .5016   .0008   .0275   .0007   .2955   .0015         58/109^a
.50   0     .50            .5002   .0014   .0749   .0012   .4994   .0024         98
.50   .04   .055-.250^b    .4980   .0006   .0314   .0006   .1545   .0012         52

Note. s = the standard error in the mean ({Σ(X_i − M)²/[n(n − 1)]}^(1/2)).
^a 51 of the 109 experiments terminated with the fitted value of σ equal to zero.
^b The 40 pseudosubjects had different τs, ranging from .055 to .250 in steps of .005, M = .1525.


convolution model presented in Ratcliff and
Murdock (1976, Table 2). The last series
of Monte Carlo experiments presented in
Table 1 used 40 pseudosubjects with different
τ values. The value of the average Vincentized
τ was almost equal to the average
input τ. This result shows that the Vincent-averaging
properties of the exponential distribution
carry over to parameter τ of the
convolution to a good approximation.

These Monte Carlo studies show that ap- 
plication of Vincent's (1912) procedure to 
the distribution that is the convolution of 
normal and exponential distributions, a dis- 
tribution that fits response latency distribu- 
tions reasonably well, introduces little bias 
into parameter estimates. 


Practical Test of the Group 
Distribution Method 


To provide a stable data base for a practi- 
cal examination of the method, three experi- 
ments (with four subjects per experiment) 
were combined, giving about 120,000 reaction 
times in total. The experimental procedure 
was the study-test recognition memory para- 
digm. The experiments have been reported as 
Experiments 2 and 1 in Ratcliff and Mur- 
dock (1976) and Experiment 1 in Ratcliff 
(1978); they are referred to here as Experi- 
ments 1, 2, and 3, respectively. A brief de- 
scription of the study-test procedure is pre- 
sented here; for further details, consult Rat- 
cliff and Murdock. 

In each of the three experiments, a list 
of study words was presented to the subject
at about one word per sec, followed by
a test list containing all the study words 
plus an equal number of new words in ran- 
dom order. For each word in the test list, 
the subject had to respond on a 6-point con- 
fidence scale ranging from sure old to sure 
new. Study lists were 16 words long, and 
test lists were 32 words long, except in Ex- 
periment 1 in which the study list contained 
15 words and the test list 30 words. The list 
words were randomly sampled from the To- 
ronto word pool (Okada, 1971). Repetitions 
of words were prohibited until at least two 
lists had intervened. The test list was self- 
paced, and words stayed in view until a re- 
sponse was made. In Experiment 1 and Ex- 
periment 3 rate of presentation of the study 
lists was varied between .5 sec and 2 sec per 
item. Effects on mean reaction time were 
small, on the order of 40 msec. In the fol- 
lowing analyses, data from the different pre- 
sentation-rate conditions are combined; this 
does not significantly affect distribution 
shape. 

The experimental data are classified into 
eight cells, four output- or test-position 
blocks (2-8, 9-16, 17-24, and 25-32) for 
high-confidence hits, and the same four out- 
put-position blocks for correct rejections. 
(The first output position is excluded be- 
cause this reaction time is typically several 
hundred msec slower than other reaction 
times in the test list.) This division of data 
gives about 1,200 observations for each of 
the 96 subject cells (12 subjects x 8 cells). 

It was noted earlier that to test the group
distribution method, properties derived from
the group distribution must be compared
with averages of the properties derived
from individual distributions. The properties
used for comparison were the parameters
of the convolution model, μ, σ, and τ
(Equation 1). Estimates of these parameters
were obtained from the group distribution
(quantiles) for each of the eight
cells by fitting the convolution model to the
quantiles. (See Ratcliff & Murdock, 1976,
for the maximum likelihood method of
fitting.) Estimates of the parameters were
also obtained from the individual subject distributions
by first fitting the convolution to
each subject’s distribution and then averaging
the fitted estimates over subjects. The
estimates given by the two procedures can
be compared in Table 2 for two conditions:
estimates with latencies longer than 5
sec eliminated and estimates with latencies
longer than 2 sec eliminated. It can be seen
that the two procedures give almost identical
estimates of the convolution parameters.
This supports the claim made earlier that
the group distribution provides an unbiased
summary of individual data.


Table 2
Model Fits to the Group Reaction Time Distributions and the Average Parameter
Fits to Individual Subject Distributions

                 Parameter value averaged        Group distribution
                 over subjects                   value
Output block      μ      σ             τ          μ      σ      τ

Hit: T < 5 sec
     1           492    38            178        488    36    179
     2           498    36            200        494    33    196
     3           506    37            231        503    35    225
     4           517    [illegible]   261        507    37    256
Correct rejection: T < 5 sec
     1           517    37            213        513    36    216
     2           523    40            236        519    38    243
     3           526    41            272        521    38    273
     4           524    43            300        518    41    302
Hit: T < 2 sec
     1           497    41            158        495    39    157
     2           502    38            175        499    36    172
     3           513    40            192        510    39    186
     4           525    45            210        517    42    205
Correct rejection: T < 2 sec
     1           523    40            185        520    38    186
     2           530    43            200        525    41    202
     3           536    45            218        533    44    223
     4           538    49            225        532    48    229

Figures 2, 3, and 4 show some sample data. 
Figure 2 shows group reaction time distri- 
butions for correct rejections in Output Block 
1 and in Output Block 4. Figure 4 shows 
group reaction time distributions and fits of 
the convolution model for hits in all four 
output blocks. The Figure 2 distributions are 
based on 20% quantiles, the Figure 4 distributions
on 2% quantiles. Figure 3 shows
some sample fits of the convolution model 
to reaction time distributions for individual 
subjects for hits in Output Block 1. Although 
the chi-squares are often significant (because 
the large numbers of observations make the 
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SUBJECT 1 SUBJECT 2 SUBJECT 3 SUBJECT 4 
[Panels by subject and experiment (Exp. 1, Exp. 2, Exp. 3); ordinate: number of responses; abscissa: latency (sec).]


Figure 3. Empirical and fitted latency distributions for hits, Output Block 1. (Exp. 1 = Experiment
2, Ratcliff and Murdock, 1976; Exp. 2 = Experiment 1, Ratcliff and Murdock, 1976; Exp. 3 =
Experiment 1, Ratcliff, 1978.)


chi-square a very powerful test), the fits are 
actually quite good; certainly the convolu- 
tion captures the overall shape of the dis- 
tribution. Problems associated with trunca- 
tion and outliers are discussed in the section 
entitled Moments and Cumulants. 


Probability Mixtures of Distributions 
and Bimodality 


Occasionally a model is developed that 
predicts bimodal reaction time distributions 
arising from a probability mixture of pro- 
cesses (e.g., Atkinson & Juola, 1973). The 
question arises as to whether Vincent aver- 
aging across bimodal distributions from in- 
dividual subjects will produce a bimodal 
group distribution. In general, the answer is 
only under conditions in which the propor- 
tion of responses in each process is approxi- 
mately the same across subjects. For ex- 
ample, Figure 5 shows the Vincent average 


cumulative distribution function for two dis- 
tribution functions, each of which is a prob- 
ability mixture of two processes. One dis- 
tribution has a 25%-75% combination, and 
the other has a 75%-25% combination. The
resulting group distribution is trimodal (i.e.,
has four points of inflection) and certainly
does not reflect the bimodal nature of the 
individual distributions. In situations in 
which bimodality and probability mixtures 
of processes are expected, it is probably best 
to collect several hundred latencies per sub- 
ject condition and investigate the individual 
latency distributions. 
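
The trimodality claim can be checked numerically. The following is a minimal sketch, not the procedure used to draw Figure 5: it assumes each subject's distribution is a mixture of two normal components (means 0.5 and 1.0 sec, standard deviation 0.05 sec; these values and all function names are illustrative assumptions), forms the Vincent average by averaging the two quantile functions, and counts local maxima of the resulting densities.

```python
from statistics import NormalDist


def mixture(weight_first, mu1=0.5, mu2=1.0, sd=0.05):
    """Return (pdf, cdf) of a two-component normal mixture (assumed shapes)."""
    first, second = NormalDist(mu1, sd), NormalDist(mu2, sd)

    def pdf(x):
        return weight_first * first.pdf(x) + (1 - weight_first) * second.pdf(x)

    def cdf(x):
        return weight_first * first.cdf(x) + (1 - weight_first) * second.cdf(x)

    return pdf, cdf


def quantile(cdf, p, lo=0.0, hi=1.5):
    """Invert a continuous CDF by bisection on [lo, hi]."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def vincent_density(subjects, p):
    """Density of the quantile-averaged (Vincent) distribution at level p.

    The group quantile function is the mean of the subjects' quantile
    functions, so its slope is the mean of 1 / f_k(Q_k(p)); the group
    density is the reciprocal of that slope.
    """
    slopes = [1.0 / pdf(quantile(cdf, p)) for pdf, cdf in subjects]
    return len(slopes) / sum(slopes)


def count_modes(values):
    """Count interior local maxima in a sequence of density values."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )


subject_a = mixture(0.25)  # 25% of responses come from the faster process
subject_b = mixture(0.75)  # 75% of responses come from the faster process

grid = [0.2 + 0.005 * i for i in range(221)]  # latencies from 0.2 to 1.3 sec
levels = [i / 200 for i in range(1, 200)]     # probability levels

modes_a = count_modes([subject_a[0](x) for x in grid])
modes_b = count_modes([subject_b[0](x) for x in grid])
modes_group = count_modes(
    [vincent_density([subject_a, subject_b], p) for p in levels]
)
print(modes_a, modes_b, modes_group)
```

Because the averaged quantile function is increasing in the probability level, scanning the group density over probability levels visits latencies in order, so counting local maxima along that scan counts modes of the group distribution.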


Moments and Cumulants 


Moments have been used for many years
to determine the shape of frequency curves
either through skewness and kurtosis indices
or by explicitly determining the frequency
curve within Pearson’s (cited in Elderton,
1906; Elderton & Johnson, 1969) system.
Recently moments and cumulants have been 
used in the additive factor method for analy- 
sis of stage models (Sternberg, 1969; Stern- 
berg, Note 2). In this section, three related 
problems in the use of moments and cumu- 
lants as sources of shape information are dis- 
cussed. These problems are first that the 
variance associated with estimates of these 
measures is extremely large, second that the 
measures are very sensitive to outliers, and 
third that the measures give information 
about a part of the frequency curve that is 
of little theoretical interest. 

To investigate the variability of moments 
and cumulants, expressions for moments and
cumulants and their standard deviations must
be derived. These expressions are derived for 
an explicit distribution function to allow es- 
timation of numerical values. The convolu- 
tion of normal and exponential distributions 
is chosen because it approximates the shape 
of reaction time distributions. 

Moments are defined as follows (Kendall 
& Stuart, 1969): 


$$\mu'_1 = \int_{-\infty}^{\infty} t\,f(t)\,dt,$$

$$\mu_i = \int_{-\infty}^{\infty} (t - \mu'_1)^i f(t)\,dt, \quad \text{for } i > 1.$$


[Figure 4: four panels (Output Blocks 1 to 4); x-axis: time (sec).]

Figure 4. Group reaction time distributions for 2% quantiles for hits together with fits of the
convolution model to the group distributions.

[Figure 5: cumulative probability plotted against time.]


Figure 5. An example of group averaging of two
bimodal distributions; $F_1(t)$ has 75% of the probability
density in the first peak of the density function,
$F_2(t)$ has 25% of the probability density in
the first peak, and the resulting Vincent (1912)
average distribution $F_3(t)$ is trimodal.


Cumulants are expressed by

$$\kappa_1 = \mu'_1, \quad \kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2.$$


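The sampling variances of the k statistics $k_i$
(unbiased estimates of the cumulants $\kappa_i$) are
given by Kendall and Stuart (1969):

$$\mathrm{var}(k_2) = \frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n - 1},$$

$$\mathrm{var}(k_3) = \frac{\kappa_6}{n} + \frac{9\kappa_4\kappa_2}{n - 1} + \frac{9\kappa_3^2}{n - 1} + \frac{6n\kappa_2^3}{(n - 1)(n - 2)},$$

$$\mathrm{var}(k_4) = \frac{\kappa_8}{n} + \frac{16\kappa_6\kappa_2}{n - 1} + \frac{48\kappa_5\kappa_3}{n - 1} + \frac{34\kappa_4^2}{n - 1} + \frac{72n\kappa_4\kappa_2^2}{(n - 1)(n - 2)} + \frac{144n\kappa_3^2\kappa_2}{(n - 1)(n - 2)} + \frac{24n(n + 1)\kappa_2^4}{(n - 1)(n - 2)(n - 3)}.$$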


Expected values and variances of the k statistics
for the convolution of normal and exponential
distributions can now be calculated.
For the normal distribution, $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$,
and $\kappa_i = 0$, for $i > 2$; and for the exponential
distribution, $\kappa_i = \tau^i (i - 1)!$. To convolve
two distributions cumulants are added; so
for the convolution of normal and exponential
distributions, $\kappa_1 = \mu + \tau$, $\kappa_2 = \sigma^2 + \tau^2$,
$\kappa_3 = 2\tau^3$, and $\kappa_4 = 6\tau^4$. To estimate numerical
values for cumulants and sampling variances
of cumulants, values in the range of
those found in Table 2 are used: $\mu = .5$ sec,
$\sigma = .03$ sec, and $\tau = .2$ sec. Also, $\sigma^2 \ll \tau^2$,
so that to an accuracy of 1% or 2% it is
possible to neglect terms in $\sigma$ compared with
terms in $\tau$. Now some numerical values can
be calculated: For n = 100, $\kappa_2 = .040 \pm .011$,
$\kappa_3 = .016 \pm .012$, and $\kappa_4 = .0096 \pm .0174$;
for n = 1,000, $\kappa_2 = .040 \pm .004$, $\kappa_3 = .016 \pm .004$,
and $\kappa_4 = .0096 \pm .0055$. From these
values of cumulants and their estimated sampling
standard errors, it can be seen that
stability in the third and fourth cumulants
is not achieved unless tens of thousands of
observations contribute to the estimates. The
same kind of instability can be seen in moments
if corresponding sampling variances
for moments are calculated (Kendall & Stuart,
1969).
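
The numerical values quoted above can be reproduced with a short calculation. The sketch below is illustrative rather than the original computation: it uses the standard Kendall and Stuart expressions for the sampling variances of $k_2$, $k_3$, and $k_4$, the additivity of cumulants under convolution, and the parameter values $\mu = .5$, $\sigma = .03$, and $\tau = .2$ sec. The function names are assumptions made for this example, and the small $\sigma^2$ term is retained rather than neglected, so results differ from the quoted values only in the last digit.

```python
from math import factorial, sqrt


def exgauss_cumulants(mu, sigma, tau, max_order=8):
    """Cumulants of the sum of a normal(mu, sigma) and an exponential(mean tau)."""
    # Exponential part: kappa_r = (r - 1)! * tau**r; cumulants add under convolution.
    kappa = {r: factorial(r - 1) * tau**r for r in range(1, max_order + 1)}
    kappa[1] += mu          # the normal contributes only to the first
    kappa[2] += sigma**2    # two cumulants
    return kappa


def k_statistic_sd(kappa, n):
    """Sampling standard deviations of k2, k3, and k4 for sample size n."""
    k2, k3, k4, k5, k6, k8 = (kappa[r] for r in (2, 3, 4, 5, 6, 8))
    var_k2 = k4 / n + 2 * k2**2 / (n - 1)
    var_k3 = (k6 / n + 9 * k4 * k2 / (n - 1) + 9 * k3**2 / (n - 1)
              + 6 * n * k2**3 / ((n - 1) * (n - 2)))
    var_k4 = (k8 / n + 16 * k6 * k2 / (n - 1) + 48 * k5 * k3 / (n - 1)
              + 34 * k4**2 / (n - 1)
              + 72 * n * k4 * k2**2 / ((n - 1) * (n - 2))
              + 144 * n * k3**2 * k2 / ((n - 1) * (n - 2))
              + 24 * n * (n + 1) * k2**4 / ((n - 1) * (n - 2) * (n - 3)))
    return {2: sqrt(var_k2), 3: sqrt(var_k3), 4: sqrt(var_k4)}


kappa = exgauss_cumulants(mu=0.5, sigma=0.03, tau=0.2)
sd_100 = k_statistic_sd(kappa, 100)
sd_1000 = k_statistic_sd(kappa, 1000)

for n, sds in ((100, sd_100), (1000, sd_1000)):
    for order in (2, 3, 4):
        print(f"n={n}: kappa_{order} = {kappa[order]:.4f} +/- {sds[order]:.4f}")
```

Calling `k_statistic_sd(kappa, n)` with larger `n` shows how slowly the standard deviation of $k_4$ shrinks relative to the size of $\kappa_4$ itself.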

The second problem with moments and
cumulants is their sensitivity to outliers.
There is a practical problem with outlier
reaction times in that a proportion of these
responses may be spurious, that is, they do
not arise from the process under examination.
For example, suppose distributional information
is being used to evaluate a model that
postulates a single retrieval process. Then
an eyeblink, a moment’s inattention, or a
deliberate rest by the subject must be considered
spurious for evaluation of the model.
The sensitivity of moments and cumulants to
these spurious outliers can be demonstrated
by examining the effect of truncation on estimates
of moments. Table 3 shows values of
$m'_1$, $m_2$, $m_3$, and $m_4$, estimates of moments
for the latency data used earlier (in obtaining
group reaction time distributions). The
effects of truncation are particularly striking.
When 1% to 4% of the slower responses (2
sec < T < 5 sec) are eliminated, mean latency
changes by between 20 and 50 msec,
variance by a factor of two, and the third
and fourth moments by an order of magnitude.
Thus, excluding outliers three or more
standard deviations above the mean ($m'_1 + 3(m_2)^{1/2} < 2$ sec) leads to enormous changes
in higher moments.

Table 3
Moments for Latency Data Truncated at 5 sec and 2 sec

                          T < 5 sec                              T < 2 sec
Output
block       m'1     m2     m3     m4        n       m'1     m2     m3     m4        n
Hit
  1        6.70   6.98    .91   2.11   13,754      6.54   3.73    .18    .18   13,645
  2        6.98   8.93   1.29   3.28   15,158      6.77   4.40    .21    .20   15,011
  3        7.37  12.07   2.01   5.53   14,046      7.06   5.05    .25    .25   13,852
  4        7.79  14.89   2.38   6.36   12,400      7.36   5.83    .28    .28   12,152
Correct rejection
  1        7.31   9.76   1.44   3.59   13,885      7.08   4.75    .24    .23   13,722
  2        7.59  11.54   1.69   4.15   15,118      7.29   5.37    .28    .27   14,890
  3        7.98  15.36   2.42   6.13   15,050      7.53   6.03    .30    .29   14,709
  4        8.24  20.26   3.73  10.49   13,587      7.63   6.49    .33    .33   13,192

Note. The values for $m'_1$ are in units of $10^2$ msec, for $m_2$ in units of $10^4$ msec², for $m_3$ in units of $10^8$ msec³, and
for $m_4$ in units of $10^{11}$ msec⁴; $m'_1$ is mean latency, $m_2$ is variance, and $m_3$ and $m_4$ are the third and fourth
moments, respectively.

The extreme sensitivity of higher moments
to outliers is well-known, and Figure 6 illustrates
the dependence of moments on tails
of the frequency distribution. In Figure 6 is
plotted the frequency distribution $f(x) = x^{m-2} e^{-x}/[\Gamma(m - 1)]$
for m = 10.6, together


[Figure 6: upper panel, the density f(x); lower panel, plots of $f(x)(x - \xi)^s/\mu_s$, annotated with the percentage of each moment $\mu_s$ lying beyond an upper-tail percentage point.]

Figure 6. The distribution $f(x) = x^{m-2} e^{-x}/[\Gamma(m - 1)]$, for m = 10.6. The lower curves show $\phi_s(x) =
(x - \xi)^s f(x)/\mu_s$, which are the normalized contributions to the moments $\mu_s$, as a function of x, where
$\xi$ is the mean. (From “Some Problems Arising in Approximating to Probability Distributions,
Using Moments” by E. S. Pearson, Biometrika, 1963, 50, 95-112. Copyright 1963 by the Biometrika
Trustees. Reprinted by permission.)

Table 4
Values of Skewness $(b_1)^{1/2}$, Kurtosis $b_2$, and Pearson's¹ Measures of Skewness $Sk_o$ and $Sk_e$ for
Latency Data

                        T < 5 sec                          T < 2 sec
Output
block     (b1)^1/2     b2      Sk_o      Sk_e      (b1)^1/2     b2      Sk_e
Hit
  1         4.94      43.4  (illegible)  .744        2.55      12.8     .539
  2         4.84      41.2     .475      .820        2.31      10.4     .577
  3         4.78      38.0     .478      .836        2.22       9.6     .596
  4         4.45      28.7     .492      .903        2.01       8.2     .605
Correct rejection
  1         4.71      37.7     .483      .794        2.35      10.3     .587
  2         4.32      31.2     .495      .828        2.24       9.4     .600
  3         4.02      26.0     .503      .860        2.05       8.0     .623
  4         4.09      25.6     .480      .873        2.01       7.8     .608

Note. $Sk_o$ = (mean − mode)/standard deviation, where mean = $m'_1$ from Table 3, standard deviation = $(m_2)^{1/2}$
from Table 3, and mode is calculated from the convolution fits for T < 2 sec by setting the first derivative
of the probability density function to zero and estimating $t[f'(t) = 0]$. $Sk_e$ = 3(mean − median)/standard
deviation.

¹ Cited in Elderton (1906) and Elderton and Johnson (1969).


with the function $\phi_s(x) = (x - \xi)^s f(x)/\mu_s$ for
$s = 2, 3, \ldots, 6$. Note that

$$\mu_s = \int (x - \xi)^s f(x)\,dx,$$

where $\xi$ is the mean. The figure shows clearly
the third problem with moments: the
higher moments of an asymmetrical long-tailed
distribution depend on the form of
the frequency function (and thus outliers) in
a region of the tail that may be of no practical
interest (Pearson, 1963).

Third and fourth moments are used as indices
of skewness and kurtosis through $(\beta_1)^{1/2}
= \mu_3/(\mu_2)^{3/2}$ and $\beta_2 = \mu_4/\mu_2^2$, respectively.
Pearson has proposed these alternative measures
of skewness: $Sk_o$ = (mean − mode)/
standard deviation and, to avoid the use of
the mode, $Sk_e$ = 3(mean − median)/standard
deviation (Kendall & Stuart, 1969). In Table
4 are shown values of $(b_1)^{1/2}$, $b_2$ (estimates
of $(\beta_1)^{1/2}$ and $\beta_2$), $Sk_o$, and $Sk_e$ for the latency
data used earlier. Note that the estimated
value of the mode is rather unstable unless
a fitted probability density function can be
used to locate the mode (Elderton & Johnson,
1969). Thus, the mode used in the calculation
of $Sk_o$ was obtained from the convolution
fit to the group data (for T < 2 sec)
by setting the first derivative of the probability
density function to zero. The truncated
distribution (T < 2 sec) was chosen because
inspection of fits of the convolution to
individual subject's histograms indicated that
the empirical histograms and fitted model
did not differ systematically (see Figure 3).

A rather confusing picture of skewness
estimates emerges from Table 4. By using
$(b_1)^{1/2}$ as the estimate of skewness, skewness
decreases as output position increases, and
skewness is halved by the elimination of 1%
to 4% of longer reaction times. On the other
hand, by using $Sk_o$ and $Sk_e$ as measures of
skewness, skewness increases as output position
increases, and the elimination of outliers
results in a change of 10%-20% in $Sk_o$
and $Sk_e$. From the demonstration in Figure 6
and from the behavior of $(b_1)^{1/2}$ and $Sk$, it
must be concluded that the alternative measures
of skewness, $(b_1)^{1/2}$ and $Sk$, are concerned
with different properties of the distribution
function. Which measure should
be used depends on whether behavior of the
central portion of the distribution function
(indicated by $Sk$) or behavior of the extreme
tail of the distribution function [indicated
by $(b_1)^{1/2}$] is of interest.
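
The different behavior of the two kinds of skewness measure under truncation can be illustrated with simulated data. The sketch below is not a reanalysis of the latency data in Tables 3 and 4: it assumes hypothetical latencies drawn from the convolution of a normal and an exponential distribution ($\mu = .5$, $\sigma = .03$, $\tau = .2$ sec), adds 1% spurious slow responses between 2 and 5 sec, and compares the moment-based skewness with Pearson's median-based measure before and after truncation at 2 sec. The seed, sample sizes, and function names are arbitrary choices made for this example.

```python
import random
from statistics import fmean, median, pstdev


def skewness_measures(latencies):
    """Return moment skewness sqrt(b1) and Pearson's Sk_e for a sample."""
    mean = fmean(latencies)
    sd = pstdev(latencies, mu=mean)
    third_moment = fmean((t - mean) ** 3 for t in latencies)
    return {
        "sqrt_b1": third_moment / sd**3,
        "sk_e": 3 * (mean - median(latencies)) / sd,
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical latencies (sec): normal(0.5, 0.03) plus exponential(mean 0.2).
latencies = [rng.gauss(0.5, 0.03) + rng.expovariate(1 / 0.2) for _ in range(5000)]
# About 1% spurious slow responses between 2 and 5 sec (assumed contamination).
latencies += [rng.uniform(2.0, 5.0) for _ in range(50)]

full = skewness_measures(latencies)
truncated = skewness_measures([t for t in latencies if t < 2.0])

print("sqrt(b1): full = %.2f, truncated = %.2f" % (full["sqrt_b1"], truncated["sqrt_b1"]))
print("Sk_e:     full = %.2f, truncated = %.2f" % (full["sk_e"], truncated["sk_e"]))
```

Under these assumptions the moment-based index changes by a large factor when the slow responses are removed, whereas the median-based index changes comparatively little, which is the pattern described for Table 4.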




I attempted to fit Pearson’s (cited in Elderton
& Johnson, 1969) system of frequency
curves using the moments in Table 3. The
curve belongs to Pearson's Type VI class,
but the system of frequency curves is not
flexible enough to encompass the distributions
used in Table 3. The start of a Type
VI distribution is at some value, a > 0 (Elderton
& Johnson, 1969). Calculating the
value of a for one set of data in Table 3
gave a value of a around 6 sec, which is beyond
the distribution cutoff value. Thus it
seems that Pearson’s system of frequency
curves may not be as flexible as is generally
thought (see Patel, Kapadia, & Owen, 1976,
for a list of those distributions that belong
to Pearson’s system and those that do not).

To summarize, moments and cumulants
higher than variance have little to offer as
sources of shape information about reaction
time distributions because of their extreme
variability and because they provide information
about the extreme tails of the distribution
that is of little practical interest.
More reasonable sources of shape information
are mean, mode, median and standard
deviation, together with Pearson’s $Sk_o$ and
$Sk_e$ measures of skewness.


A Further Alternative to Moments
and Cumulants 


Another way to obtain shape information 
from reaction time distributions is to fit an 
explicit distribution function and use the 
parameters of this distribution as a summary
of shape. Ratcliff and Murdock (1976) have
used the distribution resulting from the convolution
of normal and exponential distributions
(Equation 1) as an empirical summary
of reaction time distributions in memory retrieval
paradigms. For simple and choice reaction
time paradigms, Snodgrass et al.
(1967) have shown that distributions with
rounded mode and exponential tail (and so
probably the convolution of normal and
exponential distributions) are suitable as
descriptions of distribution shape. The
distribution they find to give the best fits
to their data is the double monomial
distribution.

Presenting information about reaction
time distributions by providing the param- 
eters of an explicit distribution function (that 
fits adequately) has the great advantage that 
it is easy for anyone to reconstruct a dis- 
tribution (from the formula) that has nearly 
the same shape as the raw data. This may 
prove extremely valuable for mathematical 
modelers who may not wish to invest a large 
amount of time in obtaining raw data until 
some initial checks have been carried out. 
Further examples and discussion of the use 
of explicit distribution functions as approxi- 
mations to reaction time distributions can be 
found in Ratcliff (1978) and Ratcliff and 
Murdock (1976). 
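
As an illustration of how a distribution can be reconstructed from reported parameters, the sketch below evaluates the density of the convolution of a normal and an exponential distribution. It uses the standard closed form for that density, which is assumed here to correspond to the article's Equation 1 (not reproduced in this section); the function name and the parameter values are illustrative.

```python
from math import erfc, exp, sqrt


def exgauss_pdf(t, mu, sigma, tau):
    """Density of normal(mu, sigma) convolved with exponential(mean tau)."""
    z = (t - mu) / sigma - sigma / tau
    normal_cdf = 0.5 * erfc(-z / sqrt(2))  # standard normal CDF at z
    return exp((mu - t) / tau + sigma**2 / (2 * tau**2)) * normal_cdf / tau


# Rebuild a distribution on a 1-msec grid from three reported parameters.
step = 0.001
grid = [i * step for i in range(5001)]  # 0 to 5 sec
density = [exgauss_pdf(t, mu=0.5, sigma=0.03, tau=0.2) for t in grid]

area = sum(density) * step                                # should be close to 1
mean = sum(t * f for t, f in zip(grid, density)) * step   # close to mu + tau
print(round(area, 3), round(mean, 3))
```

The numerical checks on the area and the mean are a quick way to confirm that a reconstructed density matches the reported parameters before it is used in model fitting.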


Conclusions and Summary 


Information about reaction time distribu- 
tions can prove very useful in model con- 
struction and model testing, but there are 
few methods available for analysis of dis- 
tributions. In this article I have presented 
a method for obtaining group reaction time 
distributions from experiments in which there 
are as few as 10 observations per subject cell. 
The method essentially involves estimating 
latency quantiles for each subject and then 
averaging these over the group of subjects. 
Several distributions were shown to average 
to give another distribution of the same 
family with parameters that were the mean 
of the parameters of the individual member 
distributions. Several Monte Carlo studies
were performed using the distribution that is 
the convolution of a normal and an exponen- 
tial distribution, a distribution used to fit 
reaction time distributions. These studies 
showed that the parameters derived from 
the group distributions were the same as the 
parameters used to generate the individual 
pseudosubject distributions. Fits of the con- 
volution model to group distributions derived 
from data combined from three large experi- 
ments gave parameters that were almost iden- 
tical to average parameters from fits to the 
distributions of individual subjects. The close
correspondence between these methods of estimating
group averages shows that group
distributions provide an excellent summary of
distributional information for the group and
do not introduce any systematic bias into 
the estimate of shape. 

Methods of deriving shape information 
that use moments and cumulants were evalu- 
ated, and three major problems were pointed 
out. First, estimates of the higher moments 
and cumulants have large standard devia- 
tions; for example, 10,000 observations may 
be needed before the standard deviation on 
the fourth cumulant is as low as 10% of the 
size of the fourth cumulant. Second, esti- 
mates of moments from data are extremely 
sensitive to outlier reaction times; the addi- 
tion of 1% slow responses can change the 
fourth moment by a power of 10. This prob- 
lem is particularly severe if an unknown 
proportion of the slow latencies are spurious, 
that is, if they are not a result of processes 
under examination. Third, Figure 6 shows 
that the third and fourth moments tell one 
about portions of the distribution that may 
be of no theoretical interest. It is suggested 
that the mean and standard deviation to- 
gether with estimates of median, mode, and 
Pearson's skewness measures ($Sk_o$ and $Sk_e$)
provide better information about distribu- 
tion shape. These statistics are adequate, but 
may not be the most convenient statistics for 
conceptualizing the distribution or for fitting 
the distribution to more complex theoretical 
models. It is further argued that fitting ade- 
quate, explicit probability density functions 
to the observed reaction time distributions 
may provide more useful summaries of dis- 
tributional information for researchers in- 
volved in mathematical modeling. 
Several models of jury decision making are reviewed. In each instance the
model is described and compared with related models, its assumptions are 
scrutinized, its fit to normative data is evaluated, and possible revisions and 
extensions of the model are discussed. Models reviewed include (a) multinomial 
decision schemes designed to adduce implicit decision rules used in jury decision 
making, (b) binomial models of jury voting that use simplifying assumptions 
about jury decision making to assess the impact of explicit decision rules and 
jury size on verdict distributions, (c) Bayesian models that use normative data 
to estimate prior probabilities of defendants' "convictability" and juror accu- 
racy, (d) models that assess the relationships among jury size, decision rule, 
and jury accuracy, (e) models that examine the relationship between juror and 
jury errors, and (f) a computer simulation that uses simple assumptions about 
group persuasion and individual differences in jurors' resistance to persuasion
to model results from empirical studies of jury decision making.


Until quite recently most of our knowledge 
about juror and jury behavior has been based 
on survey research such as Kalven and Zeisel's 
The American Jury (1966) and archival data 
collected in studies of jury utilization. Within 
the past decade these data have been supple- 
mented by a growing body of experimental jury 
research (e.g., Boehm, 1968; Davis, Kerr, 
Atkin, Holt, & Meek, 1975; Landy & Aronson, 
1969; Mitchell & Byrne, 1973; Saks, 1977; 
Simon, 1967; Simon & Kaplan, 1972; Strodt- 
beck, James, & Hawkins, 1957; Valenti & 
Downing, 1975; Padawer-Singer & Barton, 
Note 1); investigators have used simulated, 
laboratory juries to explore a wide variety of 
factors that theoretically affect juror and jury 
decision making. (For reviews of this research, 
see Davis, Bray, & Holt, 1977; Penrod, 
Note 2.)

Even more recently a number of researchers
have begun using survey and experimental
data to develop mathematical and computer
models of jury decision making, and these
models are the focus of this review. The models
can be classified into six categories: (a) implicit-decision-rule
models that use mathematical
techniques to determine the implicit rules that
juries appear to use in deliberation, (b) simple
binomial probability models that use the
binomial expansion to mimic the effects of jury
size and explicit decision rules on the distribution
of jury verdicts, (c) Bayesian models that
use Bayesian and binomial models to estimate
the prior probability that defendants are
guilty and the probability that jurors can
accurately detect guilt or innocence, (d)
binomial models of juror accuracy and juror
satisfaction and their relationship to jury size
and decision rules, (e) models of the relationship
between juror and jury errors, and (f) a
computer model that uses computer-simulated
jurors and data from jury studies to explore
the relationships among the initial distribution
of individual juror opinions, individual differences
in persuadability, rates of juror vote
changes, coalition sizes within a jury, deliberation
times, and the distribution of jury verdicts.
We examine these models giving particular
attention to their structure, the plausibility of 
their underlying assumptions, their fit to 
available data, and their generalizability. 


Implicit Decision Rules 


Of the models reviewed here, the one that is 
most firmly grounded in empirical data—in the 
sense that it makes the fewest assumptions 
about jury behavior—is the model developed 
by Davis and his colleagues (Davis, 1973; 
Davis et al., 1975; Davis, Kerr, Sussman, & 
Rissman, 1974). Davis made a fundamental 
distinction between two types of decision rules 
that a jury uses: an explicit decision rule such 
as the requirement that juries reach a unani- 
mous verdict and an implicit decision rule that 
describes the method by which a jury actually 
arrives at a verdict. The assumption, of course, 
is that juries may in fact operate under a 
decision rule different from the one given to 
them in a judge’s instructions. To determine 
the implicit rule Davis has used two sources 
of data: Simon’s (1967) study of the insanity 
defense, in which 30 12-person mock juries 
deliberated on a housebreaking case and 68 
mock juries deliberated on an incest case, and
the Davis et al. (1975) and Davis, Kerr,
Stasser, Meek, and Holt (1977) studies of a
rape case in which student jurors deliberated
individually or in 6- or 12-person juries. These 
last juries were given either a unanimous or a 
two-thirds rule as their explicit decision rule. 

In all of these studies the researchers asked 
the mock jurors to reveal their personally preferred
verdict before deliberation began.
Knowledge of the predeliberation distribution
of votes combined with knowledge of each jury's
final verdict allows one to ask, can final verdicts
be predicted by applying a particular decision
rule to the initial distribution of individual
juror votes? Davis (1973) called such a rule
an “implicit social decision scheme (p. 99).” 
To determine the implicit decision rule Davis
first identified the set of “distinguishable
distributions of member preferences within a
group (p. 101).” In general, these distributions
are determined by a multinomial expansion
in which the set of initial distributions m equals

$$m = \binom{n + r - 1}{r} = \frac{(n + r - 1)!}{r!\,(n - 1)!},$$

where n equals the number of mutually exclusive
outcomes available to each group member
and r equals the number of members. For 
jurors there are, of course, two possible out- 
comes (guilt or innocence), so n = 2. (When 
n = 2 the appropriate expansion is binomial, 
and the model is much simpler. A detailed 
discussion of the binomial expansion can be 
found in the next section of this article.) For 
12-person juries (r = 12) there are 13 dis- 
tinguishable initial distributions: 


$$\binom{2 + 12 - 1}{12} = \frac{13!}{12!\,1!} = 13.$$


These distributions range from 12 votes for 
guilt and none for innocence (12, 0) to 12 votes 
for innocence and no votes for guilt (0, 12). 
Similarly in a 6-person jury there are 7 
possible initial distributions, ranging from 
(6, 0) to (0, 6). 
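
The count of distinguishable initial distributions can be checked directly. The sketch below is a minimal illustration that assumes the count is the number of ways r members can be distributed over n outcomes, that is, the binomial coefficient C(n + r − 1, r); the function names are hypothetical.

```python
from math import comb


def count_initial_distributions(n_outcomes, n_members):
    """Number of distinguishable distributions of members over outcomes."""
    return comb(n_outcomes + n_members - 1, n_members)


def two_outcome_distributions(n_members):
    """List the (guilt votes, innocence votes) splits for a two-outcome jury."""
    return [(guilt, n_members - guilt) for guilt in range(n_members, -1, -1)]


print(count_initial_distributions(2, 12))  # 12-person jury, two outcomes
print(count_initial_distributions(2, 6))   # 6-person jury, two outcomes
print(two_outcome_distributions(6))
```

With more than two outcomes available to each member the count grows quickly, which is one reason the two-outcome (binomial) case is so much simpler to work with.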

The question Davis has posed is, given 
actual data on the frequency of each initial 
distribution of individual juror verdicts, is 
there one general decision scheme that will 
accurately predict the distribution of final 
jury verdicts? Davis represented various 
possible decision schemes in the form of an 
m X s stochastic matrix in which s corresponds 
to the available group outcomes (e.g., a verdict 
of guilt, acquittal, or a hung jury). The
entries in the matrix represent the probability 
that each of the m possible initial distributions 
will result in one of the s possible group out- 
comes. Each unique matrix thus represents a 
distinct decision scheme that may correspond 
to a jury's implicit decision rule. 

Davis et al. (1975) have tested 13 different 
decision schemes that can be applied to the 
initial vote distributions for 12-person juries in 
which final verdicts can be of guilt, innocence, 
or a hung jury, and Davis, Kerr, Stasser, Meek, 
and Holt (1977) have tested 15 similar models 
for 6-member juries. Fortunately, the 3 decision 
schemes that have provided the best fits can 
be succinctly characterized verbally. For 
instance, Decision Scheme 8 from Davis et al. 
(1975) specifies that if two thirds or more of 
the jurors agree on a verdict on the initial 
ballot, this agreement determines the verdict. 
If two-thirds agreement is not obtained on the
first ballot, the decision rule specifies that the 
jury will hang. For Scheme 7, majorities win,

Table 1
Distribution of Votes for Acquittal on First Ballot and Jury Decisions

                     No. of votes for acquittal on first ballot
                    0          1-5          6          7-11         12         Total
Final verdict     n    %     n    %     n    %     n    %     n    %     n   % of total
Not guilty        0    0     5    5     5   50    37   91    26  100    73      32
Guilty           43  100    90   86     5   50     1    2     0    0   139      62
Hung              0    0    10    9     0    0     3    7     0    0    13       6
Total n          43        105         10         41         26        225
% of total       19         47          4         18         12        100

Note. These data are from Kalven and Zeisel (1966).


otherwise the jury is hung. Scheme 3 reflects 
a majority persuasion effect in which persuasion 
depends on the size of the initial majority: 
When 11 or 12 jurors agree (or 5 or 6 jurors 
agree in a 6-member jury), the verdict is determined;
for distributions between (10, 2) and
(6, 6) [or (4, 2) and (3, 3) in 6-member juries],
the juries yield guilty verdicts with probability
$r_g/r$ (where $r_g$ is the number of jurors voting
for guilt) and not-guilty or hung verdicts, each with
probability $\tfrac{1}{2}[1 - (r_g/r)]$; and the distributions
(5, 7) to (2, 10) [(2, 4) in six-member
juries] yield not-guilty verdicts with probability
$r_{ng}/r$ and guilty or hung verdicts, each with
probability $\tfrac{1}{2}[1 - (r_{ng}/r)]$.
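
The verbal descriptions of the decision schemes translate directly into stochastic matrices. The following sketch is one reading of the text, not Davis's own specification: rows are indexed by the number of initial guilty votes, columns are (guilty, not guilty, hung), and the remaining probability in the proportionality scheme is assumed to be split evenly between the two other outcomes. The function names and the uniform initial distribution in the usage lines are hypothetical.

```python
def two_thirds_scheme(jury_size):
    """Two-thirds majority, otherwise hung (the text's Scheme 8)."""
    rows = []
    for guilty in range(jury_size + 1):
        if 3 * guilty >= 2 * jury_size:
            rows.append((1.0, 0.0, 0.0))
        elif 3 * (jury_size - guilty) >= 2 * jury_size:
            rows.append((0.0, 1.0, 0.0))
        else:
            rows.append((0.0, 0.0, 1.0))
    return rows


def proportionality_scheme(jury_size):
    """Majority persuasion proportional to majority size (the text's Scheme 3)."""
    rows = []
    for guilty in range(jury_size + 1):
        not_guilty = jury_size - guilty
        if guilty >= jury_size - 1:          # at most one dissenter: verdict fixed
            rows.append((1.0, 0.0, 0.0))
        elif not_guilty >= jury_size - 1:
            rows.append((0.0, 1.0, 0.0))
        elif guilty >= not_guilty:           # even splits are treated like guilt majorities
            p = guilty / jury_size
            rows.append((p, (1 - p) / 2, (1 - p) / 2))
        else:
            p = not_guilty / jury_size
            rows.append(((1 - p) / 2, p, (1 - p) / 2))
    return rows


def predicted_verdicts(initial_probs, scheme):
    """Weight each row of the scheme by the probability of that initial split."""
    return tuple(
        sum(prob * row[j] for prob, row in zip(initial_probs, scheme))
        for j in range(3)
    )


# Hypothetical example: every initial split of a 12-person jury equally likely.
uniform = [1 / 13] * 13
print(predicted_verdicts(uniform, two_thirds_scheme(12)))
print(predicted_verdicts(uniform, proportionality_scheme(12)))
```

In practice the vector of initial-split probabilities would come from observed first-ballot frequencies, and the predicted verdict distribution would then be compared with the observed one.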

Davis (1973) tested 5 different schemes on 
Simon’s (1967) data and found that Scheme 3 
made the best predictions for both of Simon’s 
cases. Similarly, he tested 13 decision schemes 
on his own (Davis et al., 1975) data. The best 
fitting scheme overall was the two-thirds- 
majority model (Scheme 8), although other 
models provided better fits for particular jury 
sizes and explicit decision rules. 

Finally, 15 schemes were tested on the data 
from Davis, Kerr, Stasser, Meek, and Holt 
(1977). The study used six-member juries that 
deliberated with an explicit 4/6 (i.e., four
members out of six must agree) decision rule. 
For these juries a modified two-thirds rule— 
similar to Scheme 8 except that 75% of (3, 3) 
juries acquit and 25% hang—best fit the data. 
Davis et al. concluded that in general juries 
appear to be operating under an implicit “two-thirds,
otherwise hung” decision rule even
when the explicit, judge-instructed rule is 
unanimity. 


Davis’s approach has been applied by other 
researchers with similar results. For instance, 
Saks (1977) studied the effects of explicit 
decision rule and jury size on the distribution 
of jury verdicts in two experiments with a total
of 85 juries. When Saks tested for implicit
decision rules he also found support for 
Decision Schemes 3 and 8, but he found even 
stronger support for a power function rule 
suggested by Latane and Borden (Note 3). 

Gelfand and Solomon (1975, 1977) have 
employed Davis's decision scheme method in 
their efforts to fit the data on 225 juries supplied
by Kalven and Zeisel (1966). (Table 1
reproduces the Kalven and Zeisel data.) 
Although the first ballot distribution was not 
given for every possible initial jury split (non- 
unanimous majorities for guilt and innocence 
were separately pooled), with minor modifica- 
tions in Davis’s Scheme 3, Gelfand and 
Solomon estimated the probability of convic- 
tion to be .637, of acquittal to be .303, and of 
a hung jury to be .060—a very close fit to the 
Kalven and Zeisel data. The Gelfand and 
Solomon scheme, like Davis’s Scheme 5,
incorporates a strong majority persuasion 
effect, but elevates the probability of a hung 
jury for (10, 2) and (2, 10) initial distributions. 
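
The closeness of this fit can be made concrete by comparing the estimated probabilities with the verdict totals in Table 1. The sketch below computes the observed proportions and a Pearson chi-square statistic; it is only an illustrative check and is not necessarily the fit measure that Gelfand and Solomon reported.

```python
# Verdict totals from Table 1 (Kalven & Zeisel, 1966) and the
# probabilities estimated by Gelfand and Solomon, as quoted in the text.
observed = {"guilty": 139, "not guilty": 73, "hung": 13}
predicted = {"guilty": 0.637, "not guilty": 0.303, "hung": 0.060}

total = sum(observed.values())
observed_props = {verdict: count / total for verdict, count in observed.items()}

# Pearson chi-square: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[verdict] - total * predicted[verdict]) ** 2
    / (total * predicted[verdict])
    for verdict in observed
)

for verdict in observed:
    print(f"{verdict}: observed {observed_props[verdict]:.3f}, "
          f"predicted {predicted[verdict]:.3f}")
print(f"chi-square = {chi_square:.3f}")
```

The observed proportions (about .62, .32, and .06) differ from the estimates by roughly two percentage points at most, and the resulting statistic is small relative to the sample of 225 juries.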

Although these results as a whole support the 
argument that juries may use something other 
than their assigned explicit rule, some caution
is in order, for the studies reported to date
have failed to demonstrate that one particular
model consistently makes accurate predictions
for all jury sizes and explicit decision rules.
Indeed, as Davis et al. (1975) have noted,
different implicit rules may apply to different




JURY DECISION MAKING 


jury conditions; it may be that different im- 
plicit rules apply in criminal and civil cases, 
that different levels of a defendant's apparent 
guilt elicit different implicit rules, that complex 
cases involve yet another type of implicit rule, 
that different types of judge's instructions 
vary the implicit rule, and so forth. Only 
further research can resolve these problems.

An even more subtle and potentially more
important methodological problem of the 
decision scheme analysis must be addressed. 
To date the evidence suggests that the rate of 
accurate prediction for individual trials is not 
reliably high. Most of the tests of implicit rules 
rely on a comparison of (a) the distribution of 
verdicts predicted from the application of the 
decision schemes to initial vote distributions 
with (b) the final distribution of actual ver- 
dicts. But without detailed inspection it is not 
clear that the final verdicts considered indi- 
vidually are consistent with the predictions 
made by the various decision schemes. Scheme 
8, for instance, might predict an overall distri- 
bution of verdicts that would resemble the 
overall distribution of actual verdicts, but the 
entries in the stochastic matrix associated with
the model might do a poor job of predicting
actual verdicts on the basis of initial vote
distributions. In other words, the model might
fit the aggregated data, but for the wrong
reasons. There could be an infinite number of
matrices that would predict final distributions
identical to those of Scheme 8, but only one of
them would make the maximum number of
correct predictions. Optimally, one would
want to take a model and examine the accuracy
of predictions for each entry in the matrix. In
the case of Scheme 3, for instance, do 83% of
the cases with an initial (10, 2) distribution of
votes end up with a guilty verdict, 8% with a
hung verdict, and 8% with an innocent verdict?
Do 50% of the (6, 6) juries vote for guilt while
the other 50% divide evenly to acquit or to
hang?
The best fitting Scheme 8 (the two-thirds
majority rule) is at least partially defective in
this regard. One illustration of its failure is
that it does not allow either for reversals of
initial majorities (cases in which a minority on
the initial ballot ultimately prevails) or for
hung juries in cases in which eight or more
jurors initially agree. And yet there is ample
evidence (e.g., Padawer-Singer & Barton,
Note 1) that initial majorities are reversed and 
that juries can hang even when eight jurors 
initially agree on a verdict. Some of the models 
allow for such reversals and provide more 
sources of hung juries, but unless the rules are 
fitted to actual outcomes on a jury-by-jury 
basis rather than on an aggregate distribution 
of outcomes, it is premature to say that one of 
the schemes accurately reflects the implicit 
decision rule. 
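
The difference between aggregate fit and jury-by-jury fit can be shown with a small constructed example. The data below are entirely hypothetical (six 12-person juries invented for illustration), and the function name is an assumption; the point is only that a scheme can reproduce the overall verdict distribution exactly while mispredicting individual juries.

```python
from collections import Counter


def two_thirds_prediction(guilty_votes, jury_size=12):
    """Verdict predicted by a 'two-thirds majority, otherwise hung' rule."""
    if 3 * guilty_votes >= 2 * jury_size:
        return "guilty"
    if 3 * (jury_size - guilty_votes) >= 2 * jury_size:
        return "not guilty"
    return "hung"


# Hypothetical juries: (initial guilty votes, actual final verdict).
juries = [
    (10, "guilty"),
    (9, "not guilty"),   # initial majority reversed
    (3, "guilty"),       # initial majority reversed
    (2, "not guilty"),
    (6, "hung"),
    (7, "hung"),
]

predicted = [two_thirds_prediction(votes) for votes, _ in juries]
actual = [verdict for _, verdict in juries]

aggregate_match = Counter(predicted) == Counter(actual)
hits = sum(p == a for p, a in zip(predicted, actual))

print("aggregate distributions identical:", aggregate_match)
print(f"juries predicted correctly: {hits} of {len(juries)}")
```

Here the predicted and actual verdict distributions are identical in aggregate even though two of the six individual verdicts are mispredicted, which is why the text recommends fitting on a jury-by-jury basis.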

In fact, the initial results obtained from 
nonaggregated analyses, although mixed, are 
on the whole disappointing. Grofman (1976) 
tested the fit of the two-thirds model on the 
Davis et al. (1975) and Davis, Kerr, Stasser, 
Meek, and Holt (1977) data and reported that 
the results were disappointing. We found that 
the two-thirds rule predicted the verdicts of 
66 of 100 juries in the Kerr et al. (1976) study 
and 59 of 90 juries in the Davis, Kerr, Stasser, 
Meek, and Holt (1977) study. In fact, even 
the best fitting Scheme 15 (the modified two- 
thirds decision rule) mispredicts 10 of 90 juries. 
On the other hand, the standard two-thirds 
model predicted 24 of 25 juries in a study by 
Grofman and Hamilton (Note 4). 

In a later section we discuss some of the 
factors that may contribute to the poor fit of 
social decision schemes when they are applied 
to individual juries, but we note briefly at this 
point that one major source of the problem 
may be a by-product of the case that Davis 
and his colleagues have used in their research.
The case is one of rape, and it evokes different
reactions from male and female subjects. In 
both Davis, Kerr, Stasser, Meek, and Holt 
(1977) and Kerr et al. (1976), approximately 
60% of the females voted to convict on the 
first ballot, while only about 50% of the males 
voted to convict. During deliberation females 
changed their votes from conviction to 
acquittal at rather high rates (in the first 
study, 18.3% of the females shifted from guilt 
to innocence compared with a 5.5% shift in 
the other direction, and males shifted relatively 
little in either direction). As a result of the 
higher rate of change to acquittal by females, 
the overall distribution of verdicts is shifted in 
the direction of acquittal. Since the social 
decision schemes typically assume that jurors 
are equally likely to shift in either direction,
they fail to account for the bias in female juror
behavior and are therefore unable to capture 
the shift to acquittals. Rape cases may be 
peculiarly susceptible to this sort of phe- 
nomenon, and this observation underscores 
the fact that case type has a significant impact 
on the deliberation process. 


Probabilistic Models 


Earlier it was noted that Davis's jury model 
is built around the binomial expansion. There 
are other modeling efforts that also make use 
of the binomial expansion to mathematically 
evaluate the effects that jury size and decision 
rule have on the distribution of jury verdicts. 
These issues are of practical significance and
have been the subject of litigation before the
U.S. Supreme Court. In 1970 the Court held
that six-member juries did not violate a
defendant's right to a trial by jury (Williams
v. Florida¹), and in 1972 the Court held by a
narrow margin that nonunanimous juries were
constitutional (Apodaca v. Oregon² and Johnson
v. Louisiana³). More recently, in Ballew v.
Georgia⁴ the Court, with Justice Blackmun
writing the Court's main opinion, cited a num- 
ber of empirical and theoretical studies to 
support the holding that juries with fewer 
than six members violate a defendant's right 
to a trial by jury. 

The Court has not yet reached the question
of the constitutionality of nonunanimous six- 
member juries, but since the issue is likely to 
be pressed in the courts, empirical research and 
mathematical models directed to the question 
have acquired additional importance. The 
mathematical models that are most relevant 
are the binomial models of Walbert (1971) 
and Saks and Ostrom (1975). The binomial
expansion is a mathematical expression that 
facilitates the calculation of the probability 
that a specified number of successes (or fail- 
ures) will occur in a given number of trials (in 
the nonlegal sense), where trials are inde- 
pendent of one another and the probability of 
success is the same for each trial. More simply 
(using juries as an example), the binomial
theorem applies in situations in which there are
two possible outcomes (e.g., a juror can vote
for either guilt or acquittal), in which there are
a fixed number of trials (e.g., there are 12
jurors who must vote), in which outcomes
have identical probabilities (e.g., jurors are
randomly drawn from a large jury pool in
which a certain percentage of jurors will vote
for guilt), and in which trials are independent
(i.e., jurors' initial judgments of guilt or
innocence are independent of one another).

To apply the binomial theorem to the jury 
it is clear that several assumptions have to be 
made. First, it must be assumed that all jurors 
are prepared to vote for either conviction or 
acquittal; there can be no undecided jurors. 
It is not clear how often this assumption is 
violated in practice, but for statistical purposes 
it must be assumed that all jurors have at 
least some inclination—however small—toward 
conviction or acquittal. Second, it must be 
assumed that jurors are drawn from a larger 
pool of jurors, in which a certain percentage of 
the jurors will vote for conviction or acquittal. 
Finally, it must be assumed that the jurors 
have not influenced one another’s judgments 
prior to deliberation (this is probably a fairly 
reasonable assumption in light of the fact that 
jurors are typically cautioned not to discuss a
case or to form an opinion until they have 
heard all the evidence). 

With these assumptions it is possible to use 
the binomial theorem to determine the proba- 
bility that a jury will have sufficient votes to 
convict a defendant on the first ballot: 


$$p(G) = \sum_{i=Q}^{n} \binom{n}{i} g^{i} (1-g)^{n-i}, \tag{1}$$

where p(G) is the probability that a jury will
produce sufficient votes to convict on the first
ballot, n is jury size, Q is the minimum number
of votes required to convict (required quorum),
g is the probability that a randomly selected
juror will vote for guilt, and 1 − g is the
probability that a randomly selected juror will
vote to acquit.
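
As a minimal sketch of Equation 1, the sum can be evaluated directly rather than read from binomial tables. The function name `first_ballot_conviction` and the value g = .7 used in the demonstration are illustrative assumptions, not taken from the studies reviewed here.

```python
from math import comb


def first_ballot_conviction(n: int, q: int, g: float) -> float:
    """Equation 1: probability that at least q of n jurors vote guilty.

    Assumes independent jurors who each vote guilty with probability g.
    """
    return sum(comb(n, i) * g**i * (1 - g) ** (n - i) for i in range(q, n + 1))


if __name__ == "__main__":
    g = 0.7  # illustrative value only
    for n, q in [(12, 12), (12, 8), (6, 6), (6, 4)]:
        print(f"n={n:2d} Q={q:2d} p(G)={first_ballot_conviction(n, q, g):.4f}")
```

Under a unanimous rule (Q = n) the sum collapses to a single term, g raised to the power n.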

Note that in the case of a unanimous
decision rule, p(G) can simply be written as
$p(G) = g^{n}$. The same theorem can also easily
be written to determine the probability that a


1 Williams v. Florida, 90 S.Ct. 1893 (1970). 

2 Apodaca v. Oregon, 92 S.Ct. 1628 (1972). 

3 Johnson v. Louisiana, 92 S.Ct. 1635 (1972).

4 Ballew v. Georgia, 98 S.Ct. 1029 (1978).
Figure 1. Percentage of first ballot convictions as a function of probability of guilt, according to a
single, binomial decision rule model. (Denominator of parameters indicates jury size, and numerator 
indicates decision rule verdict requirement, i.e., the number of votes needed for a conviction.) 


jury will produce sufficient votes to acquit on 
the first ballot: 


$$p(I) = \sum_{i=Q}^{n} \binom{n}{i} (1-g)^{i} g^{n-i}. \tag{2}$$

Thus, the probability p(V) that a jury will
have sufficient votes to produce either a guilty
or an acquittal verdict on the first ballot is
the sum of the probabilities from Equations 1
and 2:

$$p(V) = \sum_{i=Q}^{n} \binom{n}{i} g^{i} (1-g)^{n-i} + \sum_{i=Q}^{n} \binom{n}{i} (1-g)^{i} g^{n-i}.$$

One can, of course, vary jury size n, the size
of the required quorum Q, and the probability
that a randomly selected juror will vote for
guilt g and use the binomial theorem to determine
the probabilities that juries of varying
size, drawn from differing jury pools, and who
use different decision rules will produce first
ballot verdicts for guilt or acquittal (one need 
only consult appropriate tables of the binomial 
distribution). Figure 1 summarizes the proba- 
bilities of first ballot verdicts for juries of 6 
and 12 members who use two-thirds, five- 
sixths, and unanimous decision rules, where 
the probability of an individual juror voting 
for guilt ranges from 0 through 1.0. (Note that 
the figure also shows the probability of an 
acquittal—when 1 — g is substituted for g, 
the results are exactly symmetrical.) 
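
The quantities summarized in Figure 1 can be approximated with a short sketch of p(V), the sum of Equations 1 and 2. The helper names and the dictionary of decision rules below are assumptions made for illustration; the quorum sizes (4, 5, 6 of 6 and 8, 10, 12 of 12) follow from the two-thirds, five-sixths, and unanimous rules named in the text.

```python
from math import comb


def quorum_prob(n: int, q: int, p: float) -> float:
    """Probability that at least q of n independent votes go one way."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(q, n + 1))


def first_ballot_verdict(n: int, q: int, g: float) -> float:
    """p(V): a quorum for guilt or a quorum for acquittal on the first ballot.

    The two events are mutually exclusive only when q exceeds n / 2.
    """
    if 2 * q <= n:
        raise ValueError("quorum must be a strict majority")
    return quorum_prob(n, q, g) + quorum_prob(n, q, 1 - g)


# Decision rules as (numerator, denominator) fractions of jury size.
RULES = {"two-thirds": (2, 3), "five-sixths": (5, 6), "unanimous": (1, 1)}


def verdict_table(g: float) -> dict:
    """First-ballot verdict probability for each jury size and rule."""
    table = {}
    for n in (6, 12):
        for name, (num, den) in RULES.items():
            table[(n, name)] = first_ballot_verdict(n, n * num // den, g)
    return table


if __name__ == "__main__":
    for key, value in verdict_table(0.5).items():
        print(key, round(value, 4))
```

Evaluating `verdict_table` over a grid of g values would trace out curves of the kind the figure caption describes.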

Clearly, reductions in the size of the jury 
and relaxations of the quorum requirements 
produce similar effects: They heighten the 
probability that a jury will produce a verdict 
on the first ballot. Furthermore, if g is in any 
way an index of a defendant’s objective guilt 
(irrespective of the evidence against the 
defendant) and juries can accurately detect 
such guilt on the basis of the evidence presented
to them, then it is clear that objectively
guilty defendants fare better with larger juries
using a unanimous decision rule, for these 
juries are less likely to convict on the first 
ballot. 

Thus far we have emphasized that by using 
the binomial theorem with the assumptions 
made earlier, it is possible to determine the 
probability that a jury will render a verdict 
on the first ballot taken during deliberation. 
Events after the first ballot, when a jury has 
failed to reach a verdict, have been a recurrent 
problem for the mathematical modelers, as 
most juries clearly do not reach a verdict on 
the first ballot, but typically continue to 
deliberate until they either reach the required 
quorum or find themselves hopelessly dead- 
locked (when a jury finds itself hung, the trial 
judge is forced to declare a mistrial). Figure 1 
shows that in relatively close cases (where g 
ranges from .3-.7), virtually no 12-person, 
unanimous juries reach a verdict on the first 
ballot. What is likely to happen in juries of 
this sort? Do they hang? Do they divide 
evenly in their verdicts? Does the majority 
tend to prevail? 

One answer to these questions is provided 
by Walbert's (1971) analysis. As one has 
already seen, Davis's (1973) implicit-decision- 
rule analysis suggests that initial majorities 
tend to determine final verdicts; Walbert cited
additional evidence to support the claim that 
deliberation after the first ballot is funda- 
mentally irrelevant to the mathematical 
analysis. He argued as follows: (a) Small- 
groups research shows strong majority persua- 
sion effects—with complex judgments minori- 
ties tend to conform to the judgments of the 
majority—that are accentuated by external 
pressures (such as judges' instructions) and 
are most evident in group discussions with 
leadership structures resembling those found 
in the jury room. (b) Empirical evidence indi- 
cates (Kalven & Zeisel, 1966) that majority 
persuasion operates in about 93% of all cases 
(minorities prevailed in 3%, and 4% ended 
with a hung jury). (c) Juries in which jurors 
initially divide evenly (6 to 6) tend to split 
evenly for conviction and acquittal (Kalven & 
Zeisel, 1966). 

If one takes these data and assumptions as 
valid and further assumes that the reported 
discrepancies (i.e., reversals of initial majorities)
are of minimal importance (assumptions
that we examine below), it is possible to argue
that first ballots basically decide almost all 
cases. With respect to the binomial theorem, 
these assumptions imply that one need only 
be concerned with two factors: the probability 
that a jury will produce a majority of votes for 
a verdict on the first ballot and the disposition 
of cases in which the jury splits evenly on the 
first ballot. 

If the initially evenly split cases yield final 
verdicts divided evenly between conviction 
and acquittal and the possibilities of hung cases 
and reversals of initial majorities are ignored, 
then the binomial theorem can be modified, as 
in Equation 3, to determine the probability 
that a jury will render a guilty verdict: 


$$p(G) = \sum_{i=(n/2)+1}^{n} \binom{n}{i} g^{i} (1-g)^{n-i} + \frac{1}{2}\binom{n}{n/2} g^{n/2} (1-g)^{n/2}. \tag{3}$$


The first term of Equation 3 is identical to 
Equation 1, except that Q is the majority of 
jurors and the second term is the probability 
of an even split in the initial ballot divided by 
two. A similar expression can be constructed 
for the probability of an innocent verdict and 
is analogous to Equation 2: 


$$p(I) = \sum_{i=(n/2)+1}^{n} \binom{n}{i} (1-g)^{i} g^{n-i} + \frac{1}{2}\binom{n}{n/2} g^{n/2} (1-g)^{n/2}. \tag{4}$$


Note that by definition the sum of Equations
3 and 4 is 1.00. Furthermore, in these equations
the decision rule is essentially fixed at majority
wins, and only jury size affects the distribution
of guilty and innocent verdicts.
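
A minimal sketch of Equations 3 and 4 follows, assuming an even jury size so that an exact tie is possible. The function names are illustrative. Rounded to two places, the outputs for g = .4 agree with the figures of .25 (12 members) and .32 (6 members) quoted from Walbert's results.

```python
from math import comb


def walbert_conviction(n: int, g: float) -> float:
    """Equation 3: initial majority wins, even splits divide equally."""
    half = n // 2  # n is assumed to be even
    majority = sum(
        comb(n, i) * g**i * (1 - g) ** (n - i) for i in range(half + 1, n + 1)
    )
    tie = comb(n, half) * (g * (1 - g)) ** half
    return majority + tie / 2


def walbert_acquittal(n: int, g: float) -> float:
    """Equation 4: the same expression with g and 1 - g exchanged."""
    return walbert_conviction(n, 1 - g)


if __name__ == "__main__":
    for g in (0.2, 0.4, 0.5, 0.6):
        print(g, round(walbert_conviction(12, g), 3), round(walbert_conviction(6, g), 3))
```

Because hung juries are excluded by assumption, the two functions always sum to 1 for a given n and g.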

Walbert's decision rule is quite similar to
Davis et al.'s (1975) Scheme 7, except that
Walbert eliminated hung juries and assigned
evenly divided juries equally to guilt and
acquittal. However, where Davis examined
results by applying his schemes to initial votes
from experimental studies to determine their
fit to the data, Walbert made a more general
argument. Starting from the assumptions outlined
above, Walbert calculated the effect of
jury size on the distribution of jury verdicts
given a wide range of binomially distributed
initial individual votes and using (as did
Saks & Ostrom, 1975) varying levels of g (the
probability that a randomly selected juror will
vote for guilt). Walbert's results indicate that
a reduction in jury size raises the probability
of conviction when g < .5: For example,
when g = .4, the use of a 6-member jury increases
the probability of conviction from .25
(for the 12-member jury) to .32. And when
g = .2, the 6-member jury is six times more
likely to convict (6% vs. 1%). Parallel results
for acquittals are obtained when g > .5.
If one makes the additional assumption that
g is a fairly reliable indicator of a defendant's
"convictability" (i.e., g indexes the weight of
the evidence against the defendant), then
Walbert's analysis implies that the more
convictable defendants are better off with the
smaller juries, whereas relatively unconvictable
defendants fare better with large juries.
Furthermore, if one regards convictions of
relatively unconvictable defendants or acquittals
of relatively convictable defendants as
errors, it is clear that the larger jury is preferable.
Indeed, it can be argued that the optimal
pattern of conviction and acquittal would 
break sharply at g = .5. Juries would always 
convict when g was greater than .5 and acquit 
when g was less than .5. With these assump- 
tions, optimality would be achieved with in- 
finitely large juries using majority rule. 
Coincidentally, this optimality rule also maxi- 
mizes the likelihood that jury verdicts will 
accurately reflect the attitudes and values of 
the community from which the jury pool has 
been drawn (the representativeness of the pool 
itself would also be a consideration). Indeed, 
the binomial theorem can be used to determine 
the probability that a specified number of 
jurors (analogous to Q) will be selected from a
jury pool in which a specified trait (say, race
or sex) occurs at a particular rate (analogous
to g). Lempert (1975) discussed this representativeness
question at length, and in his
article he summarized the probabilities that
two, one, or no jurors will be selected with the
appropriate trait in 6- and 12-member juries
for various rates of trait occurrence in the jury
pool. Under the assumptions made in Walbert's
(1971) analysis, it is clear that the 6-member
jury is inferior to the 12-member jury: It
produces more errors and is less likely to
represent the population from which it is
drawn.
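
The representativeness calculation can be sketched with the same binomial machinery. The 10% trait rate below is a hypothetical value chosen for illustration, not a figure from Lempert (1975), and the function names are likewise assumptions.

```python
from math import comb


def prob_exactly(k: int, n: int, rate: float) -> float:
    """Probability that exactly k of n randomly drawn jurors have the trait."""
    return comb(n, k) * rate**k * (1 - rate) ** (n - k)


def representation_table(rate: float) -> dict:
    """Probabilities of 0, 1, or 2 jurors with the trait, for 6 and 12 members."""
    return {n: [prob_exactly(k, n, rate) for k in (0, 1, 2)] for n in (6, 12)}


if __name__ == "__main__":
    hypothetical_rate = 0.10  # assumed share of the jury pool with the trait
    for n, probs in representation_table(hypothetical_rate).items():
        print(n, [round(p, 3) for p in probs])
```

With any trait rate below 1, the chance that no juror carries the trait is larger for the 6-member jury than for the 12-member jury, which is the sense in which the smaller jury is less representative.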
We note, in passing, that there is yet another
way to think about g that could provide a
quantifiable definition of reasonable doubt. If
we were to define $g_1$ as the probability that a
randomly selected juror will say that a
defendant committed the charged acts beyond
a reasonable doubt and $g_2$ as the probability
that a randomly selected juror will vote to
convict the defendant under a more-likely-than-not
standard of proof, then we could
determine a probabilistic value for RD (the
additional certainty required for a reasonable-doubt
standard) such that RD = $g_2 - g_1$.

If by empirical methods we found that RD
was greater than zero, we could use Walbert's
formulation of the binomial decision model to
examine the effects of a relaxation in the
standard of proof required in criminal cases.
If, for example, $g_2$ = .6, $g_1$ = .5, and RD = .1,
a relaxation from the reasonable-doubt stan- 
dard to the more-likely-than-not standard 
would increase the probability of conviction 
from .5 to .75 in the 12-member jury and from 
.5 to .68 in the 6-member jury (see Walbert’s 
Figure 1). Similar comparisons can be made 
for different values of g and RD. In every 
instance (for RD > 0) the relaxation of the 
standard accentuates the differences in 6- 
versus 12-member error rates. 
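
The worked example can be checked numerically with a small sketch of the majority-persuasion model. The function names are illustrative assumptions; the inputs (.5 under the reasonable-doubt standard, .6 under the more-likely-than-not standard) are the values used in the example.

```python
from math import comb


def majority_conviction(n: int, g: float) -> float:
    """Conviction probability when the initial majority prevails and ties split evenly."""
    half = n // 2  # n is assumed to be even
    win = sum(comb(n, i) * g**i * (1 - g) ** (n - i) for i in range(half + 1, n + 1))
    tie = comb(n, half) * (g * (1 - g)) ** half
    return win + tie / 2


def relaxation_effect(n: int, g_strict: float, g_lax: float) -> tuple:
    """Conviction probabilities under the stricter and the laxer standard."""
    return majority_conviction(n, g_strict), majority_conviction(n, g_lax)


if __name__ == "__main__":
    for n in (12, 6):
        before, after = relaxation_effect(n, 0.5, 0.6)
        print(n, round(before, 2), round(after, 2))
```

The shift is larger for the 12-member jury than for the 6-member jury, consistent with the statement that relaxing the standard accentuates the difference between the two jury sizes.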

To date the empirical evidence regarding the 
value of RD is mixed. In an experimental study
involving a theft case, Cornish and Sealy
(1973) found that the probability of a conviction
under the reasonable-doubt standard
was .50 compared with .51 for the more-likely-than-not
standard. Simon and Mahan (1971)
asked judges, jurors, and students to rate the
chances out of 10 that a defendant committed 
a crime if convicted beyond a reasonable doubt 
and if convicted by a preponderance of the 
evidence. All three groups gave a mean rating 
of approximately 8.5 out of 10 for reasonable 
doubt; but whereas judges rated preponderance 
of the evidence at 5.5, the other two groups 
gave a rating of 7. Similarly, in Kerr et al. 
(1976), subjects individually judged a video- 
taped rape trial using one of three definitions 
of beyond a reasonable doubt. Of the jurors who 
formed a verdict preference after viewing the 
trial, 51% of those using a stringent definition 
(moral certainty), 61.2% of those with no 
definition, and 66.3% of those using a lax 


470 


definition (substantial doubt) voted to convict 
the defendant. When the subjects rated the 
probability of guilt associated with the stan- 
dards, the mean rating for the stringent 
definition was .87 compared with .83 for the 
other two definitions. Nagel, Lamm, and Neef 
(Note 5), using a normative decision theory 
model, reported that student conviction proba- 
bility thresholds averaged about .55 for various 
types of criminal cases. Thus, the preliminary 
evidence indicates that the standards of proof 
and the definitions of those standards can affect 
juror judgments, although perhaps not to the 
extent intended by the legal system. To date 
there is little research on the factors that may 
affect the value of RD: the quality of judges’ 
instructions, differential experience with legal 
decision making, individual differences in 
jurors, differences in types of defendants and 
indictments, differences in available verdict 
alternatives, and differences in the types of 
judgments a jury must make (such as resolving 
issues of eyewitness identification vs. selecting 
a verdict or determining whether a defendant 
is guilty or innocent of any offense vs. deter- 
mining which offense a defendant has com- 
mitted). 

Before leaving Walbert’s (1971) model, it 
may be useful to note the points at which his 
analysis appears to be on the weakest ground. 
Unlike Davis, who attempted to account for 
reversals of initial majorities and hung juries 
in some of his models, Walbert assumed that 
these phenomena are unimportant when com- 
pared with the strong majority persuasion 
effects that he found in empirical data such 
as Kalven and Zeisel’s (1966). However, as 
Gelfand and Solomon (1974) pointed out, it is 
not clear that Walbert has properly charac- 
terized the Kalven and Zeisel data (see 
Table 1). These data summarized the relation- 
ship between initial ballot distributions and 
final verdicts for 225 criminal cases tried in 
Chicago and Brooklyn. On close inspection, it 
turns out that for all cases in which the jury 
was not evenly divided (6 to 6), the initial 
majority ultimately prevailed in 91.5% of the 
cases (compared with Walbert’s 93%), and the 
jury hung in 6.0% (compared with Walbert’s 
4%). Even assuming that Walbert's characterization
of the Kalven and Zeisel data is the
best possible, his model fails to account for
more than 8% of the total cases (six reversals
of majorities and 13 hung juries out of 225 
cases), and it is not clear what implications 
these error cases have for his conclusions about 
the effects of jury size. 

Beyond the question of the model’s basic fit, 
one might want to know whether (and in what 
ways) cases involving reversals of majorities 
and hung juries differ from cases exhibiting 
majority persuasion effects. Intuitively, there is 
good reason to think that such cases are the 
most difficult for juries to decide; the evidence
is not sufficiently compelling in favor of a 
guilty or an innocent verdict for jurors to be 
able to readily agree. In other words, these 
cases may tend to be the ones in which g is 
close to .5. Some evidence for the difficulty of 
such cases is provided by the results from 
Padawer-Singer and Barton’s (Note 1) study 
of 92 6- and 12-member mock juries that 
deliberated after viewing a videotaped murder 
trial. This case was evidently fairly difficult 
to decide, for the probability that a juror ran- 
domly selected from the pool of jurors who 
viewed the trial would vote for guilt was .41.
And (summing across 6- and 12-member juries
who used both unanimous and nonunanimous
decision rules) for the 70 juries with initial
majorities for a verdict, the majority prevailed
in only 68.6% of the cases, the minority prevailed
in 12.9%, and 18.6% were hung. Of
the 22 juries that were initially evenly split
(6 to 6 or 3 to 3), 54.5% ultimately returned
innocent verdicts, 40.9% returned guilty
verdicts, and 4.5% returned hung verdicts.

The Padawer-Singer and Barton data suggest
that Walbert's model may provide the
poorest fit for those cases in which accuracy
and representativeness are most critical: the
"hard to decide" cases. It would be premature,
however, to conclude that the postulated lack
of fit for cases with g near .5 undermines
Walbert's basic arguments about the effects of
jury size on jury accuracy and representativeness,
for as long as majorities prevail more
often than minorities, larger juries should be
preferred to smaller juries.


Bayesian Models 


One important question that a researcher
might ask about juror performance relates to
the juror's ability to accurately determine a
defendant's guilt or innocence. One would like
to know how often, under the best of circum- 
stances, jurors and juries accurately determine 
a defendant's objective guilt or innocence. But 
of course there is no reliable method of deter- 
mining objective guilt and innocence. One can 
only ask how reliable jurors and juries are at 
the task of assessing guilt and innocence from 
the evidence presented to them at trial (and 
we would hesitate to argue that the quality of 
trial evidence is necessarily related to a 
defendant's objective guilt). 

The models examined thus far do not 
attempt to evaluate the acuity of jurors, but 
assume that the best available index of a 
defendant's guilt is the proportion of jurors in 
a jury pool who, after hearing all the evidence
and testimony, are prepared to vote for con- 
viction. In this section we examine a jury 
model that uses a Bayesian analysis to deter- 
mine both the prior probability that a de- 
fendant is convictable and the probability that 
jurors will correctly assess this convictability. 

In a series of articles, Gelfand and Solomon 
(1973, 1974, 1975, 1977) have developed a 
model based on Poisson's (1837) application of 
probability theory to jury verdicts. Following 
Poisson, Gelfand and Solomon began their
analysis by suggesting that with adequate data,
two important parameters can be estimated:

1. T is the probability that before trial an
accused is convictable, that is, the proportion
of defendants brought to trial who are convictable
or the probability that the weight of
the evidence will be against a randomly
selected defendant.

2. M is the probability that a juror will not
vote for the wrong verdict, the probability that
a juror will correctly assess and vote with the
weight of the evidence against a defendant.
(For the purpose of modeling, Gelfand and
Solomon assumed that T and M are independent
and that M is a common value for all
jurors.)

Gelfand and Solomon made it clear that
when they wrote about convictability, they
were really discussing the standards of indictment
and the community standards for conviction
that might prevail in a criminal justice
system and were not using convictability (or
more loosely, guilt) to mean objective guilt.
The specification of these two parameters
allows for the construction of a mixed binomial
expression similar to Equation 1 in which it is
possible to determine $Y_{n,i}$, the probability
that a jury with n members will cast exactly
i votes for acquittal on the first ballot; $W_{n,i}$, the
probability that a jury with n members will
cast at most i votes for acquittal on the first
ballot; $p_{n,i}$, the probability that the defendant
is guilty given exactly i votes for acquittal on
the first ballot; and $P_{n,i}$, the probability that
the defendant is guilty given at most i votes
for acquittal on the first ballot, where


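$$Y_{n,i} = \binom{n}{i}\left[ T M^{n-i} (1-M)^{i} + (1-T) M^{i} (1-M)^{n-i} \right]; \tag{5}$$

$$W_{n,i} = \sum_{j=0}^{i} Y_{n,j};$$

$$p_{n,i} = T \binom{n}{i} M^{n-i} (1-M)^{i} \Big/ Y_{n,i}; \tag{6}$$

$$P_{n,i} = T \sum_{j=0}^{i} \binom{n}{j} M^{n-j} (1-M)^{j} \Big/ W_{n,i}.$$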


Note that Y is the sum of correct and incorrect
votes for guilt and innocence and that
W cumulates the probabilities of votes for
innocence from 0 through i votes. The expression
for p is simply the proportion of votes for
guilt that are attributable to guilty defendants,
whereas P cumulates the proportion of votes
attributable to guilty defendants from 0
through i votes.
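
The four quantities just defined can be sketched as plain functions, reading Y as the probability of exactly i first-ballot acquittal votes and W as its cumulative sum. The function names are illustrative, and the values T = .7 and M = .9 in the demonstration are round numbers assumed for the example.

```python
from math import comb


def y_exact(n: int, i: int, T: float, M: float) -> float:
    """Y: probability of exactly i acquittal votes (mixture over defendant type)."""
    guilty = T * M ** (n - i) * (1 - M) ** i
    innocent = (1 - T) * M**i * (1 - M) ** (n - i)
    return comb(n, i) * (guilty + innocent)


def w_at_most(n: int, i: int, T: float, M: float) -> float:
    """W: probability of at most i acquittal votes."""
    return sum(y_exact(n, j, T, M) for j in range(i + 1))


def p_guilty_exact(n: int, i: int, T: float, M: float) -> float:
    """p: probability the defendant is convictable given exactly i acquittal votes."""
    return T * comb(n, i) * M ** (n - i) * (1 - M) ** i / y_exact(n, i, T, M)


def p_guilty_at_most(n: int, i: int, T: float, M: float) -> float:
    """P: probability the defendant is convictable given at most i acquittal votes."""
    num = T * sum(comb(n, j) * M ** (n - j) * (1 - M) ** j for j in range(i + 1))
    return num / w_at_most(n, i, T, M)


if __name__ == "__main__":
    T, M = 0.7, 0.9  # assumed illustrative values
    print(round(y_exact(12, 0, T, M), 4), round(w_at_most(12, 5, T, M), 4))
    print(round(p_guilty_exact(12, 6, T, M), 4), round(p_guilty_at_most(12, 5, T, M), 4))
```

One property visible in the sketch is that an even split carries no information about the defendant: with i = n/2 the posterior probability equals the prior T.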

Gelfand and Solomon (1973) demonstrated 
that the probability of a verdict is independent 
of T and that the correctness of a verdict is 
independent of jury size but does depend on 
the size of the quorum required for conviction. 
(This seeming contradiction of Walbert's 
analysis arises from the differences in the ways 
that the two models specify correctness.) 

Using Poisson's (1837) data on French civil 
and criminal trials of the period from 1825 
through 1833, Gelfand and Solomon (1973) 
obtained estimates of T and M by determining 
the probability of conviction in criminal cases 
at times when the required quorum for jury 
verdicts was either 7 of 12 votes (i.e., $W_{12,5}$ for
the years 1825-1830) or 8 of 12 votes ($W_{12,4}$ for
the years 1831-1833). With the knowledge that
$Y_{12,5} = W_{12,5} - W_{12,4}$, it was possible for
Gelfand and Solomon to estimate the following
overall parameter values: T = .7494, M
= .6391, $P_{12,5}$ = .9406, and $P_{12,4}$ = .9943.

In their second article, Gelfand and Solomon 
(1974) applied similar methods to Kalven and 
Zeisel’s (1966) data on first ballots and final 
verdicts in the 225 criminal cases cited by 
Walbert (1971). (See Table 1 for complete 
data.) In this instance their estimates of T and 
M were based on the overall distribution of 
verdicts reported in Table 1 and the various 
estimates this distribution provides for dif- 
ferent Ys. Thus, 43 of the 225 (19%) juries 
produced first ballot, unanimous votes for 
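guilt, so that one reasonable estimate of
$Y_{12,0}$ is .19. Similar estimates of other Ys
produce the following simultaneous equations:

$$
\begin{aligned}
Y_{12,0} &= .19;\\
\sum_{i=1}^{5} Y_{12,i} &= .47;\\
Y_{12,6} &= .04;\\
\sum_{i=7}^{11} Y_{12,i} &= .18;\\
Y_{12,12} &= .12.
\end{aligned}
$$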


In their 1974 article, Gelfand and Solomon 
used the method of moments approach to find 
solutions for T and M, whereas in their 1975 
article they treated the first ballot results as 
independent observations from a five-cell
multinomial distribution and employed minimum
chi-square and maximum likelihood
estimation procedures to determine values for
T and M. In each instance they also evaluated
a three-parameter model in which $M_1$ is the
probability that a juror will vote for guilt given
a guilty defendant, $M_2$ is the probability that
a juror will vote for innocence given an innocent
defendant, and T is defined as before. The
results of the three estimation procedures were
roughly similar. The values of $M_1$ and $M_2$
clustered around .9 and did not appear to be 
significantly different (jurors appeared to be 
equally accurate in detecting guilt and inno- 
cence), while the values of T clustered 
around .7. 
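
To give a feel for how the five first-ballot proportions constrain T and M, the sketch below runs a coarse least-squares grid search over the two-parameter model. This is an illustrative stand-in and not the method of moments, minimum chi-square, or maximum likelihood procedures that Gelfand and Solomon used; the function names, the grid step, and the squared-error criterion are all assumptions. M is restricted to values above .5 because the pairs (T, M) and (1 − T, 1 − M) generate identical ballot distributions.

```python
from math import comb

# Observed first-ballot proportions for 0, 1-5, 6, 7-11, and 12 acquittal votes.
OBSERVED = [0.19, 0.47, 0.04, 0.18, 0.12]


def cell_probs(T: float, M: float, n: int = 12) -> list:
    """Model probabilities for the same five cells."""
    y = [
        comb(n, i) * (T * M ** (n - i) * (1 - M) ** i + (1 - T) * M**i * (1 - M) ** (n - i))
        for i in range(n + 1)
    ]
    return [y[0], sum(y[1:6]), y[6], sum(y[7:12]), y[12]]


def fit_grid(observed=OBSERVED, step: float = 0.01) -> tuple:
    """Return the (T, M) grid point with the smallest squared error."""
    steps = int(round(1 / step))
    best = None
    for a in range(1, steps):
        T = a * step
        for b in range(steps // 2 + 1, steps):  # M > .5 for identifiability
            M = b * step
            sse = sum((m - o) ** 2 for m, o in zip(cell_probs(T, M), observed))
            if best is None or sse < best[0]:
                best = (sse, T, M)
    return best[1], best[2]


if __name__ == "__main__":
    T_hat, M_hat = fit_grid()
    print(round(T_hat, 2), round(M_hat, 2))
```

Even this crude criterion lands in the neighborhood of the values reported in the text, with M close to .9 and T close to .7.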

As noted earlier, the Gelfand and Solomon
model is analogous to Walbert's binomial
model, but uses two parameters rather than
one. Gelfand and Solomon's model also allows
a very simple comparison of the probability of
conviction by 6- and 12-member juries for
various values of T and M. Assuming majority
persuasion and an equal split for guilt and 
innocence in juries who are initially evenly 
divided (precisely the assumptions made by 
Walbert), Gelfand and Solomon set the proba- 
bility of conviction by a 12-member jury as 


$$tw = \sum_{i=0}^{5} Y_{12,i} + \tfrac{1}{2} Y_{12,6}$$

and the probability of conviction by a six-member
jury as

$$s = \sum_{i=0}^{2} Y_{6,i} + \tfrac{1}{2} Y_{6,3}.$$


Table 2 compares the values of tw and s for
T = .2, .4, .6, and .8 and M = .2, .4, .6, and .8.
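
The entries of Table 2 follow directly from the two expressions above. The sketch below recomputes them under the stated assumptions (initial majority prevails, even splits divide equally); the function names are illustrative.

```python
from math import comb


def y_exact(n: int, i: int, T: float, M: float) -> float:
    """Probability of exactly i first-ballot votes for acquittal."""
    guilty = T * M ** (n - i) * (1 - M) ** i
    innocent = (1 - T) * M**i * (1 - M) ** (n - i)
    return comb(n, i) * (guilty + innocent)


def conviction_prob(n: int, T: float, M: float) -> float:
    """tw (n = 12) or s (n = 6): majorities for guilt plus half of the even splits."""
    half = n // 2
    return sum(y_exact(n, i, T, M) for i in range(half)) + y_exact(n, half, T, M) / 2


if __name__ == "__main__":
    levels = (0.2, 0.4, 0.6, 0.8)
    for M in levels:
        print(f"M = {M}")
        for T in levels:
            tw = conviction_prob(12, T, M)
            s = conviction_prob(6, T, M)
            print(f"  T = {T}: tw = {tw:.3f}, s = {s:.3f}")
```

Because exchanging (T, M) for (1 − T, 1 − M) leaves the ballot distribution unchanged, the M = .8 block of the table mirrors the M = .2 block, and the M = .6 block mirrors the M = .4 block.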

Gelfand and Solomon (1974) took the posi- 
tion (one they retreated from in later articles) 


Table 2

Probability of Conviction by 12-Man Jury,
tw(T, M), and 6-Man Jury, s(T, M), for
Values of T and M

   T      tw(T, M)    s(T, M)

M = .2
  .2       .793        .765
  .4       .598        .588
  .6       .402        .412
  .8       .207        .235

M = .4
  .2       .652        .610
  .4       .551        .537
  .6       .449        .463
  .8       .348        .390

M = .6
  .2       .348        .390
  .4       .449        .463
  .6       .551        .537
  .8       .652        .610

M = .8
  .2       .207        .235
  .4       .402        .412
  .6       .598        .588
  .8       .793        .765

Note. Based on Gelfand and Solomon (1974).


that the differences in performance between 
6- and 12-member juries are negligible over 
the full range of values for T and M. In fact,
it is clear from Table 2 that whenever jurors
vote correctly more than half the time,
12-member juries perform better than 6-member
juries (i.e., they vote correctly more often).
This is, of course, the same conclusion reached 
by Walbert (1971). Indeed, when one considers 
that there are thousands of criminal trials each 
year, it is also obvious that the otherwise 
negligible differences in performance may 
yield quite significant practical consequences, 
affecting thousands of lives. 

Of course, we noted earlier that the Walbert 
model is less than ideal insofar as it fails to
account for reversals of initial majorities and 
hung juries. In an effort to incorporate these 
two violations of the simple majority persua- 
sion model, Gelfand and Solomon (1975) pro- 
posed that the Kalven and Zeisel data can 
provide the basis for more refined estimates 
of M, T, p, and P. Table 1 shows that all the 
cases with initial unanimous (12 to 0) votes for 
guilt ultimately returned guilty verdicts, 86% 
of the cases with between 7 and 11 votes for 
guilt did so, and half the cases with evenly 
divided juries did so, whereas 2% of the cases 
with initial though nonunanimous majorities
favoring acquittal were reversed, the minority
ultimately prevailing. On the basis of these
results Gelfand and Solomon suggested the
following equation for the probability that a
jury will convict:

$$P_c = Y_{12,0} + .86 \sum_{j=1}^{5} Y_{12,j} + .50\, Y_{12,6} + .02 \sum_{j=7}^{11} Y_{12,j}, \tag{7}$$

where, for example, $Y_{12,0}$ is the proportion of
juries who begin deliberation with no votes for
acquittal. Similarly, they suggested the following
equation for the probability of acquittal:

$$P_a = Y_{12,12} + .91 \sum_{j=7}^{11} Y_{12,j} + .50\, Y_{12,6} + .05 \sum_{j=1}^{5} Y_{12,j}. \tag{8}$$

The probability of a hung jury is thus
$P_h = 1 - P_c - P_a$. Of course, not all acquittals
and convictions are correct, but it is
possible to determine the probability or pro- 
portion of convictions and acquittals that are 
correct. Thus, 


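$$P_{G|C} = \Big( p_{12,0} Y_{12,0} + .86 \sum_{j=1}^{5} p_{12,j} Y_{12,j} + .50\, p_{12,6} Y_{12,6} + .02 \sum_{j=7}^{11} p_{12,j} Y_{12,j} \Big) \Big/ P_c;$$

$$P_{I|A} = \Big[ P_a - \Big( p_{12,12} Y_{12,12} + .91 \sum_{j=7}^{11} p_{12,j} Y_{12,j} + .50\, p_{12,6} Y_{12,6} + .05 \sum_{j=1}^{5} p_{12,j} Y_{12,j} \Big) \Big] \Big/ P_a.$$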

Table 3 compares the results of the three 
methods of estimating the probabilities of 
interest for 12-member juries using Walbert's 
simple majority persuasion model and the 
more refined equations (7 and 8) proposed by 
Gelfand and Solomon. 

To assess the reasonableness of the refined 
model, Gelfand and Solomon (1975) computed 
the values of P_C, P_A, P_H, P_{G|C}, and P_{I|A} for
different values of M and T and compared the 
results with the distribution of verdicts from 
the 225 Kalven and Zeisel cases cited earlier 
and with the overall distribution of verdicts 
from the 3,576 trials used in the entire Kalven 
and Zeisel (1966) study. Again, the best fits 
were produced with M = .9 and T = un 

Finally, as noted above, Gelfand and Solomon (1975, 1977) used
their maximum likelihood estimates of M and T in combination with
Davis's (1973) social-decision-scheme analysis and by slightly
modifying Davis et al.'s (1975) Scheme 3 produced a very good fit
to the Kalven and Zeisel data: P_C = .637, P_A = .303, P_H = .060,
P_{G|C} = .9779, and P_{I|A} = .9385, results that compare quite
favorably with the results reported in Table 3.

Juror Accuracy and Satisfaction 


Grofman (1976, in press) has employed a general binomial model
similar to Gelfand and Solomon's for two purposes: to examine the
effect of applying several simplifying assumptions to his model and
to examine the implications for decision rule preferences of
jurors' tolerance for verdict errors. We examine these analyses
briefly, starting with the analysis of the simplifying assumptions.

Table 3
Comparison of Walbert's (1971) and Gelfand and Solomon's (1975)
Jury Decision Models for 12-Member Juries

                                        Probability of outcome
                                                              Defendant      Defendant
                                                              guilty given   innocent given
                                                              that jury      that jury
Study                         Conviction  Acquittal   Hung    convicts       acquits

Minimum χ² estimate
  Walbert (1971)                .6588       .3412     0         .9986          .9938
  Gelfand & Solomon (1975)      .5843       .3433     .0724     .9882          .9092
Maximum likelihood estimate
  Walbert (1971)                .6897       .3103     0         .9997          .9984
  Gelfand & Solomon (1975)      .6189       .3155     .0656     .9918          .9129
Method of moments estimation
  Walbert (1971)                .6999       .3001     0         .9998          .9993
  Gelfand & Solomon (1975)      .6340       .3059     .0601     .9930          .9175


Juror Acuity Model 


Following Gelfand and Solomon (1973), Grofman (1976) associated a
binomial p with the probability that a randomly selected juror will
correctly judge innocent defendants to be innocent (P_II) and
guilty defendants to be guilty (P_GG), where P_G is the proportion
of defendants who are guilty, P_I is the proportion of defendants
who are innocent (by definition, P_I = 1 − P_G), P_C is the
proportion of defendants convicted by juries, P_A is the proportion
of defendants acquitted by juries, P_H is the proportion of
defendants whose juries hang, and q is the number of votes required
for a verdict (either guilt or innocence) and corresponds to a de
facto decision rule (when q votes are not obtained, the jury is
assumed to hang). In developing his model, Grofman assumed for
simplicity that P_II = P_GG = p (i.e., that jurors are equally good
at correctly determining guilt or innocence).

In some respects Grofman's model is a more general version of the
Gelfand and Solomon (1973, 1974, 1975, 1977) model; Grofman's p and
P_G correspond to Gelfand and Solomon's T and M.


As the first step in the construction of his model, Grofman
examined the probability that a majority of jurors (q = m) in a
jury with an odd number of jurors (N) will reach a correct verdict:

$$P(\text{correct verdict}) = \sum_{h=m}^{N} \binom{N}{h} p^{h} (1 - p)^{N-h}. \tag{9}$$

This expression is simply Equation 1 rewritten with Q = m and with
p as the probability that a juror will vote correctly. The general
implications of this model are that when p > 1/2, increasing the
size of the jury also increases the probability that a majority
will reach a correct verdict (while lowering the probability that a
verdict will be reached); when p = 1/2, the probability that a jury
will reach a correct verdict is 1/2 and is independent of jury
size; and when p < 1/2, the larger the size of the jury, the less
likely it is to reach a correct verdict.

Equation 9 can also be used to create expressions for P_C, P_A, and
P_H:


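$$P_C = \sum_{h=q}^{N} \binom{N}{h} \left[ p^{h} (1 - p)^{N-h} P_G + (1 - p)^{h} p^{N-h} P_I \right], \tag{10}$$

$$P_A = \sum_{h=q}^{N} \binom{N}{h} \left[ p^{h} (1 - p)^{N-h} P_I + (1 - p)^{h} p^{N-h} P_G \right], \tag{11}$$

$$P_H = \sum_{h=N-q+1}^{q-1} \binom{N}{h} p^{h} (1 - p)^{N-h}. \tag{12}$$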


Equation 12 is a corrected form of Grofman's expression. Note that
the first terms of Equations 10 and 11 determine the probability
that q or more jurors will correctly vote for guilt or innocence,
whereas the second terms determine the probability that q or more
jurors will incorrectly vote for guilt or innocence (cf. Equations
5 and 6 in Gelfand & Solomon's, 1973, model). The P_H equation
determines the probability that fewer than q jurors will concur on
either guilt or innocence (note that when the decision rule is that
the majority wins, P_H is the probability of an even split in
juries with an even number of jurors, and P_H = 0 with odd-numbered
juries). Note that P_H is independent of P_G and P_I.
Table 4
Distribution of First Ballot Votes When the Probability of a Juror
Not Erring Equals .88 and the Probability the Defendant Is Guilty
Equals .69

No. of      Total         Correct    Incorrect
votes       probability   part       part

Votes to convict
  12          .1488        .1488      0
  11          .2435        .2435      0
  10          .1826        .1826      0
   9          .0830        .0830      0
   8          .0255        .0255      0
   7          .0056        .0055      .0001
Votes to hang
   6          .0013        .0009      .0004
Votes to acquit
   7          .0026        .0025      .0001
   8          .0115        .0115      0
   9          .0373        .0373      0
  10          .0820        .0820      0
  11          .1094        .1094      0
  12          .0669        .0669      0

Note. Adapted from Gelfand and Solomon (1975).
Table 4 (based on Gelfand & Solomon, 1975) displays the
probabilities of vote distributions from (0, 12) through (12, 0)
that obtain from p = .88 and P_G = .69 and provides some insight
into the operation of Equations 10-12. Note that the table is based
on a 12-person jury using a majority decision rule. If the decision
rule were 8 of 12 votes, the hung portion of the table would
consist of the sum of the probabilities of the (7, 5), (6, 6), and
(5, 7) distributions.
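As a check on the arithmetic, the binomial verdict probabilities can be computed directly. The sketch below assumes the usual binomial form described in the text (q or more correct votes, q or more incorrect votes, and a hung jury when neither side reaches q); the function names are hypothetical, and the parameter values are those of Table 4.

```python
from math import comb

def pmf(n, h, p):
    """Binomial probability of exactly h successes in n trials."""
    return comb(n, h) * p ** h * (1 - p) ** (n - h)

def verdict_probabilities(n, q, p, p_guilty):
    """Return (P_C, P_A, P_H) for jury size n, quorum q, juror accuracy p."""
    p_innocent = 1 - p_guilty
    # Probability that q or more jurors vote correctly / incorrectly.
    correct = sum(pmf(n, h, p) for h in range(q, n + 1))
    incorrect = sum(pmf(n, h, 1 - p) for h in range(q, n + 1))
    p_convict = correct * p_guilty + incorrect * p_innocent
    p_acquit = correct * p_innocent + incorrect * p_guilty
    # Hung jury: the number of correct votes falls strictly between n - q and q.
    p_hang = sum(pmf(n, h, p) for h in range(n - q + 1, q))
    return p_convict, p_acquit, p_hang

majority = verdict_probabilities(12, 7, 0.88, 0.69)
eight_of_twelve = verdict_probabilities(12, 8, 0.88, 0.69)
print([round(x, 4) for x in majority])
print([round(x, 4) for x in eight_of_twelve])
```

With the majority rule the hung-jury probability is the (6, 6) entry of Table 4; with an 8-of-12 rule it is the sum of the (7, 5), (6, 6), and (5, 7) entries.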

Grofman (1976) observed that if jurors are very accurate in
assessing guilt and innocence, then the values for P_C and P_A are
approximated by the first terms of Equations 10 and 11. From
Table 4 it is clear that for a 12-member jury with p = .88 and a
majority decision rule, Grofman's observation is quite correct:
deleting the "incorrect" portions of the P_C and P_A terms loses
only .02% of the cases.

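If one further assumes that the decision rule is unanimity, then
the equations simplify even further:

$$P_C = p^{N} P_G, \tag{13}$$

$$P_A = p^{N} P_I, \tag{14}$$

$$P_H = 1 - p^{N}, \tag{15}$$

where

$$P_G = \frac{P_C}{P_C + P_A}, \qquad P_I = \frac{P_A}{P_C + P_A}, \qquad p = (P_C + P_A)^{1/N}.$$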


Grofman correctly observed that with the appropriate normative
data, it is relatively easy to estimate the parameters P_G, P_I,
and p, but in making his estimates he assumed that the appropriate
data are the distribution of final jury verdicts when, in fact, the
appropriate data are the distribution of initial ballots. Grofman
assumed that he could treat q as though it were an effective
decision rule (similar to the analyses of Davis, 1973, and Walbert,
1971); however, the simplifying assumptions that led him from
Equations 10, 11, and 12 to Equations 13, 14, and 15 used
restrictions on q that effectively changed the decision rule from a
majority decision to a unanimity rule. As Saks and Ostrom (1975)
demonstrated, the effect of such restrictions is to lower the
probability that a jury will produce a verdict on the first ballot.
(As can be seen from Table 4, when p = .88 and P_G = .69, only
slightly more than one fifth of the juries can be expected to
produce a verdict on the first ballot with a decision rule of
unanimity.)

It is, therefore, appropriate to estimate the 
parameters of the simplified model using data 
on first ballot distributions. The estimates 
obtained using the Kalven and Zeisel (1966) 
data are P_G = .6232, P_I = .3768, p = .9062,
and P_H = .6933. That the estimates are
roughly equivalent to those made by Gelfand 
and Solomon (1975) should not surprise one, 
since Grofman’s simplifying assumptions allow 
a straightforward estimation of the same 
parameters. The estimates are cruder because 
the simplified equations are based on initial 
vote distributions at the tails of the binomial 
distribution—(12,0) and (0,12)—but the 
similarity of the estimates produced by the 
two procedures tends to confirm the validity of 
Grofman’s simplifying assumptions. 
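The first-ballot estimates can be reproduced with a short calculation. The sketch below assumes the simplified unanimity relations and uses the counts reported for the 225 Kalven and Zeisel cases (43 first ballots unanimous for conviction, 26 unanimous for acquittal); the variable names are hypothetical.

```python
N = 12                   # jury size
TOTAL_CASES = 225
UNANIMOUS_CONVICT = 43   # first ballots that were 12-0 for conviction
UNANIMOUS_ACQUIT = 26    # first ballots that were 0-12 for acquittal

# Proportions of first-ballot "verdicts" under a unanimity rule.
P_C = UNANIMOUS_CONVICT / TOTAL_CASES
P_A = UNANIMOUS_ACQUIT / TOTAL_CASES

# Invert the simplified model: P_C = p**N * P_G and P_A = p**N * P_I.
P_G = P_C / (P_C + P_A)
P_I = P_A / (P_C + P_A)
p = (P_C + P_A) ** (1 / N)
P_H = 1 - p ** N

print(round(P_G, 4), round(P_I, 4), round(p, 4), round(P_H, 4))
# prints: 0.6232 0.3768 0.9062 0.6933
```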

One might take this analysis one step further 
and use the data on final verdicts to assess first 
ballot parameter values. If, for instance, one 
regards final verdicts as the best available 
index of defendants’ convictability or “acquit- 
tability,” then the Kalven and Zeisel (1966) 
data in Table 1 indicate that 139 (61.78%) of 
the 225 defendants were convictable, that 73 
(32.44%) were acquittable, and that in 13 cases 
(5.7%) the guilt or innocence of the defendant 
was ambiguous. One can now ask, what is the 
probability that a convictable (acquittable) 
defendant will be convicted (acquitted) on the 
first ballot? The answer is that 43 of the 139 
defendants ultimately convicted were con- 
victed on the first ballot (30.94%), whereas 
26 of the 73 defendants ultimately acquitted 
were acquitted on the first ballot (35.62%). 

Thus, the probability that an individual juror will vote to convict
a convictable defendant on the first ballot is .3094^(1/12) or
.9069, and the probability that an individual juror will vote to
acquit an acquittable defendant on the first ballot is
.3562^(1/12) or .9176.
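The two per-juror figures are twelfth roots of the first-ballot unanimity rates, on the assumption that the 12 jurors vote independently. A minimal sketch with hypothetical variable names:

```python
N = 12  # jury size

# Share of ultimately convicted (acquitted) defendants whose juries were
# already unanimous on the first ballot.
unanimous_convict_rate = 43 / 139
unanimous_acquit_rate = 26 / 73

# If all N jurors vote independently with the same probability, the
# per-juror probability is the N-th root of the unanimity rate.
p_convict_vote = unanimous_convict_rate ** (1 / N)
p_acquit_vote = unanimous_acquit_rate ** (1 / N)

print(round(p_convict_vote, 4), round(p_acquit_vote, 4))
```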

These results are in close agreement with the Gelfand and Solomon
and Grofman estimates and suggest that jurors may be somewhat more
reluctant to vote to convict an apparently guilty defendant on the
first ballot than they are to vote to acquit an apparently innocent
defendant. As before, of course, this analysis tells one nothing
about the deliberation process itself; it merely provides a crude
method for assessing first ballot, individual juror accuracy.


Juror Tolerance for Errors


Following the analyses of Rae (1969), Taylor (1969), Curtis (1972),
and Badger (1972), Grofman (1976) has used his basic binomial model
(Equations 8, 9, and 10) to explore the decision rule implications
of various levels of juror tolerance for erroneous jury decisions.
He first specified a trade-off ratio R that reflects the number of
guilty defendants a juror would be willing to set free to avoid the
erroneous conviction of one innocent defendant. Two ratios can be
constructed using R: P_R = R/(R + 1) is the relative weight
attached to avoiding false convictions (avoiding a Type I error)
that Grofman analyzes as the weight attached to assuring correct
convictions and 1 − P_R = 1/(R + 1) is the relative weight attached
to assuring correct convictions of guilty defendants (avoiding a
Type II error) that Grofman analyzes as the relative weight
attached to avoiding false convictions. By way of example, if a
juror is willing to set five guilty defendants free to avoid one
false conviction, then P_R = 5/(5 + 1) = 5/6 and 1 − P_R = 1/6.
(For discussions of Type I and Type II errors in legal contexts see
also Feinberg, 1971; Friedman, 1972; Tribe, 1971.)

By weighting the probability of correct voting to reflect the
trade-off ratio P_R for Type I and Type II errors, a new equation
can be formed for the weighted probability that a quorum q will
produce a correct verdict:

$$\sum_{h=0}^{q-1} \binom{N}{h} (1 - p)^{h} p^{N-h} P_I P_R + \sum_{h=q}^{N} \binom{N}{h} p^{h} (1 - p)^{N-h} P_G (1 - P_R). \tag{16}$$


The second term of Equation 16 corresponds to the first term of
Equation 10 and is the weighted probability that q or more jurors
will correctly vote for conviction. The first term of Equation 16
is related to the first term of Equation 11, but in addition to
including all the cases in which q or more jurors correctly vote
for acquittal, the first term of Equation 16 also includes the
cases in which the jury hangs (from h = 0 through q − 1). In
essence, Grofman's analysis treats hung juries as correct
acquittals.

Although Grofman again regarded q as the 
effective decision rule, his analysis is appro- 
priately treated as an analysis of first ballots 
rather than of final verdicts. This means, as 
we have noted before, that the analysis fails to 
account for those cases in which final verdicts 
are not characterized by majority persuasion 
(approximately 5% of the cases reported in 
Kalven & Zeisel, 1966). Given the limitations 
on one's knowledge about the deliberation 
process, Grofman's treatment of q as an 
effective decision rule is not fatal and has the 
advantage of simplifying his analysis. The 
question of interest, of course, concerns what 
decision rule will maximize the value of 
Equation 16. 
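The search for the best quorum can be sketched numerically. The code below is an illustration built from the verbal description of the two weighted terms (an innocent defendant who draws fewer than q votes for conviction, and a guilty defendant who draws q or more), not a transcription of Grofman's own computation; the function names and the parameter values (p = .6, P_G = .5) are assumptions chosen only to show the pattern.

```python
from math import comb

def pmf(n, h, p):
    """Binomial probability of exactly h successes in n trials."""
    return comb(n, h) * p ** h * (1 - p) ** (n - h)

def weighted_correct(n, q, p, p_guilty, p_r):
    """Weighted probability of a correct outcome; hung juries count as acquittals."""
    # Innocent defendant: fewer than q jurors (incorrectly) vote to convict.
    no_false_conviction = sum(pmf(n, h, 1 - p) for h in range(0, q))
    # Guilty defendant: q or more jurors (correctly) vote to convict.
    correct_conviction = sum(pmf(n, h, p) for h in range(q, n + 1))
    return (p_r * (1 - p_guilty) * no_false_conviction
            + (1 - p_r) * p_guilty * correct_conviction)

def best_quorum(n, p, p_guilty, p_r):
    """Quorum between a bare majority and unanimity that maximizes the weighted value."""
    majority = n // 2 + 1
    return max(range(majority, n + 1),
               key=lambda q: weighted_correct(n, q, p, p_guilty, p_r))

# Illustrative (assumed) values: modest juror accuracy, equal priors.
for p_r in (0.75, 0.90, 0.99):
    print(p_r, best_quorum(12, 0.6, 0.5, p_r))
```

As the weight on avoiding false convictions rises toward 1, the maximizing quorum moves from near a bare majority toward unanimity.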

For fixed Ns, Grofman determined the values of q that maximize the
expression for given levels of p, P_G, and P_R. Of greatest
interest are those conditions in which juror accuracy is greater
than one half (in which case q should be equal to or greater than a
majority m). Grofman noted in particular that when P_G > 1/2, the
optimal decision rule approaches or equals q = N as P_R approaches
1. Grofman concluded "that it does not require an infinite value of
R to justify in normative terms a rule requiring unanimity to
convict" (p. 11).

In a similar fashion, Grofman constructed an index of juror
satisfaction (D) that reflects the subjective ratio of
disappointment a juror experiences when an apparently innocent
defendant is convicted compared with the disappointment experienced
when an apparently guilty defendant is set free. (Disappointment is
subjective insofar as a juror can rarely know with certainty that a
defendant is actually guilty or innocent and can only judge on the
basis of trial evidence.) The relative disappointment of a Type I
error can be represented by P_D = D/(D + 1) and a Type II error by
1 − P_D = 1/(D + 1). Thus, if a juror is nine times more
disappointed by an erroneous conviction than by an erroneous
acquittal, D = 9, P_D = 9/(9 + 1) = .9, and 1 − P_D = .1.

Using the trade-off ratio for disappointment, 
the weighted probability that a randomly 
selected juror will agree with a verdict (i.e., 
the juror’s perception of the correctness of 
other jurors’ assessments of the evidence) is 
expressed by 


N-1 


x ( if 5, ра = pNP GPo 


л=4—1 


N-1 


an 


h=q-l 


(5, era рта - во) 
а? 


(This expression includes a correction of a
typographical error in Grofman, 1976.)

The first term of the expression is the 
weighted probability that a juror will join a 
quorum that will correctly convict, while the 
second term is the weighted probability that 
a juror will join a quorum that will correctly 
acquit. Grofman concluded from his analysis 
that the value of Expression 17 is maximized 
when the decision rule corresponds directly to 
the juror’s trade-off ratio P_D (for a related
analysis, see Curtis, 1972). Thus, if a juror is
as disappointed by seeing two guilty defendants
acquitted as by seeing one innocent defendant
convicted (D = 2, P_D = 2/3), that juror should prefer
a two-thirds decision rule. Again, of course, 
we caution that this analysis is most appro- 
priate for first ballot distribution and would 
require modification if it were extended to 
final verdicts. 

Both the Gelfand and Solomon and the 
Grofman analyses of prior probabilities of guilt 
and juror accuracy are based on data from a 
wide range of cases. In some of those cases the 
evidence of guilt was probably overwhelming, 
in some cases the evidence of guilt was probably 
fairly weak, and in many cases the evidence 
was probably relatively balanced. Because the 
estimates of the parameters used in the models 
are based on a range of cases, they may not 
be directly applicable to any particular case. 
In general, the results suggest that approxi- 
mately 70% of the defendants brought to trial 
(in the cases examined by Kalven and Zeisel) 
are convictable and that jurors are on the 
average about 90% accurate in their assessments
of defendants' convictability. This does
not mean that any particular defendant has a 
70% chance of conviction, nor does it mean 
that one can expect 10% of the jurors to err in 
their judgments of a particular defendant's 
guilt. The Gelfand and Solomon and Grofman 
analyses simply are not appropriate for 
analysis of individual cases. Indeed, it is 
possible that the convictability and juror 
accuracy parameters (assumed to be inde- 
pendent in the two models) are in fact inter- 
related and that juror errors are most likely to 
occur in cases in which the evidence against 
and in favor of a defendant is relatively 
balanced. In cases in which the evidence points 
unequivocally in one direction or another, it 
is probably less likely that jurors will err. 
Existing data on juror behavior are not 
adequate to the task of determining the rela- 
tionship between apparent guilt and jury 
acuity, but the reasonableness of the assump- 
tion of independence should be kept in mind 
when evaluating the parameter estimates 
provided by the models we have examined. 
The aspect of both Gelfand and Solomon's and Grofman's analyses
that we find most disturbing is the implication that there are only
two distinct types of defendants: defendants who clearly should not
be convicted on the weight of the trial evidence (~30% of all
defendants) and for whom the probability that a juror will
erroneously vote to convict is ~.1 and defendants who clearly
should be convicted on the weight of the trial evidence (~70% of
all defendants) and for whom the probability that a juror will
correctly vote to convict is ~.9 (see sample results reported in
Table 3).
Although such a view may provide a reason- 
able characterization of trial evidence and of 
jurors’ assessment of that evidence, we think 
that this analysis obscures the fact that the 
evidence presented at trials probably varies 
widely in the extent to which it indicates a 
defendant’s guilt or innocence. In some cases
the evidence may overwhelmingly and un- 
mistakably point to guilt or innocence, and 
with such evidence we would not be surprised 
to see unanimous first ballot verdicts. In other 
instances the evidence may be very close (some 
of it pointing to innocence and some pointing 
to guilt), and in these trials we would not be 
surprised to find the jury dividing equally for
conviction and acquittal on the first ballot.

Basically, we think it misleading to conceive 
of juror accuracy in Gelfand and Solomon's 
terms—juror accuracy is, after all, limited by 
the quality of the evidence presented at trial.
Optimal juror performance can probably be 
attained only under circumstances such as 
those outlined in the discussion of Saks and 
Ostrom’s (1975) and Walbert’s (1971) binomial 
models: With very large juries, defendants 
should only be convicted when a majority (or 
some other critical proportion) of the jurors 
vote to convict after hearing all the trial 
evidence. We think it more realistic to assume 
that the weight of trial evidence (the extent 
to which the evidence would convince a juror 
of the defendant’s guilt) varies widely across 
trials and that the best indication of the varia- 
bility in trial evidence weight is the variability 
in jurors’ first ballot votes for conviction and 
acquittal. 

Judging from Kalven and Zeisel’s (1966) 
data on first ballot votes (Table 1), it appears 
that the weight of evidence is bimodally 
distributed and that in approximately 70% of 
the cases the evidence points to conviction. 
Because Table 1 provides only summary data 
on the distribution of votes (i.e., juries produce 
unanimous verdicts for conviction in 19% of 
the cases, produce nonunanimous majorities 
for acquittal in 18% of the cases, and produce 
unanimous verdicts for acquittal in 12% of 
the cases), one cannot fix the actual distribution of first ballot
votes (or the underlying distribution of evidentiary weight that
produces the first ballot distribution of votes). Still, one can
test some possible distributions of evidentiary weight across
trials (in which such weights represent the probability that a
randomly selected juror who has heard the evidence from a
particular trial will vote to convict) for their fit to the Kalven
and Zeisel first ballot vote distribution.

For example, the Gelfand and Solomon (1975) model that produces the
distribution of votes shown in Table 4 assumes a distribution of
evidentiary weight in which the probability that a juror will vote
to convict is .88 for 69% of the cases and .12 for 31% of the
cases. This very simple bimodal distribution of evidence produces a
moderately good fit to the Kalven and Zeisel data. We have tested
several other types of evidentiary weight distributions (relatively
flat but skewed, unimodal, and bimodal) and have assumed in each
instance that the weights are distributed in probability intervals
of .1. We have found that bimodal distributions of evidence such as
the one in Figure 2 produce the best fits to the Kalven and Zeisel
data.
(We caution that this estimation enterprise is 
crude; it is subject to the limitations of the 
data, it is post hoc, and it ignores that the 
distribution of evidence is probably continuous 
rather than discrete. Still, the distribution in 
Figure 2 is psychologically plausible and, as 
we demonstrate, produces a good fit to the 
distribution in Table 1.) 

To explain our method briefly, one can see that the distribution of
weights in Figure 2 implies that in 13% of all cases jurors are
expected to judge the evidence against a defendant as unmistakably
indicating that the defendant is guilty. In these cases the
probability that jurors will vote to convict is equal to 1.0. Each
of these juries will return unanimous verdicts for conviction (the
expected outcome for a binomial p = 1.0 and for N = 12). In 8% of
the cases Figure 2 implies that all jurors will vote for acquittal
(p = .00). Similarly, in 16% of the cases the (binomial)
probability that a randomly selected juror will vote to convict
equals .8.


Figure 2. Hypothetical distribution, across cases, of evidentiary
weight for conviction (based on data collected by Kalven & Zeisel,
1966). [Horizontal axis: binomial probability, 0 to 1.0; vertical
axis: percentage of cases.]
By consulting binomial tables or using the binomial expansion for
N = 6 and N = 12, one can determine the expected distribution of
first ballot votes for the distribution of ps shown in Figure 2.
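The same expansion is easy to carry out in code. Because the individual bar heights of Figure 2 are not tabulated in the text, the sketch below uses the simpler two-point distribution attributed to Gelfand and Solomon (a conviction probability of .88 in 69% of cases and .12 in 31%) as a stand-in; the function names and category labels are hypothetical.

```python
from math import comb

def first_ballot_distribution(weights, n):
    """Expected share of juries with v votes for conviction, v = 0..n.

    weights maps an evidentiary weight (probability that a juror votes to
    convict) to the share of cases having that weight.
    """
    dist = [0.0] * (n + 1)
    for p, share in weights.items():
        for v in range(n + 1):
            dist[v] += share * comb(n, v) * p ** v * (1 - p) ** (n - v)
    return dist

def summary_categories(dist):
    """Collapse a vote distribution into five first-ballot categories."""
    n = len(dist) - 1
    half = n // 2
    return {
        "unanimous acquittal": dist[0],
        "majority for acquittal": sum(dist[1:half]),
        "even split": dist[half],
        "majority for conviction": sum(dist[half + 1:n]),
        "unanimous conviction": dist[n],
    }

two_point = {0.88: 0.69, 0.12: 0.31}   # stand-in for the Figure 2 weights
categories = summary_categories(first_ballot_distribution(two_point, 12))
for name, share in categories.items():
    print(f"{name}: {share:.4f}")
```

Substituting a dictionary with weights at intervals of .1 gives the expected first ballot distribution for any other assumed distribution of evidence.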

Figure 3 shows the expected distributions of 
votes for each of the binomial probabilities for 
12-member juries. The figure also shows the 
cumulative probability of each vote distribution
(i.e., if the evidence weight is distributed
as shown in Figure 2, then in 12-member juries 
a unanimous verdict for acquittal should occur 
in 11.7% of the cases, with the bulk of these 
verdicts occurring in trials in which the evi- 
dence points unequivocally to innocence). The 
cumulative distribution of first ballot votes in 
the 12-member jury (shown at the top of 
Figure 3) provides a very good fit to the Kalven 
and Zeisel data (11.7%, 18.4%, 3.8%, 46.5%, 
and 19.4% in the model vs. 12%, 18%, 4%, 
47%, and 19% in the respective categories for 
the normative data). 

By making a few simple assumptions, one 
can also assess the distribution of first ballot 
votes for accuracy. For example, if one makes 
the crude assumption that juries should con- 
vict whenever the binomial probability of guilt 
is equal to or greater than .6 (i.e., when 60% or
more of all jurors who might hear a case would
vote to convict), should hang when p = .5, and
should acquit when p ≤ .4, then one can
easily determine the number of juries that
began deliberation with “errorful” first ballot
distributions (e.g., in Figure 3 when p ≥ .6,
all the juries who have produced fewer than
seven votes for conviction have made errors,
since one has assumed that defendants with 
evidentiary weights of .6 or more should be 
convicted). In Figure 4 we display the propor- 
tions of cases in each category of initial votes, 
ranging from (0, 12) to (12,0), that would 
convict, hang, and acquit if one applied the 
simple rules outlined above. Overall, the 
probability that a jury will begin deliberations 
with a majority erroneously preferring acquit- 
tal is 1.9% in 12-member juries, while the 
probability that a jury will begin deliberations 
with a majority erroneously preferring con- 
viction is .1%. These values can be compared 
with the much lower error rates produced by 
Gelfand and Solomon's model (see Table 4). 
Although the absolute value of the estimates produced by the model
we have developed in this section can be questioned (more complete
data on the Kalven and Zeisel juries would heighten our
confidence), we think that this model of juror accuracy is superior
to Gelfand and Solomon's and Grofman's because it makes more
plausible assumptions about the distribution of trial evidence
weights. Since the distribution of first ballot votes produced by
the model fits the normative data better than Gelfand and Solomon's
model, we are also confident that Gelfand and Solomon's model
underpredicts first ballot error rates.

Figure 3. Distribution of first ballot vote percentages for
conviction across juries, based on the assumption of a bimodal
distribution of evidence such as the one illustrated in Figure 2.
(The parameter of the plotted subdistributions in Figure 3
corresponds to evidentiary weight in Figure 2. The cumulative
distribution totals, 18.4 and 46.5, correspond to the tabulation of
votes for conviction reported in Kalven & Zeisel, 1966, and
reproduced in Table 1.) [Horizontal axis: number of votes for
conviction; vertical axis: percentage of total votes; cumulative
percentages shown at top.]


Relationship Between Juror and Jury Errors 


Ultimately, of course, we would like to know 
the error rates in jury verdicts (rather than 
just the first ballot votes) ; and, in particular, 
we would like to have some idea of the effects 
that variations in jury size and decision rule 
have on these error rates. The problem of finding a satisfactory
method of relating first ballots to final verdicts is one we have
encountered several times in our discussion of various mathematical
models. Although we have criticized all of the proffered solutions
to this problem, we think that a decision scheme
approach such as Davis's (1973) offers the
most promise. Simply for purposes of illustra- 
tion, we have applied the decision schemes in 
Table 5 to the initial vote distributions in 
Figure 2 to acquire a sense of the impact that 
different decision rules (unanimous vs. two 
thirds) might have on jury accuracy. 

The decision schemes in Table 5 are based 
largely on the results of a computer simulation 
of jury decision making that is discussed later 
in this article (see also Penrod & Hastie, Note 
6). These decision schemes are constructed 
to produce a slight postdeliberation bias toward 
acquittal, such as is observed in the Kalven 
and Zeisel (1966) data (in Table 1, more 
majorities reverse in the direction of acquittal 
than in the direction of conviction). These 
decision schemes are also roughly comparable 
with Gelfand and Solomon's (1975, 1977) 
version of Davis et al.’s (1975) Scheme 3. 

Before discussing the results of this decision scheme analysis, it
should be noted that these decision schemes reflect relatively
unfavorable views of the deliberation process insofar as they
assume that both correct and errorful initial majorities are
equally likely to be reversed during deliberation. In other words,
these decision schemes do not assume that deliberation serves to
correct the errors made in first ballot votes. If deliberations did
serve a correcting function, then one would expect the probability
of a reversal in a first-ballot-error case to be higher than the
probability of a reversal in an initially correct case. As one will
see, our analysis indicates that the selection of optimal jury
sizes and optimal decision rules depends on whether deliberation
operates to minimize first ballot errors.

Figure 4. Distribution of percentage of votes for conviction across
cases generated by the assumptions of the analysis outlined in
Figures 2 and 3 and in the text. (Solid areas represent juries
under conditions with evidentiary weights for conviction greater
than .50, hatched areas represent juries with weights equal to .50,
and clear areas represent juries with weights less than .50.)

Table 5
Decision Schemes

              p of verdict in unanimity      p of verdict in
              (12/12) rule                   8/12 rule
Votes for
conviction    Acquit   Hang   Convict        Acquit   Hang   Convict

   0           1.0                            1.0
   1           1.0                            1.0
   2           1.0                            1.0
   3           1.0                            1.0
   4           .92     .07     .01            1.0
   5           .70     .26     .04            .70     .26     .04
   6           .35     .40     .25            .35     .40     .25
   7           .14     .26     .60            .14     .26     .60
   8           .04     .07     .89                            1.0
   9                           1.0                            1.0
  10                           1.0                            1.0
  11                           1.0                            1.0
  12                           1.0                            1.0

Note. In 12/12 and 8/12 rule, the denominator indicates jury size
and the numerator denotes the number of votes needed for a
decision.

The results obtained by applying each of the decision schemes to
the initial vote distributions in Figure 2 are summarized in
Table 6. (The table also summarizes the results for 6-member
juries; for complete details on data and methods see Penrod &
Hastie, Note 6.) If one ignores hung juries, it is clear that the
12-member juries produce fewer false acquittals and false
convictions than do the 6-member juries: The smaller jury,
regardless of decision rule, produces 50% more erroneous acquittals
(about 5 in 100 compared with about 3 in 100) and four times as
many erroneous convictions (about 6 in 1,000 compared with about
1.4 in 1,000). These same relationships are also evident in the
probabilities that a defendant is guilty if convicted (P_{G|C}),
innocent if convicted (P_{I|C}), innocent if acquitted (P_{I|A}),
and guilty if acquitted (P_{G|A}). (These probabilities can be
compared with the 12-member probabilities produced by Gelfand &
Solomon's model; see Table 3.)
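Applying a decision scheme amounts to weighting each first-ballot split by its row of transition probabilities and summing. The sketch below is illustrative only: the scheme rows follow Table 5, with the entries for 5 and 7 votes read as .70 and .14 so that each row sums to 1 (an assumption about the garbled source), the first-ballot distribution is the two-point stand-in (.88 in 69% of cases, .12 in 31%) rather than the Figure 2 weights, and all names are hypothetical.

```python
from math import comb

N = 12

# (acquit, hang, convict) by first-ballot votes for conviction, after Table 5.
# Splits not listed go entirely to the side holding the initial majority.
UNANIMITY = {4: (.92, .07, .01), 5: (.70, .26, .04), 6: (.35, .40, .25),
             7: (.14, .26, .60), 8: (.04, .07, .89)}
EIGHT_OF_TWELVE = {5: (.70, .26, .04), 6: (.35, .40, .25), 7: (.14, .26, .60)}

def scheme_row(scheme, votes):
    """Transition probabilities for a jury starting with this many conviction votes."""
    if votes in scheme:
        return scheme[votes]
    return (1.0, 0.0, 0.0) if votes < N / 2 else (0.0, 0.0, 1.0)

def first_ballot(weights):
    """Expected share of juries with v votes for conviction, v = 0..N."""
    dist = [0.0] * (N + 1)
    for p, share in weights.items():
        for v in range(N + 1):
            dist[v] += share * comb(N, v) * p ** v * (1 - p) ** (N - v)
    return dist

def final_verdicts(dist, scheme):
    """Return (P_acquit, P_hang, P_convict) after deliberation."""
    totals = [0.0, 0.0, 0.0]
    for votes, prob in enumerate(dist):
        for k, rate in enumerate(scheme_row(scheme, votes)):
            totals[k] += prob * rate
    return tuple(totals)

ballots = first_ballot({0.88: 0.69, 0.12: 0.31})  # stand-in, not the Figure 2 weights
unanimous = final_verdicts(ballots, UNANIMITY)
two_thirds = final_verdicts(ballots, EIGHT_OF_TWELVE)
print([round(x, 4) for x in unanimous])
print([round(x, 4) for x in two_thirds])
```

Because neither scheme distinguishes correct from errorful initial majorities, separating the two kinds of cases would require tracking the evidentiary weight of each case alongside its vote split.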
Somewhat surprisingly, the error rates in 5 < asy 5 
nonunanimous juries are lower than the rates = z 5288 ЕЕ 5 E 
in the corresponding unanimous juries. The 3 s 2685 8 
reason for this is that the decision schemes * 3 z quus BBa | 
used to generate the final distributions of 9 ERED у eae B 
verdicts do not use the corrective factor = 8 Bag 8 
discussed earlier. This point can be illustrated ә E 595% 58 5 E 
by examining the distribution of error cases $ о |2885 | 515 7 
іп 12-member juries who produce eight votes 5 See 5 
for conviction on the first ballot. One can see 8 5 змее | 154 $ 
in Figure 4 that most such juries' votes аге * 9 X ag S z y5 > 
consistent with the evidence, but a few * Pied le ЧС 
“should” have begun deliberation with even 5 Ej 5 Ress 555 8 
splits. If these juries use a unanimous decision Y = PEE 5 
rule, one can see from Table 5 that 4% of the 8 Е БТЕ E 8 8 & 
juries can be expected to reverse the original KE o ў mS i 
majority and produce verdicts for acquittals. X Š о "a 232 Е 
Since the decision scheme does not distinguish um ә E es ao] 88 3 
between error cases and correct cases, the 35 3 mos У 
result is that 4% of the cases that start with ig E m esos |5 53 E: 
errors will be corrected, while 4% of the cases & a MD ZEIT А 8 MER: 
that begin deliberation with majorities cor- È |= m|mses|E5 52 55 
rectly preferring conviction will erroneously 5 5 S m = Sx 8 2 E 58 
acquit. In essence, the decision scheme ulti- 8 с Echo x big 2 5 
mately creates more errors than it corrects. oss Bou 82323 
For juries using an 8/12 decision rule, this ae * TE 5555 55 552 
problem is avoided—none of the original error EAA B Sour 25 EB 


sare corrected, but no new error cases are 
ated. 
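
The asymmetry described above can be illustrated with a short calculation. This is a sketch only: the 4% reversal rate comes from the text, but the counts of initially correct and initially erroneous juries are hypothetical values chosen for the example, not figures from Table 5 or Table 6, and the function name is illustrative.

```python
# Illustrative arithmetic only; the jury counts below are hypothetical.
REVERSAL_RATE = 0.04  # share of 8-4 juries reversing under unanimity (from the text)


def net_error_change(correct_juries, error_juries, reversal_rate=REVERSAL_RATE):
    """Return (errors_corrected, errors_created, net_change) for a decision
    scheme that reverses correct and erroneous majorities at the same rate."""
    errors_corrected = reversal_rate * error_juries
    errors_created = reversal_rate * correct_juries
    return errors_corrected, errors_created, errors_created - errors_corrected


# Hypothetical example: 950 juries whose 8-4 split matches the evidence and
# 50 juries that "should" have started with an even split.
corrected, created, net = net_error_change(correct_juries=950, error_juries=50)
print(f"corrected: {corrected:.1f}, created: {created:.1f}, net change: {net:+.1f}")
```

Because the same reversal rate is applied to both groups, the scheme adds errors whenever initially correct juries outnumber initially erroneous ones.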
Although we have only limited confidence in
the absolute value of the entries in Table 6
(since neither the initial distribution of evidence
nor the decision schemes are well
grounded empirically), these results suggest
that one’s ability to distinguish among optimal
decision rules may ultimately depend on one’s
ability to specify the extent to which jury
deliberations serve to correct the “sampling
errors” in the distribution of initial votes. Our
guess is that a unanimous decision rule ultimately
operates to minimize jury errors: First, it
minimizes the probability of an incorrect
verdict on the first ballot, and second, it
maximizes the opportunities for jurors who
have correctly assessed the weight of the
evidence to communicate the grounds for their
assessments to other jurors, who may then
be persuaded to adopt a more accurate view
of the evidence. Stated another way, we
contend that it is unlikely that reversals of
majorities are equally likely for initially correct
and incorrect juries. Given the small magnitude
of the advantages enjoyed by the nonunanimous
juries under the equally likely
decision schemes we have used (Table 6 shows
that the advantages in the critical false-acquittal
and false-conviction categories are
very small), it is clear that even a relatively
small correcting factor in jury deliberations
would give the edge to the unanimous decision
rule. As a practical matter, our results indicate
that it is important to direct future research
to the question of whether deliberation serves
to minimize verdict errors. If there is something
about the deliberation process that does
serve to raise the probability that initial errors
will be corrected, then the unanimous decision
rule is probably preferable.
To conclude the section on models of juror
accuracy we examine a model developed by
Nagel and Neef (1975). Nagel and Neef have
also tackled the question of the relationship
among jury size, decision rules, and error rates,
using an approach that is similar to but less
[...] than Grofman’s (1976) approach.
Perhaps the most important difference between
the two lines of analysis is that Nagel
and Neef were concerned with defendants’
objective or true guilt and innocence rather
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than the guilt or innocence indicated by the 
evidence presented at trial (what Gelfand and 
Solomon called “convictability”). A second 
important difference is that Nagel and Neef 
made specific assumptions about the values of 
the parameters they used in their model and 
then examined the results obtained with these 
specific parameter values. 

Their analysis proceeded in two stages: 
First, they analyzed individual juror behavior 
(what we regard as first ballot behavior), and 
then they examined individual plus collective 
behavior (what we would consider to be an 
analysis of jury verdicts). Nagel and Neef’s 
analysis can most clearly be understood by 
noting the assumptions about objective guilt 
and innocence and juror accuracy on which 
their analysis is based. First, they assumed 
that the number of truly guilty defendants 
brought to trial is relatively large (95%), while 
the number of truly innocent defendants is 
relatively small (5%). Although we consider 
the grounds for this assumption rather curious 
(it is based on an analogy to the .05 significance 
level used in statistical analysis), we do not 
quarrel with the reasonableness of the assump- 
tion; indeed, we have heard experienced defense 
attorneys make even lower estimates of the 
error rate in indictments. (Of course, since only 
about 10% of all criminal indictments reach 
trial—most defendants plea bargain—it is 
possible that truly innocent defendants are 
overrepresented at the trial stage.) The basic 
implication of Nagel and Neef's assumption is 
that it is relatively uncommon to find instances 
in which evidence sufficient for indictment 
points to the wrong defendant and that 
prosecutors are reasonably accurate in their 
indictments. 

Next, Nagel and Neef assumed that 40% of 
all truly innocent defendants are erroneously 
convicted by juries (i.e., 2 in 100 cases yield
false convictions), compared with a 70% 
conviction rate for truly guilty defendants 
(they made this estimate by reference to the 
Kalven & Zeisel, 1966, data that are presented 
in Table 1). Disregarding deliberation effects 
(a point to which we return), Nagel and Neef 
used the 40% and 70% figures to posit that 
the probability that an individual juror will 
erroneously vote to convict a truly innocent 
defendant is $.4^{1/12}$ or .926, whereas the
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probability that an individual juror will correctly
vote to convict a truly guilty defendant is $.7^{1/12}$
or .971. Finally, Nagel and Neef assumed they
could use these individual probabilities in the 
binomial expansion to determine the weighted 
probabilities of false convictions and false 
acquittals for various jury sizes and decision 
rules (fundamentally the method used in 
Grofman's, 1976, analysis). For ease of presen- 
tation, Nagel and Neef reported their results 
in terms of 1,000 cases in which 95% (950) of 
the defendants were truly guilty and 5% (50) 
were truly innocent. The results of their 
analysis (in which false convictions were given 
a weight of 10 compared with a weight of 1 for 
false acquittals) indicate that with binomial 
probabilities of .926 and .971, seven-member 
unanimous juries produce the minimum 
weighted sum of errors and 11/12 and 10/12 
juries produce the lowest weighted sum of 
errors for nonunanimous juries of various sizes. 
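
The calculation described in this paragraph can be sketched in a few lines. This is a minimal illustration under the assumptions stated in the text (per-juror probabilities of .926 and .971, 50 innocent and 950 guilty defendants per 1,000, a 10:1 weight, and any jury short of the quorum counted as an acquittal); the function names are hypothetical and are not taken from Nagel and Neef.

```python
from math import comb

P_CONVICT_INNOCENT = 0.926  # per-juror probability, rounded 12th root of .4
P_CONVICT_GUILTY = 0.971    # per-juror probability, rounded 12th root of .7
N_INNOCENT, N_GUILTY = 50, 950  # defendants per 1,000 cases


def p_quorum(p, jury_size, quorum):
    """Probability that at least `quorum` of `jury_size` independent jurors,
    each voting to convict with probability p, vote to convict."""
    return sum(
        comb(jury_size, k) * p**k * (1 - p) ** (jury_size - k)
        for k in range(quorum, jury_size + 1)
    )


def weighted_errors(jury_size, quorum, conviction_weight=10.0):
    """Weighted errors per 1,000 cases; a jury short of the quorum acquits."""
    false_convictions = N_INNOCENT * p_quorum(P_CONVICT_INNOCENT, jury_size, quorum)
    false_acquittals = N_GUILTY * (1 - p_quorum(P_CONVICT_GUILTY, jury_size, quorum))
    return conviction_weight * false_convictions + false_acquittals


# Compare unanimous juries of 1 to 12 members under the 10:1 weighting.
for size in range(1, 13):
    print(f"{size:2d}/{size:<2d}  weighted errors = {weighted_errors(size, size):7.1f}")
best = min(range(1, 13), key=lambda size: weighted_errors(size, size))
print("unanimous size with the lowest weighted sum:", best)
```

Passing a different `conviction_weight` shows how sensitive the preferred jury size is to the relative cost assigned to the two kinds of error.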
Unfortunately, we regard this analysis as 
faulty at several points. First, by taking the 
12th root of the .4 and .7 probabilities, Nagel 
and Neef implicitly adopted the 12-member 
unanimous jury as the absolute standard 
against which all other jury sizes and decision 
rules are to be evaluated. Nagel and Neef 
failed to justify using $p^{1/12}$ as a standard
individual probability. Second, because they 
used these probabilities in the binomial expan- 
sion to determine the probability that X or 
more jurors (where X is the decision rule 
quorum) in a jury with Y members will vote 
to convict in Y binomial trials, their initial 
analysis produced the curious result that all 
juries failing to attain the required quorum 
in Y binomial trials were treated as acquittals, 
even though they may have been only one 
vote short of the necessary quorum for con- 
viction. As one saw before (e.g., in Walbert's, 
1971, analysis), for any fixed binomial proba- 
bility, any reduction in the number of binomial 
trials (i.e., any reduction in the number of
jurors who constitute a jury) increases the 
probability that all the jurors (or some set 
proportion of the jurors) will agree on an 
outcome. Given this fact, the real problem 
with Nagel and Neef’s analysis is that they 
failed (as did some of the other mathematical 
modelers we have considered) to make an 
adequate distinction between individual juror
accuracy (first ballots) and jury accuracy
(verdicts). 

It is appropriate to say that Nagel and Neef's 
initial analysis assessed weighted errors in the 
first ballot verdicts produced by juries of vary- 
ing size who used various decision rules. Their 
analysis is comparable with the binomial 
analysis in Figure 1 and Gelfand and Solomon’s 
(1973, 1974, 1975) analysis. What Nagel and 
Neef’s method potentially adds to these other 
analyses is the emphasis on weighting errors in 
first ballot verdicts. With respect to these 
weights, we note that changes in the relative 
weights attached to false acquittals and con- 
victions will affect the optimal jury size and 
decision rules. (For example, if false convictions 
are given a weight of 14 rather than 10, the 
advantage in weighted errors on first ballot 
verdicts shifts to the 12-member jury.) 

Although Nagel and Neef failed to mention 
several of the shortcomings in their model, they 
were aware that the model does a poor job of 
accounting for the effects of group deliberation 
on the accuracy (and even the distribution) of 
jury verdicts. They attempted to overcome 
this difficulty by arguing that the final verdict 
distribution is a product of both individual 
(or independent) factors and collective factors. 
Thus, that 64% of the defendants in the 
Kalven and Zeisel trials (see Table 1) were 
convicted by unanimous juries is taken as an 
index of the collective factors, whereas $.64^{1/12}$ is
taken as an index of the independent factors.
The collective factors plus the independent
factors are presumed to be reflected in the fact
that after deliberation 67.7% of the jurors in
the 225 Kalven and Zeisel trials voted to
convict (when jurors in hung juries are taken
into consideration). This 3.7% difference
(67.7% - 64%) is, according to Nagel and
Neef, attributable to a weighted combination
of individual and collective factors:

$$.677 = \frac{W(.64)^{1/n} + .64}{W + 1},$$

where W = .13 and n = 12. Nagel and Neef’s
analysis assumed that this relative weighting
of independent and collective factors (11%
independent and 89% collective) prevails
across all decision rules and jury sizes.
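
As a check on these figures, the weighting can be solved for W directly. The sketch below assumes the combined probability has the form (W × .64^(1/n) + .64) / (W + 1), a reading of the formula that reproduces the reported W = .13 and the 11%/89% split; the function name is illustrative.

```python
def solve_weight(collective, combined, jury_size=12):
    """Solve combined = (W * independent + collective) / (W + 1) for W,
    where independent = collective ** (1 / jury_size)."""
    independent = collective ** (1 / jury_size)
    return (combined - collective) / (independent - combined)


W = solve_weight(collective=0.64, combined=0.677)
independent_share = W / (W + 1)
print(f"W = {W:.3f}")
print(f"independent share = {independent_share:.1%}")
print(f"collective share  = {1 - independent_share:.1%}")
```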

We are troubled by this conception of the 
deliberation process and the calculations to
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Jury Verdict Errors for Objectively Innocent and Objectively Guilty Defendants 

Jury              % of total no. convictions    % of total no. acquittals    Unweighted    Weighted
decision                                                                     sum of        sum of
rule^a            Correct      Incorrect        Correct      Incorrect       errors        errors^b

12/12             60.60        3.19             1.583        30.07           320.6         619.7
8/12              61.71        3.25             1.577        29.95           332.0         624.5
6/6               59.26        3.12             1.647        31.29           344.1         624.9
4/6               60.64        3.19             1.644        31.24           344.3         631.4

^a The denominator of the decision rule indicates jury size, and the numerator indicates the number of
votes needed for a decision.
^b The weight ratio was 10:1.


which it gives rise. First, it is not clear what
this “combined” model really captures. The
relative weight of the independent factor is
determined solely by the relative imbalance of
juror votes in hung juries (13 of the 225 juries
in the Kalven and Zeisel trials) averaged over
all 225 juries. (Actually, it is not clear that
Nagel & Neef’s analysis was based on the
distribution of votes in the 13 hung cases reported
in Table 1; although they implied that
they used these 13 cases, their footnotes indicate
they may have drawn on a different
sample of 48 hung juries.) Irrespective of their
data source, it is clear that Nagel and Neef’s
analysis depends on the level of disagreement
in hung juries: if there were no hung juries
(or if in all hung juries the proportion of jurors
favoring guilt were the same as the proportion
of juries who convicted), there would be no
independent factor; the combined probability
would be identical to the collective probability,
and W would equal zero. What Nagel and Neef
have done is to take a small bias (toward conviction)
in the distribution of votes in a very
small number of cases and argue that this bias
reflects the individual juror's imperviousness
to group influence. Furthermore, by assuming
that this independence factor accounts for 11%
of the decision making in all jury sizes and
decision rules, they have ruled out the possibility
that differing jury sizes and decision rules
operate directly to affect the extent of group
influence on individual jurors (and thereby to
affect the distribution of verdict errors). In
fact, the evidence that jury size can affect
hanging rates (e.g., Kalven & Zeisel, 1966;
Padawer-Singer & Barton, Note 1) suggests
that the relative impact of independent factors
varies with jury size. (Though it is perhaps a
less telling defect, Nagel and Neef did not
include the possibility of hung juries being
produced by their model.)

In addition to the conceptual problems of 
the combined model, Nagel and Neef's analysis 
of verdicts is also based on a binomial model 
in which the probability of a conviction equals 
the probability that a quorum will be obtained 
in Y binomial trials. The verdicts of non- 
quorum juries are once again treated as 
acquittals, even though the juries may fall 
only one vote short of a quorum (many such 
cases end up being added to the error cases 
for false acquittals). 

Although we have chosen not to report the 
results of Nagel and Neef's verdict analysis 
(for the reasons enumerated above), we do 
applaud their efforts and think that with a 
better formulated verdict model, it would be 
worthwhile to pursue and test their assump- 
tions about objective guilt and innocence and 
the effects that different jury sizes and decision 
rules have on error rates. 

One possible approach might make use of the 
model, developed earlier in this section, that 
assumes the weight of trial evidence is bi- 
modally distributed. In Figure 2 we used the 
binomial expansion to generate a distribution 
of first ballot votes and then applied the 
decision schemes shown in Table 5 to assess 
the error rates in final verdicts. If, in common 
with Nagel and Neef, we are interested in 
testing inferences about the effects of jury size 
and decision rule on the weighted probability 
of false convictions of truly innocent defendants
and false acquittals of truly guilty defendants, 
we can use the same model and, by making 
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Figure 5. Hypothetical distribution across cases of 
evidentiary weight for conviction, incorporating the 
Nagel and Neef (1975) assumption that the distribution 
of weights is dependent on objective guilt or innocence 
of the defendant. 
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reasonable assumptions about the distribution 
of evidentiary weight against objectively guilty 
and innocent defendants, assess the proba- 
bilities of conviction and acquittal errors for 
objectively guilty and innocent defendants. 

For instance, if we assume (as did Nagel and 
Neef) that 5% of all defendants are truly 
innocent and make the further assumption 
that the evidence against truly guilty and 
innocent defendants is identically distributed, 
we can examine the overall distribution of 
verdicts in Table 6 to assess error rates for 
different jury sizes and decision rules. As the 
results in Table 7 demonstrate, with a weight 
ratio of 10:1, the 12-member unanimous juries produced
the lowest weighted sum of errors. 
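
The weighted sums in Table 7 follow from this assumption by simple arithmetic, sketched below. The conviction and acquittal totals are obtained here by adding the correct and incorrect columns of Table 7 (the remainder of each row being hung juries); the dictionary and function names are illustrative, and small differences from the printed values reflect rounding in the table.

```python
P_INNOCENT = 0.05       # share of truly innocent defendants (Nagel and Neef)
CONVICTION_WEIGHT = 10  # weight on false convictions relative to false acquittals

# (percent of cases convicted, percent of cases acquitted) for each decision rule,
# summed from the correct and incorrect columns of Table 7.
VERDICTS = {
    "12/12": (63.79, 31.653),
    "8/12": (64.96, 31.527),
    "6/6": (62.38, 32.937),
    "4/6": (63.83, 32.884),
}


def weighted_errors_per_1000(pct_convicted, pct_acquitted):
    """Weighted errors per 1,000 cases when evidence is distributed identically
    for guilty and innocent defendants, so errors are a fixed share of verdicts."""
    false_convictions = P_INNOCENT * pct_convicted * 10       # percent -> per 1,000
    false_acquittals = (1 - P_INNOCENT) * pct_acquitted * 10  # percent -> per 1,000
    return CONVICTION_WEIGHT * false_convictions + false_acquittals


results = {rule: weighted_errors_per_1000(*pcts) for rule, pcts in VERDICTS.items()}
for rule, total in results.items():
    print(f"{rule:>5}: weighted sum of errors = {total:.1f}")
```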

A somewhat more complex analysis might 
incorporate Nagel and Neef’s assumption that 
truly innocent defendants are less likely to be 
convicted than truly guilty defendants (.4 
vs. .7). This assumption can be incorporated 
into the model by assuming that the evidence 
against truly innocent defendants is not dis- 
tributed identically to the evidence against 
truly guilty defendants, but is skewed in the 
direction of acquittal. A distribution of 
evidence such as the one in Figure 5 captures 
this notion. This distribution of evidence could 
be used to generate a distribution of verdicts 
that could then be subjected to an analysis for 
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errors (including an analysis of the weighted 
sum of errors). 

An analysis such as the one advanced here 
has the advantage of testing the effects of jury 
size and decision rule variations on error rates 
(given a variety of assumptions about the 
distribution of the weight of evidence against 
objectively guilty and innocent defendants) in 
the context of a model that straightforwardly 
relates jury verdicts to initial votes by indi- 
vidual jurors. 


Summary Comments on Mathematical Models 


As we have noted above, one of the major 
shortcomings of the mathematical models of 
jury decision making is the weakness of their 
assumptions about the relationship between 
first ballots and final ballots. As one has seen,
Saks and Ostrom (1975) did not confront the 
problem. Walbert (1971) made the simple 
assumption that verdicts are governed by 
majority persuasion, with initially evenly 
divided juries splitting equally for conviction 
and acquittal, but his model fails to account 
for reversals of initial majorities and juries that 
ultimately hang. Gelfand and Solomon (1973, 
1974, 1975, 1977) gave little consideration to 
the relationship of first ballot distributions to
final verdicts, for they were able to estimate 
their parameters simply by examining post hoc, 
aggregate relationships between first ballots 
and final verdicts without making assumptions 
about the intervening processes.

Grofman's (1976) and Nagel and Neef's 
(1975) models of juror accuracy also suffer from 
an inability to treat final verdicts, except under 
the simplest of assumptions about the relationship
of first and final ballots.

Davis (1973) and Davis et al. (1975)
addressed the problem of modeling group
processes and employed a post hoc analysis of
first ballots and final verdicts to find the
implicit or effective decision rule that best fits
the aggregate data. Although good aggregate
fits are obtained, no one decision scheme has
consistently provided the best fit. Furthermore,
when nonaggregated analyses of initial
ballots and final verdicts from individual juries
have been made, the predictive accuracy of
the best fitting models has proven unreliable
(Davis, Kerr, Stasser, Meek, & Holt, 1977;
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Figure 6. Flowchart summary of our model for jury decision making. 


Grofman, 1976; Kerr et al., 1976; Grofman & 
Hamilton, Note 4). 

The binomial model that we have presented
avoids some of these problems by introducing
more plausible assumptions about the weight
of trial evidence and uses a decision scheme
to determine the distribution of final
verdicts, but its characterization of the
deliberation process is still rather barren.

At present we conclude that the mathematical
models built around the binomial [...]
are quite adequate for dealing descriptively
(and, to a lesser extent, predictively)
with the relationship between jury size and
first ballot distributions. However, these
models are not yet adequate for analysis of
[...]. Until we have better knowledge
of the actual deliberation process (theoreticians
universally lament the paucity of
data) and this knowledge has been
incorporated into the mathematical models,
we must be cautious about accepting existing
specifications or estimates of the following
parameters: the accuracy of juror assessments
of [...], the prior probabilities
of convictability, the [...] of
jury verdicts, the
decision rules that maximize
juror satisfaction and performance, and the
implications of changes in jury size.

We do not mean to imply that these parame- 
ters or even the deliberation processes cannot 
be mathematically modeled, for we are 
confident that with better and more extensive 
data, adequate mathematical models can be 
specified that will provide reliable estimates of 
the parameters of interest. 

In the remainder of this article we sum- 
marize the results of our efforts to develop a 
multiparameter computer model of the delib- 
eration process (the model is presented in detail 
in Penrod & Hastie, Note 6) that produces 
output that fits empirical data at a number of 
critical points (including, but not limited to, 
reversal and hanging rates). One major
advantage of the computer modeling approach 
is that it allows analysis of the dynamic aspects 
of the jury decision-making process (i.e., the
computer model can easily represent the 
process by which a jury moves from an initial 
vote distribution to a final verdict). Further- 
more, the simulation method, although com- 
patible with mathematical models, is more 
flexible in its ability to represent complex 
hypotheses about juror and jury behavior 
(e.g., Abelson, 1968). 
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The model summarized here is somewhat 
broader in scope than the mathematical models 
we have examined, but it addresses similar 
issues and rests on similar assumptions about 
juror behavior. The model is named DICE 
after the Greek goddess frequently depicted 
wearing a blindfold and holding the scales of 
justice in one hand and a sword in the other. 
It rests on the same assumptions made by 
Walbert (1971) and Saks and Ostrom (1975):
that at the conclusion of a trial a certain proportion
of the jury pool (consisting of those
jurors who are not excluded by the voir dire)
will be prepared to vote for guilt (p) or innocence
(1 - p) (or in a civil case, for the
plaintiff [p] or the defendant [1 - p]). Thus,
the probability that a randomly selected juror
will vote to convict is p.

The model represents the deliberation pro- 
cess in a form that makes it possible to deter- 
mine the probability that a randomly selected 
jury of size n, drawn from a pool in which a
specified proportion of the jurors will vote to
convict (p), and using any specified decision
rule quorum (q), will produce a conviction, an
acquittal, or will hang. Furthermore, the model
represents the deliberation process in such a
way that it is possible to determine the verdict
probabilities for any potential first ballot
alignment of votes. In the model, decision
making is largely characterized by majority
persuasion, but in a few cases initial majorities 
fail to prevail in the deliberations and are 
either reversed (persuaded by the minority) or 
do not reach a quorum and hang. Similarly 
(depending on the case), juries who initially 
divide evenly sometimes reach verdicts and 
sometimes hang. Hung juries result when 
juries fail to attain quorums after extended 
deliberation. Figure 6 is a simplified representa- 
tion of the deliberation process embodied in 
the model, but it does capture the basic 
structure of DICE. 


Parameters 


DICE is based on six major parameters: 

1. Jury size: Although DICE can operate 
with any jury size, simulations have concen- 
trated on 6- and 12-member juries. 
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2. Decision rule: DICE can operate with
any decision rule, ranging from majority to
unanimity. For example, in modeling the results
from a major study by Padawer-Singer and
Barton (Note 1), the simulations used the
two decision rules employed in that study:
unanimity (12/12 and 6/6) and five sixths
(5/6 and 10/12).

3. Binomial probability for guilty votes:
The initial assignment of votes to simulated
individual jurors in DICE is accomplished by
establishing a probability that a randomly
selected juror will vote to convict on the first
ballot of the jury simulation. This parameter
parallels the Walbert (1971) and Saks and
Ostrom (1975) conviction probability parameter.
An initial binomial probability of .47
produces a distribution of first ballot votes
nearly identical to the distribution produced
by the jurors in the Padawer-Singer and Barton
study.

By using a random number generator and
the binomial value, all the jurors in a simulation
are assigned an initial verdict preference
(Step 1).

4. Transition probability function: Several
methods of modeling the persuasion/deliberation
process have already been noted; Walbert
(1971) assumed simple majority persuasion,
and Davis (1973) offered a wide range of
decision schemes. In contrast with these
models, which make no attempt to model or
explain the deliberation processes that occur
between the first and last ballots, DICE
follows an alternative approach proposed by
Rothschild, Klevorick, and McNeil (Note 7).
They have suggested that the deliberation
process can be modeled as a continuous-time
birth-and-death Markov process in which the
probability that the number of votes for
conviction (or acquittal) will increase by one
from Time 1 to Time 2 is equal to the proportion
of jurors who voted to convict (or acquit)
at Time 1. A similar approach has been adopted
by Stasser and Davis (1977) and Davis (1918).
Experimentation with various transition functions
has shown that a function exhibiting a
group momentum effect best fits the available
empirical data (Penrod & Hastie, Note 9).
The transition functions used in DICE are
shown in Figure 7. Curve A in Figure 7 signifies
the probability that a coalition of any size
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(from 0-12 in a 12-member jury) will remain
intact (i.e., none of the coalition members will
change their verdict preference) from one
ballot (or time period) to the next. Curve B
displays the corresponding probability that
individual jurors will not change their verdict
preference; the probability values are the nth
roots of the group probabilities. Briefly, the
group transition function (Curve A) captures
the following phenomenon: In juries whose
factions are roughly equal in size (in which the
majority has no more than eight adherents), the
majority’s persuasive advantage is slight; but
as the majority coalition grows in size the
probability that it will continue to grow
increases exponentially, and single holdouts
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inet MS two functions summarize vote-changing 
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have a very low probability of not joining the 
majority (the probability of not changing on 
any one ballot is .183 for the lone holdout). 
Readers familiar with early conformity studies 
(e.g., Asch, 1951; Sherif, 1935) will probably 
note that these characteristics of the transition 
function are consistent with conformity re- 
search findings. More direct empirical con- 
firmation of the group size effect can be found 
in research by Godwin and Restle (1974). 

5. Individual differences: Padawer-Singer 
and Barton (Note 1) reported that jurors in 
juries who ultimately reached a verdict were 
three times more likely to change their verdict 
preference during deliberation than jurors in 
juries who hung. This result and other similar 
results (e.g., the sex difference in change rates 
in the Kerr et al., 1976, study and the Davis, 
Kerr, Stasser, Meek, & Holt, 1977, rape case 
studies) indicate that some jurors are more 
susceptible to group persuasion than are other 
jurors. In DICE all jurors are assigned indi- 
vidual persuasion resistance scores that reflect 
these individual differences. 

6. Maximum number of ballots: Not all 
juries reach a verdict; often members feel that 
they can no longer make progress toward a 
verdict because individual jurors are en- 
trenched in their preferred verdict. To simulate 
hung juries DICE sets a limit on the number of 
ballots that a jury may take. 
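
How several of these parameters interact can be sketched with a toy simulation. This is not DICE itself: it assumes the simple proportional transition rule attributed above to Rothschild, Klevorick, and McNeil rather than DICE's momentum function, treats each ballot as one discrete step, omits individual persuasion resistance scores, and uses a hypothetical ballot limit. All names and default values other than p = .47 are illustrative.

```python
import random
from collections import Counter


def verdict(guilty_votes, n, quorum):
    """Return the verdict if either side has reached the quorum, else None."""
    if guilty_votes >= quorum:
        return "convict"
    if n - guilty_votes >= quorum:
        return "acquit"
    return None


def simulate_jury(n=12, quorum=12, p=0.47, max_ballots=200, rng=random):
    """Simulate one deliberation; returns 'convict', 'acquit', or 'hung'."""
    # Step 1: each juror's first-ballot vote is an independent draw with P(guilty) = p.
    guilty_votes = sum(rng.random() < p for _ in range(n))
    for _ in range(max_ballots):
        outcome = verdict(guilty_votes, n, quorum)
        if outcome:
            return outcome
        # Proportional rule (assumed here): the conviction faction gains one
        # vote with probability equal to its current share; otherwise it loses one.
        if rng.random() < guilty_votes / n:
            guilty_votes += 1
        else:
            guilty_votes -= 1
    # Ballot limit reached: the jury hangs unless the last change produced a quorum.
    return verdict(guilty_votes, n, quorum) or "hung"


rng = random.Random(0)  # fixed seed so repeated runs give the same tallies
for n, quorum in [(12, 12), (12, 10), (6, 6), (6, 5)]:
    tally = Counter(simulate_jury(n, quorum, rng=rng) for _ in range(2000))
    print(f"{quorum}/{n}:", dict(tally))
```

Lowering `max_ballots` raises the share of hung juries, which is the role the ballot limit plays in the parameter list above.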


Evaluative Criteria 


The Padawer-Singer and Barton (Note 1) 
study provides the widest range of data avail- 
able on jury deliberations and specifically 
allows us to test DICE’s operation with four 
different criteria: (a) the distribution of jury 
verdicts (convictions, acquittals, and hung 
juries); (b) the number of reversals of initial 
majorities, that is, juries in which a majority 
of members initially prefer conviction (ac- 
quittal) but the jury ultimately renders a 
verdict for acquittal (conviction); (c) the 
difference in the rate at which jurors in verdict- 
reaching juries change their verdict preferences 
and the rate at which jurors in juries who 
ultimately hang change their verdict prefer- 
ences; and (d) the mean and variance in 
deliberation times for juries who convict, 
acquit, and hang. 
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Table 8 
Summary of Simulation Results 
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
                                        12-member jury                        6-member jury
                                 Unanimous      Nonunanimous          Unanimous      Nonunanimous
Type of jury                     PSB    DICE    PSB     DICE          PSB    DICE    PSB    DICE

Verdict
  Guilty                         8      7.53    9       7.53          8      7.71    9      1.19
  Innocent                       10     10.70   9       10.70         11     9.99    14     10.13
  Hung                           5      4.76    5       4.76          4      5.3     0      5.08
Reversal of majority
  All juries                     2      .8      4       .8            1      .3      2      .3
% of jurors who changed votes
  Verdict juries                 34.0   38.4    26.1    23.0          24.6   31.6    28.3   14.9
  Hung juries                    11.7   15.1    8.4     15.1          8.3    9.6     -      9.6
Deliberation time
  Verdict juries                 169.3  168.0   177.7   134.6         126.4  104.9   119.0  76.9
  Hung juries                    327.1  322.6   286.1   322.6         253.0  251.2   -      251.2
  All juries                     203.6  200.9   2001.8  173.6         153.3  138.6   119.9  115.4

Note. PSB refers to Padawer-Singer and Barton's (Note 1) study, and DICE refers to our computer
model of jury decision making.

Simulation Results


Table 8 summarizes the results of simulations 
of Padawer-Singer and Barton’s 6- and 12- 
member juries, who used unanimous and five- 
sixths decision rules, and compares results 
produced by the actual juries and those ob- 
tained from the DICE simulations. Briefly, 
the distribution of verdicts is quite satisfactory 
—the poorest fit is in the 6-member, non- 
unanimous condition, in which none of the 
actual juries hung. Without exception the 
simulation juries produced lower reversal rates 
than the actual juries. This result suggests 
that the transition probabilities for juries whose
factions are of nearly equal size are probably flatter than is
reflected in Curve B of Figure 7.

The rate of vote changing in simulated juries 
is very close to the empirical rate for unanimous 
juries, but is too low for nonunanimous juries 
who reach a verdict. Similarly, the average 
deliberation times for simulated and actual 
juries are very close for unanimous juries, but 
the simulated juries produce lower deliberation 
times in the nonunanimous conditions. At 
present, DICE renders a verdict as soon as
the requisite quorum of votes is reached.
However, Saks (1977) has found that nonunanimous
juries frequently continue deliberating
even after they have attained sufficient
votes to render a verdict. In fact, between 20%
and 31% of the total deliberation time for
nonunanimous juries in Saks's study was
accounted for by postquorum deliberation.
Furthermore, jurors often changed their verdict
preferences during this postquorum interval
(additional jurors joined in the preferred
verdict). If simulated nonunanimous juries
were allowed to continue deliberating for
similar intervals (i.e., an increase of between
25% and 40% in elapsed time), the average
deliberation times and average rates of vote
changing in these simulated juries would
approach those found in the nonunanimous
Padawer-Singer and Barton juries.

The results obtained with DICE suggest
that the simulation method may provide substantial
insights into the deliberation process
by revealing the relationship between group
size and persuasion and by providing a means
for assessing the relative impact of individual
differences on the persuasion process.

The simulation approach can serve as a
useful complement to the mathematical models
discussed earlier. DICE provides an alternative
method for exploring the implicit decision rules
that have been studied by Davis and his
colleagues and has the advantage of making
explicit and testable assumptions about events
that occur between the first ballot and final
verdicts. The aspect of the deliberation process
that most closely approximates Davis's implicit
rule is embodied in the transition function
and can be summarized by saying that a
majority’s persuasiveness increases exponen-
tially as the size of the majority increases. This
general rule is likely to prevail across juries
and cases, but the model will no doubt require
modification when a case produces unusual
individual reactions (e.g., DICE has produced
excellent fits to the Davis, Kerr, Stasser, Meek,
& Holt, 1977, rape data when the differential
rates of vote changing by males and females
are incorporated into DICE's individual dif-
ferences parameter; see Penrod & Hastie,
Note 6).

DICE also complements the binomial models 
used by Saks and Ostrom (1975) and Walbert 
(1971), insofar as the initial distribution of 
votes in DICE is produced by a binomial 
function. Furthermore, DICE avoids Walbert’s
simplistic assumptions about majority per-
suasion effects and uses a number of criteria to
evaluate the assumptions about persuasion 
that are implicit in the transition function. 
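
The binomial starting point mentioned above can be illustrated with a minimal sketch. It assumes that jurors cast their first-ballot votes independently, each with the same probability of favoring conviction. The function names, the parameter `p_guilty`, and the value .5 used in the comments are hypothetical and are not taken from DICE itself.

```python
from math import comb


def first_ballot_distribution(jury_size, p_guilty):
    """Return P(k guilty votes on the first ballot) for k = 0..jury_size.

    Assumes independent jurors with a common probability p_guilty
    (an illustrative assumption, not a documented DICE parameter).
    """
    return [
        comb(jury_size, k) * p_guilty**k * (1 - p_guilty) ** (jury_size - k)
        for k in range(jury_size + 1)
    ]


def prob_quorum_on_first_ballot(jury_size, p_guilty, quorum):
    """Probability that either verdict already has `quorum` votes on ballot one."""
    if not jury_size / 2 < quorum <= jury_size:
        raise ValueError("quorum must be a majority no larger than the jury")
    dist = first_ballot_distribution(jury_size, p_guilty)
    # Enough guilty votes to convict.
    convict = sum(dist[k] for k in range(quorum, jury_size + 1))
    # Few enough guilty votes that the not-guilty side has the quorum.
    acquit = sum(dist[k] for k in range(0, jury_size - quorum + 1))
    return convict + acquit


# Six-member jury, evenly split prior (p_guilty = .5):
# unanimous rule (quorum 6) gives 2/64 = 0.03125,
# five-sixths rule (quorum 5) gives 14/64 = 0.21875.
```

Under these assumptions, relaxing the decision rule from unanimity to five-sixths raises the chance that a six-member jury needs no deliberation at all, which is one reason the transition from first ballot to verdict has to be modeled separately for each rule.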

Finally, DICE promises to provide an
empirical basis for the mathematical analyses
of juror error rates and juror satisfaction
proposed by Grofman (1976) and Nagel and
Neef (1975). The principal defect of the
existing analyses is that they are unable to
make a satisfactory transition from first ballots
to verdicts. DICE makes the relationship
between initial votes and final verdicts quite
explicit and therefore provides a basis for
extending the existing mathematical models.


Review and Conceptual Analysis of the
Employee Turnover Process


W. H. Mobley, R. W. Griffeth, H. H. Hand, and B. M. Meglino
Center for Management and Organizational Research
University of South Carolina


Research on employee turnover since the Porter and Steers analysis of the
literature reveals that age, tenure, overall satisfaction, job content, intentions
to remain on the job, and commitment are consistently and negatively related
to turnover. Generally, however, less than 20% of the variance in turnover is
explained. Lack of a clear conceptual model, failure to consider available
job alternatives, insufficient multivariate research, and infrequent longitudinal
studies are identified as factors precluding a better understanding of the psy-
chology of the employee turnover process. A conceptual model is presented that
suggests a need to distinguish between satisfaction (present oriented) and
attraction/expected utility (future oriented) for both the present role and
alternative roles, a need to consider nonwork values and nonwork consequences
of turnover behavior as well as contractual constraints, and a potential mecha-
nism for integrating aggregate-level research findings into an individual-level
model of the turnover process.


Employee withdrawal, in the form of turn-
over, has sustained the interest of personnel
researchers, behavioral scientists, and man-
agement practitioners. At the macro level,
economists and personnel researchers have
demonstrated the relationship between turn-
over rates and the aggregate level of eco-
nomic activity, employment levels, and va-
cancy levels (see, e.g., Armknecht & Early,
1972; Forrest, Cummings, & Johnson, 1977;
Price, 1977; Woodward, 1975-1976). At the
micro level, behavioral research has estab-
lished a consistent, although generally weak,
correlation between job dissatisfaction and
turnover (Brayfield & Crockett, 1955; Locke,
1976; Porter & Steers, 1973; Vroom, 1964;
Herzberg, Mausner, Peterson, & Capwell,
Note 1). While the economic and job dis-
satisfaction contributions to turnover are
well established, they are conceptually sim-
plistic and empirically deficient bases for un-
derstanding the employee turnover process.
Recently, a number of authors (Forrest et
al., 1977; Locke, 1976; Mobley, 1977;
Porter & Steers, 1973; Price, 1977) have
advocated abandoning further replication of
bivariate correlates of turnover, particularly
job dissatisfaction, in favor of well-developed
conceptual models of the turnover process.
Such a model is one objective of this article.

Table 1 (continued)

Personality
  Hines (1973)
    Population: New Zealand entrepreneurs, middle managers,
      engineers, accountants
    Relation to turnover: Entrepreneurs had higher need for
      achievement and lower turnover rate than other occupational
      groups; individuals with higher need for achievement had
      higher turnover in nonentrepreneur groups

Distance migrated
  Marsh & Mannari (1977)
    Population: Japanese electrical company employees (n = 1,033)
    Relation to turnover: r = -.12**; employees who grew up near
      the factory had higher turnover

Employee turnover is a behavior of interest
to many disciplines and is subject to analysis
and discussion at many levels of discourse.
The approach taken in this article is basically
psychological and rests on the belief that
turnover is an individual choice behavior.
Thus, the individual is the primary unit of
analysis. Selecting the individual as the unit
of analysis does not mean that turnover re-
search at the unit, organizational, or other
aggregate level is not of value and interest.
However, to conclude that such studies
clarify the individual turnover decision pro-
cess may be tantamount to what Robinson
(1950) has termed the ecological fallacy.
For example, the relationship between aggre-
gate unemployment levels and turnover rates,
although well established (see, e.g., Arm-
knecht & Early, 1972; Price, 1977; Wood-
ward, 1975-1976), adds little to understand-
ing individual turnover decisions. A linking
mechanism is needed that considers the in-
dividual's perception and evaluation of
available alternatives relative to the present
position.
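
The ecological fallacy can be made concrete with a small numerical sketch. The data below are invented solely for illustration: three hypothetical units, each with three individuals measured on two arbitrary scores. None of the numbers come from the studies cited here, and `pearson_r` is a plain textbook implementation rather than a routine from any particular package.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: (x scores, y scores) for the individuals in each unit.
units = {
    "unit_a": ([1, 2, 3], [3, 2, 1]),
    "unit_b": ([4, 5, 6], [6, 5, 4]),
    "unit_c": ([7, 8, 9], [9, 8, 7]),
}

# Aggregate level: correlate the unit means with each other.
mean_x = [sum(xs) / len(xs) for xs, _ in units.values()]
mean_y = [sum(ys) / len(ys) for _, ys in units.values()]
aggregate_r = pearson_r(mean_x, mean_y)

# Individual level: correlate x and y separately within each unit.
within_r = {name: pearson_r(xs, ys) for name, (xs, ys) in units.items()}
```

In this constructed case the unit means are perfectly positively related, whereas within every unit the individual scores are perfectly negatively related. An aggregate-level relationship therefore cannot, by itself, be read as a statement about how individuals decide.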

At the individual level, satisfaction is the 
most frequently studied psychological vari- 
able thought to be related to turnover. How- 
ever, the satisfaction-turnover relationship, 
although consistent, usually accounts for less 
than 16% of the variance in turnover (Locke, 
1976; Porter & Steers, 1973). It is apparent 
that models of the employee turnover process 
must move beyond satisfaction as the sole 
explanatory variable. 
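
The variance figures quoted in this review follow from squaring the correlation coefficient. The short sketch below shows that arithmetic; the function names are hypothetical, and the example values in the comments (.16, -.37, -.21) are figures that appear in this review.

```python
from math import sqrt


def variance_explained(r):
    """Proportion of criterion variance accounted for by a correlation r."""
    return r * r


def largest_abs_r(variance_share):
    """Largest |r| compatible with a given share of variance explained."""
    return sqrt(variance_share)


# "Less than 16% of the variance" corresponds to |r| below .40.
# A correlation of -.37 accounts for about 13.7% of the variance.
# A correlation of -.21 accounts for about 4.4% of the variance.
```

Read this way, a "consistent" satisfaction-turnover correlation in the -.20 to -.40 range still leaves well over four fifths of the variance in turnover unaccounted for.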

Recently, the constructs of organizational 
commitment (Porter, Crampon, & Smith, 
1976; Porter, Steers, Mowday, & Boulian, 
1974; Steers, 1977), organizational attach- 
ment (Koch & Steers, 1978), role attach- 
ment (Graen, 1976; Graen & Ginsburgh, 
1977), and behavioral intentions (Kraut, 
1975; Mobley, 1977; Newman, 1974) have 
been offered as explanatory concepts in the 
turnover process. However, the conceptual 
and empirical identity of these concepts and 
their interrelationships have not always been 
clear. An additional objective of the present 
article is to attempt to clarify and integrate 
these concepts into a general model of the 
individual employee turnover process. 

A third objective of this article is to up- 
date earlier reviews of the literature. The 
last major review of turnover from the in- 
dividual perspective was that of Porter and 
Steers (1973). For somewhat more limited 
reviews of certain aspects of turnover, see 
Goodman, Salipante, and Paransky (1973) 
on the hardcore unemployed and retention 
and Pettman’s (1973) partial review of the 
March and Simon (1958) model. More re- 
cently, Price (1977), a sociologist, has pub- 
lished a significant book that seeks to codify 
the turnover literature from a variety of dis-
ciplines and cultures. The Price work con-
tains a number of references generally not
included in the psychological and manage-
ment turnover literature cited in the United
States; however, it does not deal with post-
1974 research and is incomplete in its cover-
age of the psychological and management
literature on employee turnover. Forrest et al.
(1977) also recently presented a partial re-
view of the turnover literature. However, the
latter review, which deals with a broader
spectrum of organizational participation be-
haviors, contains no post-1973 research and
has a conceptual rather than an empirical
emphasis. The Forrest et al. model is dis-
cussed in a later section of this article.

Table 1 (continued)

Number of previous jobs
  Marsh & Mannari (1977)
    Population: Japanese electrical company employees (n = 1,033)
    Relation to turnover: ns

Note. ISR = Institute for Social Research.
*p < .05. **p < .01.

In summary, the major objectives of this 
article are (a) to update the last major re- 
views and analyses of the turnover literature, 
(b) to attempt to clarify the distinctions 
among various constructs that have recently 
been suggested as explanatory variables in 
the turnover process, (c) to develop a con- 
ceptual model of the individual-level em- 
ployee turnover process that is consistent 
with the research literature, and (d) to sug- 
gest areas of further research. 

This article focuses on voluntary, that is, 
self-initiated, turnover rather than on or- 
ganization-initiated terminations. This dis- 
tinction, discussed in a subsequent section, is 
not always made clear in specific research 
studies. Additionally, this article does not 
deal with absenteeism. Whether absenteeism 
is best thought of as having no consistent 
relationship to turnover (March & Simon, 
1958), as a precursor of turnover (Herzberg
et al., Note 1), or as an alternative form
of withdrawal behavior (Hill & Trist, 1955; 
Rice & Trist, 1952) is an important research 
question (Burke & Wilcox, 1972), but is 
beyond the scope of the present article. 


Update of Turnover Analyses and Reviews 


The last major reviews of the turnover 
literature were by Porter and Steers (1973) 
and Price (1977). This section summarizes 
the recent research not included in these 
reviews and offers the conclusions of the 
authors of this article. A subsequent section 
summarizes and integrates the results of this 
and the two previous reviews. Although no 
taxonomic schema is entirely satisfactory, 
the research summary is divided into the
following sections: (a) individual demo- 
graphic and personal variables, (b) overall 
satisfaction, (c) organizational and work 
environment factors, (d) job content fac- 
tors, (e) external environment factors, (f) 
occupational groupings, (g) recently devel- 
oped constructs, and (h) multivariate stud- 
ies. Most studies reviewed take a bivariate 
approach to turnover, but this emphasis is 
reflective of the current literature rather than 
of the present authors' belief in the relative 
merit of this approach. 


Individual Demographic and 
Personal Factors 


Included in this category are age, tenure, 
sex, family responsibilities, education, per- 
sonality, other personal considerations, and 
weighted application blanks. Table 1 sum- 
marizes recent research on these variables, 
which were not included in the Porter and 
Steers or Price reviews. 

Age. Recent research, with the exception 
of Hellriegel and White (1973) who reported 
no differences, indicates a negative relation- 
ship between age and turnover. However, the 
amount of variance explained is less than 
7%. One should note that since age is cor- 
related with many other variables, it alone 
contributes little to the understanding of 
turnover behavior. As is noted later, a con-
ceptual model and multivariate studies are
required to adequately comprehend the psy-
chology of the turnover process. This ob-
servation also applies to each variable dis-
cussed below.

Tenure. Three recent studies, cited in
Table 1, showed a negative relationship be-
tween tenure and turnover. Mangione (Note
2) concluded on the basis of a multivariate
study (see Table 12 for a summary of multi-
variate studies) that length of service is one
of the best single predictors of turnover.

Sex. Of the two studies relating an in-
dividual’s sex to turnover, Marsh and Man-
nari (1977) observed that female Japanese
manufacturing employees had higher turn-
over than males. Mangione (Note 2) found
no relationship.

Family responsibilities. Three of the four
studies summarized in Table 1 indicate that 
family responsibility, including marital sta- 
tus, is associated with decreased turnover. 

Education. Of the recent studies dealing 
with education, one found that female credit 
union employees with higher education had 
lower tenure (Federico, Federico, & Lund- 
quist, 1976), whereas Mangione (Note 2) 
and Hellriegel and White (1973) discovered 
no differences. Lack of variance in education 
in studies such as Hellriegel and White's, 
which used certified public accountants, pre- 
cludes adequate evaluation of the role of 
education. 

Weighted application blanks. Weighted
application blanks use a procedure for weight-
ing information on an employment applica-
tion form so as to predict some aspect of job
performance, including turnover. Schwab and
Oliver (1974) have raised serious questions
regarding the utility of weighted application
blanks for predicting turnover. In four sam-
ples, they found that validities shrank below
statistical significance upon cross-validation.
However, in two recent studies Cascio (1976)
and Lee and Booth (1974) reported signifi-
cant relationships that were cross-validated.
The utility of weighted application blanks
for employee selection continues to require
situation-specific validation (and regular
cross-validation), and alone they offer little
contribution to understanding the psychology
of the turnover process.

Other personal variables. Table 1 cites
other studies that dealt with personality, dis-
tance migrated, and number of previous jobs.
Because the number of studies is small, no
generalizations are possible.


Overall Job Satisfaction and Turnover 


Studies involving overall job satisfaction
are summarized in Table 2. With one ex-
ception (Koch & Steers, 1978), these studies
indicate a negative relationship between over-
all satisfaction and turnover. It is important
to note, however, that the amount of variance
accounted for is consistently less than 14%.
As noted in subsequent sections, when satis-
faction is included in multiple regressions
with variables such as intentions and com-
mitment, its effect on turnover may become
nonsignificant (Marsh & Mannari, 1977;
Mobley, Horner, & Hollingsworth, 1978).


Table 2
Studies of Relation of Overall Job Satisfaction to Turnover

Columns: Study; Population; Relation to turnover
Relation to turnover (study and population entries not legible):
  r = -.37** for overall satisfaction
  r = -.31** on Minnesota Satisfaction Questionnaire
  r = -.27*
  in second sample


Organizational and Work 
Environment Factors 


Pay and promotion. Research dealing 
with pay and promotion is summarized in 
Table 3. Federico et al. (1976) found that 
higher salary was associated with longer 
tenure, whereas higher salary and the differ- 
ence between expected and actual salary were 
associated with shorter tenure. Mangione 
(Note 2) found a significant negative corre- 
lation between pay satisfaction and turnover. 
Hellriegel and White (1973) discovered that 
“leavers” had more negative attitudes toward
pay than “stayers” and also reported signifi- 
cant increases in pay on their new jobs. Evi- 
dence from five other recent studies suggests 
a lack of relationship between pay satisfac- 
tion and turnover. 

Also evident in Table 3 is the general lack 
of relationship between satisfaction with pro- 
motion and turnover, although Hellriegel and 
White did find that leavers had more nega- 
tive attitudes toward promotion than stayers. 
Marsh and Mannari (1977) reported a cor- 
relation of —.22 between perceived chances 
of promotion and turnover. 

Supervision. Table 4 summarizes recent 
studies relating satisfaction with supervision 
to turnover. In four of the studies a non- 
significant relationship between satisfaction 
with supervision and turnover was found. 
Hellriegel and White (1973) and Ilgen and 
Dugoni (Note 3) found a significant nega- 
tive relationship. The study by Graen and 
Ginsburgh (1977) is of particular interest. 
Here, leadership was significantly associated 
with turnover. The leadership variable, how- 
ever, was not satisfaction with supervision 
but specific aspects of the leader-member 
exchange. Contrasted with the conclusions of 
recent studies using satisfaction with super-
vision as the independent variable, the Graen
and Ginsburgh results suggest the need for 
more detailed study of the leader-member 
exchange rather than reliance on generalized 
supervision affect measures. 


Table 5
Studies of Relations Among Group Cohesiveness,
Teamwork, and Satisfaction with Co-Workers

Co-workers, teamwork, team effectiveness, and cohesiveness
  Hellriegel & White (1973)
    Population: Certified public accountants (n = 349)
    Relation to turnover: “Generally” more negative for
      turnovers (few significance tests reported)
  Ilgen & Dugoni (Note 3)
    Population: Retail clerks (n = 117)
    Relation to turnover: ns (co-workers)
  Koch & Steers (1978)
    Population: Nonmanagement entry-level public agency
      employees (n = 77)
    Relation to turnover: r = -.21* (co-workers)
  (study not legible)
    Population: Salesmen (n = 911)
    Relation to turnover: ns (teamwork)


Table 7 (continued)

(factor not legible)
  Graen & Ginsburgh (1977)
    Population: University service employees (n = 89)
    Relation to turnover: Significant main effect (p < .002)
      with resignations

Satisfaction with duties and policies
  Ilgen & Dugoni (Note 3)
    Population: Retail clerks (n = 117)
    Relation to turnover: ns

Self-perception of task-relevant abilities
  Ekpo-Ufot (1976)
    Population: Auto assemblers (n = 123)
    Relation to turnover: r = -.27** (sum of 6 measures); higher
      turnover associated with lower self-perceived abilities

Coping behavior
  Ilgen & Dugoni (Note 3)
    Population: Retail clerks (n = 117)
    Relation to turnover: ns (self-evaluation of coping with
      critical incidents)

Perceived performance
  Marsh & Mannari (1977)
    Population: Japanese electrical company employees (n = 1,033)
    Relation to turnover: r = -.14; poorer performers had higher
      turnover

*p < .05. **p < .01.


Overall, recent studies offer moderate sup- 
port for the negative relationship between 
satisfaction with supervision and turnover. 
However, the number of studies finding no 
significant relationship between these vari- 
ables indicates the need to more closely ex- 
amine the nature of leadership measures, to 
conduct more microanalyses of the leader- 
member exchange, and to assess the con- 
tribution of supervision in multivariate de- 
signs that consider other salient variables. 

Peer group relations. Research on peer 
relations and turnover is shown in Table 5. 
In seven of the nine studies, no significant 
results were reported. Koch and Steers (1978) 
found a significant correlation between satis- 
faction with co-workers and turnover, but 
only 4% of the variance was explained. The 
studies summarized in Table 5 do not sup- 
port the generalization of a strong relation- 
ship between group relations and turnover. 
Individual differences in variables such as
need for affiliation; the role of other vari-
ables, for example, required task interaction
and external job alternatives; and the method
of measuring group relations all contribute to
the difficulty in explicating these findings.

Other variables. Table 6 describes recent 
studies dealing with a variety of other or- 
ganizational and work environment factors. 
Both Hellriegel and White (1973) and 
Marsh and Mannari (1977) found a negative 
relationship between perceived status and
turnover. This status may have come, in part, 
from the work role or from organizational 
affiliation. Knowledge of organizational pro- 
cedures and perceptions of control processes 
were each shown to be negatively related to 
turnover. In three separate studies, role pres- 
sures, climate, and satisfaction with the com- 
pany were not significantly related to turn- 
over. Ilgen and Dugoni (Note 3) found a 
significant negative correlation between satis- 
faction with hours of work and turnover 
among retail clerks. Mangione (Note 2) dis- 
covered a significant negative correlation be- 
tween resource adequacy and turnover. 


Job Content Factors 


Recently job content has become one of 
the more active areas of industrial-organiza-
tional research. A number of studies in this
area have employed turnover as a criterion. 
Inspection of the recent studies shown in 
Table 7 indicates that job content factors 
are significantly related to turnover. Satis- 
faction with work itself exhibits a uniform 
negative correlation with turnover, although 
the amount of variance explained is con- 
sistently less than 16%. 

Additional studies indicate that the per- 
ceived intrinsic value of work, intrinsic mo- 
tivation, and intrinsic satisfaction are all sig- 
nificantly and negatively related to turnover. 
Graen, Orris, and Johnson (1973) and Graen 
and Ginsburgh (1977) demonstrated that 
role orientation, defined as the perceived rele- 
vance of the job for the worker’s career, was 
significantly related to turnover. 

In a particularly interesting study, because 
it took a somewhat different approach, Ekpo- 
Ufot (1976) found that self-perception of 
task-relevant abilities was significantly and 
negatively associated with auto assembler 
turnover.


External Environment


The probable role of the availability of
alternative jobs in employee turnover has
long been recognized; see, for example,
March and Simon (1958). Economists and
sociologists have documented the aggregate-
level relationship between economic indica-
tors such as employment levels or job va-
cancy rates and turnover rates (Price, 1977).
However, research on individual-level turn-
over has infrequently assessed perceived
alternatives (Forrest et al., 1977; Locke,
1976). Conceptually, the perception and eval-
uation of alternatives seems to be a crucial
variable in the individual turnover process.
Empirically, assessment of the relationship
between turnover and personal, organiza-
tional, job content, or other variables is in-
exorably bound to consideration of the per-
ception and evaluation of alternatives.

Table 8 summarizes the limited amount of
recent research dealing with alternatives.
Woodward (1975-1976) reaffirmed the ag-
gregate-level negative relationship between
unemployment and turnover and the positive
relationship between unfilled vacancies and
turnover rates. At the individual level, Dan-
sereau, Cashman, and Graen (1974) found
that the expectancy of finding an alternative
job moderated the correlations between atti-
tude and turnover. Mobley et al. (1978)
found that expectancy of finding an accept-
able alternative position was significantly and
positively related to intention to quit but
not to actual quitting, although intention to
quit was significantly and positively related
to turnover. It is evident that much addi-
tional research is required to explicate the
role of perception and evaluation of alterna-
tives in the individual turnover process.


Table 8
Studies of the Relation Between External Alternatives and Turnover

Economic conditions
  Woodward (1975-1976)
    Population: British chemical company (n not reported)
    Relation to turnover: Local labor market: unemployment,
      R² = .34; unfilled vacancies, R² = .66; unemployment,
      vacancy, and a seasonal dummy, R² = .83

Expectancy of finding alternative
  Dansereau, Cashman, & Graen (1974)
    Population: Office workers (n = 222); managers (n = 354)
    Relation to turnover: In both samples, the perceived
      expectancy of finding a comparable job moderated the
      correlation between attitude and turnover (office workers,
      r = -.18**; managers, r = -.11*)
  Mobley, Horner, & Hollingsworth (1978)
    Population: Hospital employees (n = 203)
    Relation to turnover: Expectancy of finding an acceptable
      alternative significantly correlated with intention to quit
      (r value not legible); intention to search correlated
      significantly with turnover (r = .29**)

*p < .05. **p < .01.


Occupational Groupings 


Price (1977) reviewed research on occu- 
pational characteristics and found moderate 
support for the proposition that unskilled
blue-collar workers have higher turnover than
white-collar workers. He found only weak
support for the propositions that nonman-
agers have higher turnover than managers,
that nongovernment employees have higher 
turnover than government employees, and 
that higher professionalism is associated with 
higher turnover. 

Since most individual-level turnover stud-
ies are carried out within occupational group-
ings, the present review adds little to Price’s
analysis. It is apparent, however, that any
complete model of individual turnover be-
havior should be able to account for differ-
ences in turnover among occupational group-
ings. The perception and evaluation of al-
ternatives is one obvious link between the
two levels of analysis. A second is exempli-
fied in the research of Herman and Hulin
(1972) and Herman, Dunham, and Hulin
(1975), which demonstrated that organiza-
tional variables such as position level may
be better predictors of behavior than demo-
graphic or personality variables. The frame
of reference provided by position level may
influence values, perceptions, and expecta-
tions, thus linking organizational variables
to individual behavior.


Recently Explored Variables and Processes


Since the Porter and Steers (1973) review, 
interest has developed in a variety of addi- 
tional variables, constructs, and processes, 
including behavioral intentions, organiza- 
tional commitment, realistic expectations, 
and the centrality of work values. 

Behavioral intentions. The Fishbein 
(1967) and Fishbein and Ajzen (1975) 
model of the relationships among beliefs, at- 
titudes, intentions, and behaviors emphasizes 
the role of intentions in understanding the 
link between attitudes and behavior. The 
Locke (1968) model of task motivation also 
conceives of intention as an immediate pre- 
cursor of behavior. Drawing on these and 
other related theoretical models, a number 
of recent studies have assessed the role of 
intentions in predicting and understanding 
turnover. Table 9 summarizes these studies. 

It is evident from these studies that be- 
havioral intentions to stay or leave are con- 
sistently related to turnover behavior. It is 
also evident that this relationship generally 
accounts for more variance in turnover than 
does the satisfaction-turnover relationship. 
Conceptually this appears appropriate, since 
satisfaction is an affective or emotional re- 
sponse, whereas intentions are statements re- 
garding the specific behavior of interest, in 
this case, turnover. It is possible, as Mobley 
et al. (1978) suggested, that intentions also 
capture the individual's perception and eval- 
uation of alternatives. 

Although the relationship between inten- 
tions and turnover appears to be consistent 
and generally stronger than the satisfaction-
turnover relationship, it accounts for less
than 24% of the variance in turnover. Among
the possible reasons for this are that inten- 
tions do not account for impulsive behavior, 
that they do not adequately capture the per- 
ception and evaluation of alternatives, and 
that along with personal, organization, and 
external conditions, they may change be-
tween original measurement and the observa-
tion of actual behavior. The more specific
the behavioral intention statement and the
less time between its measurement and the
behavior, the stronger the relationship should
be. However, as Graen and Ginsburgh (1977)
noted, the more specific the intention measure and
the closer the person is to actually quitting, 
the more trivial the prediction. Additionally, 
without analyses of the precursors of inten- 
tions, little knowledge of the psychology of 
turnover behavior is generated. 

Also included in Table 9 are two variables 
related to intentions in the Fishbein (1967) 
model: attitude toward the act of quitting
and normative beliefs regarding quitting. The 
Newman (1974) study is one of the few that 
tests the Fishbein model with a turnover cri- 
terion. Both variables, as well as intentions, 
were significantly related to turnover. 

It is evident that intentions are a signifi- 
cant variable in the turnover process. How- 
ever, additional research is required on the 
antecedents and covariates of intentions, the
manner in which intentions change over time, 
and the reasons for the lack of a stronger re- 
lationship between intentions and turnover. 

Organizational commitment, involvement, 
and job attachment. A number of research- 
ers have recently focused on the antecedents 
and consequences of organizational commit-
ment (see, e.g., Porter et al., 1974; Steers,
1977). Porter et al. (1974, p. 604) defined 
organizational commitment as a more global 
evaluative linkage between the employee and 
the organization, which includes job satisfac- 
tion among its components. More specifi- 
cally, organizational commitment was defined 
as the strength of an individual’s identifica- 
tion with and involvement in a particular 
organization and is characterized by (a) a 
strong belief in and acceptance of an or- 
ganization's goals and values, (b) a willing- 
ness to exert considerable effort on behalf of 
the organization, and (c) a definite desire to 
maintain organizational membership. Porter 
et al. stated that intention to remain is a 
component of commitment. 

More recently, Koch and Steers (1978) 
suggested that job attachment may be a pri- 
mary precursor of turnover. They defined 
job attachment as an attitudinal response to 
one's job characterized by (a) a congruence 
between one's real and ideal jobs, (b) an 
identification with one's chosen occupation, 
and (c) a reluctance to seek alternative em- 
ployment. Koch and Steers further noted 
that job attachment is clearly related to or-
ganizational commitment, although it focuses
more specifically on one's occupation or job 
than on the organization as a whole, that job 
attachment should be more closely related to 
turnover than is satisfaction because of its 
conative or intentional component, and that 
it should be influenced relatively more by in- 
dividual than by job characteristics. 

Table 10 summarizes recent studies of the
relation between these variables and turn-
over. Porter et al. (1974), Porter et al.
(1976), and Steers (1977) all found that
commitment was more significantly and nega-
tively related to turnover than was satisfac-
tion. Marsh and Mannari (1977) discovered
a significant but weak negative correlation
between commitment and turnover among
Japanese employees, whereas Mirvis and
Lawler (1977) observed that organizational
involvement, one component of commitment,
was significantly and negatively related to
turnover.

Koch and Steers (1978) found job attach-
ment to be significantly and negatively re-
lated to turnover. It should be noted that
role orientation in the previously reviewed
Graen et al. (1973) and Graen and Gins-
burgh (1977) studies (perceived relevance
of the job for the worker's career) is related
to at least one aspect of Koch and Steers's
definition of job attachment. The Graen stud-
ies found role orientation to be significantly
and negatively related to turnover.

The developing body of research on com-
mitment and attachment suggests that these
concepts are significantly and negatively re-
lated to turnover and more strongly related
to turnover than is satisfaction. However,
both commitment and attachment, as defined
in the research cited above, are such complex
constructs as to make generalizations rather
tenuous. For example, is it the inclusion of
intentions in the operational definitions of
commitment and attachment that accounts
for their improved prediction of turnover?
Is it not possible for congruence between in-
dividual and organizational goals and values
to vary independently of the other two com-
ponents of commitment? Perhaps a more
microanalytic treatment of these constructs
would prove useful. A model that incorpo-
rates some components of commitment and
attachment is discussed in a subsequent sec-
tion.

Met expectations. Porter and Steers
(1973) suggested that met expectations pro-
vide a conceptual framework for the diverse
turnover literature. They viewed this con-
cept as the discrepancy between what a per-
son encounters on the job in the way of posi-
tive and negative experiences and what was
expected. They predicted that “when an in-
dividual’s expectations—whatever they are—
are not substantially met, his propensity to
withdraw would increase” (Porter & Steers,
1973, p. 152).

Since the Porter and Steers review, there 
have been several studies relevant to the met 
expectations hypothesis. A subset of these 
studies dealt with expectations at the time 
of original organizational entry. Table 11 
summarizes studies since the Porter and 
Steers review that are most relevant to met 
expectations. 

The Dunnette, Arvey, and Banas (1973) 
study found that leavers exhibited a greater 
discrepancy between original expectations 
and actual experiences than did stayers. 
However, significance levels were not re- 
ported and leavers’ perceptions of their last 
job were retrospective, suggesting the possi- 
bility of postdecision distortion. 

Farr, O'Leary, and Bartlett (1973) and
Ilgen and Seely (1974) found some evidence
that individuals given realistic information
about the job (via a work sample and a
booklet) exhibited lower turnover. However,
in neither study were expectations or subse-
quent experiences directly assessed.

Ilgen and Dugoni (Note 3) sought to as- 
sess directly the met expectation hypothesis, 
but found that met expectations were incon- 
sistently related to satisfaction or turnover. 
Wanous (1973) discovered that a realistic 
job preview, compared with a more tradi- 
tional orientation, lowered both expectations 
and thoughts of quitting, but did not signifi-
cantly influence turnover.

Direct support of the met expectations
hypothesis is rather weak. In an insightful
discussion of the conceptual and empirical
support for met expectations (particularly
as related to realistic job previews), Ilgen
and Dugoni concluded that to expect realistic
job previews to influence satisfaction, and
subsequently turnover, through the mecha-
nism of met expectations is naive. Specifi-
cally, they argued that the hypothesis inade-
quately reflects individual differences in
values. However, it should be noted that
Porter and Steers (1973) appeared to ac-
count for individual differences through “de-
sired expectations.” Ilgen and Dugoni also
noted that accurate expectations cannot com-
pensate for deficiencies in the immediate job
environment. Additionally, the met expecta-
tions hypothesis appears to give insufficient
attention to the socialization and assimilation
processes.

Although expectations may play an im- 
portant role in attachment, satisfaction, and 
turnover, a more complex conceptualization 
than the met expectations hypothesis ap- 
pears necessary. One such conceptual model 
is proposed in a subsequent section of this 
article. 


Multivariate Studies 


The preceding sections of this review have
repeatedly suggested that multivariate studies
are necessary in turnover research. Such stud-
ies are necessary in order to interpret the
relative efficacy of numerous variables and
constructs thought to be related to turnover,
to resolve apparently contradictory bivariate
studies, to attempt to account for a greater
proportion of the variance in turnover, and
to move toward a more complete understand-
ing of the turnover process.

Table 12 summarizes recent multivariate
studies that have used turnover as the cri-
terion. Graen and Ginsburgh (1977) found
that role orientation, leader acceptance, and
their interaction accounted for 23% of the
variance in university service employee turn-
over. That the interaction accounted for 6%
of the variance suggests that noncompensa-
tory models of turnover may be required.

Mobley et al. (1978) tested a simplified
version of a model of possible intermediate
linkages between job satisfaction and turn-
over (Mobley, 1977). Although a number of
demographic, satisfaction, and perceived al-
ternative measures exhibited significant bi-
variate relations with turnover, multiple re-
gression analysis revealed intention to quit
as the only significant coefficient (r² = 24%).

Mangione (Note 2) used 15 demographic, 
satisfaction, and occupational variables to 
predict turnover. He found the strongest re- 
gression coefficients to be satisfaction with 
comfort, satisfaction with co-workers, re- 
wards, industry type, age, tenure, occupa- 
tion, and satisfaction with financial rewards 
(r² = 40%; adjusted r² = 22%). Of particu-
lar interest was the fact that three different
classes of variables (satisfaction, demo- 
graphic, and occupational) were represented 
among the strongest regression coefficients. 
Satisfaction did not subsume the unique vari- 
ance in the demographic and occupational 
variables. Unfortunately, this study did not 
include perception and evaluation of alterna- 
tives. 
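
The gap between the two figures reported for this equation reflects shrinkage: with many predictors, the sample r² overstates what the equation would explain in a new sample. A minimal sketch of one common adjustment follows. The review does not say which formula was used, and the sample sizes below are hypothetical, chosen only to show how the correction grows as the number of cases per predictor falls.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjust a sample r2 for k predictors fitted on n cases.

    Uses the common form 1 - (1 - r2) * (n - 1) / (n - k - 1).
    """
    if n - k - 1 <= 0:
        raise ValueError("n must exceed the number of predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Only r2 = 0.40 and k = 15 predictors come from the text;
# the sample sizes are assumptions for illustration.
for n in (50, 100, 500):
    print(n, round(adjusted_r2(0.40, n, 15), 3))
```

The same sample r² shrinks far more in a small sample than in a large one, which is one reason cross-validation matters for multivariate turnover equations.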

The Marsh and Mannari (1977) study is 
of particular interest because it dealt with 
a Japanese sample. These authors found only 
four variables with significant coefficients 
(sex, organizational stature, performance, and 
number of previous jobs). Commitment and 
satisfaction were among those that did not 
exhibit significant regression coefficients. 
This study serves to emphasize the necessity 
of evaluating models that can generalize be- 
yond the United States and Western indus- 
trialized nations. 

Newman (1974) conducted one of the few 
direct tests of the Fishbein (1967) model. 
Although individual regression coefficients 
were not reported, he discovered that 23% 
of the variance in turnover was accounted for 
by satisfaction, attitude toward quitting, nor- 
mative beliefs regarding quitting, and actual 
intentions to quit. The multivariate study by 
Waters, Roach, and Waters (1976) found a 
coefficient of 25%. Intentions accounted for 
18%, whereas the Job Descriptive Index 
(JDI) Work scale (Smith, Kendall, & Hulin, 
1969) and tenure added the additional 7%. 

Porter et al. (1974) observed that or-
ganizational commitment and the JDI ac-
counted for 21% of the variance in turn-
over at two different points in time with age
partialed out. Commitment made the stronger
contribution.

Several generalizations are possible from
these studies. First, each of the studies ac-
counted for more variance in turnover than
did satisfaction or any other of the single
variables. Thus, satisfaction does not appear
to be an adequate composite of other pre-
cursors and correlates of turnover. Also, it is
evident that intentions, whether measured
directly or included in commitment, enhance
the prediction of turnover. With the excep-
tion of the Mobley et al. (1978) study, other
variables when combined with intentions en-
hanced the prediction. It should be noted
that few of the multivariate studies included
either perception and evaluation of alterna-
tives or cross-validation. These omissions
continue to be major shortcomings of the
research.


Summary 


Employee turnover remains a frequently
researched phenomenon. This is evident from
the number of studies since the Porter and
Steers (1973) review. Many of these studies
have dealt with only a small subset of the
variables potentially relevant to turnover,
and many are not based on a clear concep-
tual model. This precludes making strong
summary generalizations of the research
studies.

Table 13 presents a summary of the Porter
and Steers (1973) review, the Price (1977)
review, and the present authors’ conclusions
based on the recent research. In the place-
ment of categories in Table 13 an attempt
has been made at maintaining the integrity
of the various authors’ classification schemes
and at calling attention to possible overlap in
classification groupings. In interpreting the
table, negative refers to a negative relation-
ship, that is, the higher the variable the
lower the turnover, while positive refers to a
positive relationship. In the case of nominal
variables, the nature of the relationship is
specified.

The qualifiers consistent, moderate, weak,
or inconclusive are used in Table 13. The
qualifiers refer to the consistency with which
a significant relationship was found and to
the relative number of studies reporting such
a relationship. These qualifiers do not refer
to the strength of a relationship in terms of
the size of a correlation or variance ex-
plained.

The present review, in agreement with the
earlier reviews of Porter and Steers and
Price, found age, tenure, overall job satis-
faction, and reaction to job content to be
consistently and negatively associated with
turnover. Among the more recently studied
variables, intentions and commitment-attach-
ment were found to consistently relate to
turnover. Because of the relatively few mul-
tivariate studies, an ordering of these vari-
ables in terms of relative contribution to
turnover is tenuous. However, it appears
that intentions and commitment-attachment
(which includes intentions) made a stronger
contribution to turnover behavior than did
satisfaction and demographic variables. Fur-
ther research is needed for an adequate map-
ping of the antecedents of intentions.

Moderately consistent support for the neg-
ative relationship between supervisory style
and turnover was evident, a somewhat more
qualified conclusion than that reached by
Porter and Steers (1973). Recent research
reveals an inconclusive pattern of results with
respect to pay, promotion, and peer group
relations. These results stand in contrast
with the consistent negative generalization
of Porter and Steers’s review. Differences in
the availability of alternatives, the lack of
multivariate studies, the lack of multiple
measures of perception or affect, and the
lack of a clear conceptual model make inter-
pretation of these differences difficult.

The compelling conceptual argument that
alternatives are an important variable in the
turnover process continues to be supported
in aggregate-level studies, but has weak sup-
port at the individual level because it has
been infrequently studied.

Direct support for the met expectations
hypothesis is weak. Although realistic job
previews have been shown to be a possible
means of reducing turnover, the psychology of
this effect is not well understood.

Finally, the limited number of multivari-
ate studies indicates that greater variance in
turnover can be explained by using multiple 
variables, that a great deal of variance is still 
unexplained, that inclusion of intentions sig- 
nificantly enhances the prediction of turn- 
over, and that satisfaction is an inadequate 
summary variable for capturing the effects of 
other demographic, organizational, occupa- 
tional, or external variables. 


Methodological and Conceptual Comments


Predictive Designs


It is encouraging to note that an over- 
whelming majority of recent research designs 
have been predictive rather than retrospec- 
tive. However, few studies have used repeated 
measures of the perceptual or affective vari- 
ables, the Porter et al. (1976) and Graen and 
Ginsburgh (1977) studies being substantive 
exceptions. To the extent that turnover is a 
dynamic process, longitudinal designs with 
repeated measures should be of high utility. 


Linear Models 


Most research has been based on the as- 
sumptions of a linear and compensatory 
model. Fleishman and Harris (1962) earlier 
demonstrated the possibility of a nonlinear 
relationship between supervisory style and 
turnover. Mangione (Note 2) was one of the 
few authors to examine possible nonlinear 
relationships. Additionally, the Graen and 
Ginsburgh (1977) finding of a significant 
interaction between role orientation and 
leader acceptance calls attention to the need 
for further exploration of interaction terms. 
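
The difference between a compensatory model and one with an interaction term can be sketched as follows. The coefficients, the scoring scale, and the function names are hypothetical and are not estimates from any study reviewed here; the sketch only shows that, once a product term is present, the effect of one predictor depends on the level of the other.

```python
def compensatory(role_orientation, leader_acceptance, b1=1.0, b2=1.0):
    """Additive model: a low score on one predictor can be offset by the other."""
    return b1 * role_orientation + b2 * leader_acceptance


def with_interaction(role_orientation, leader_acceptance,
                     b1=1.0, b2=1.0, b12=1.0):
    """Adds a product term, so each predictor's effect depends on the other."""
    return (compensatory(role_orientation, leader_acceptance, b1, b2)
            + b12 * role_orientation * leader_acceptance)


def effect_of_role_orientation(model, leader_acceptance):
    """Change in the predicted score when role orientation rises from 0 to 1."""
    return model(1.0, leader_acceptance) - model(0.0, leader_acceptance)


# Hypothetical levels of leader acceptance.
for level in (0.0, 2.0):
    print(level,
          effect_of_role_orientation(compensatory, level),
          effect_of_role_orientation(with_interaction, level))
```

In the additive model the effect of role orientation is constant across levels of leader acceptance; with the product term it is not, which is the pattern a significant interaction would imply.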


Criterion 


A troublesome issue in turnover research 
concerns the definition of turnover rates and 
types of turnover at both the aggregate and 
individual levels (see Price, 1975-1976, 1977, 
for an evaluation of a number of aggregate 
measures of turnover). At the individual 
level, one of the more troublesome issues is 
the distinction between voluntary and invol-
untary turnover. The bulk of the individual-
level turnover research focuses on voluntary
turnover.

Precise definitions of voluntary turnover 
are infrequently given, and what is included 
as voluntary may differ across studies. For 
example, Marsh and Mannari (1977) incor- 
porated pregnancy under voluntary turnover, 
whereas Mirvis and Lawler (1977) and Wa-
ters et al. (1976) excluded pregnancy. The
definition of voluntary may well have con-
tributed to Marsh and Mannari’s finding
of a significant difference in turnover as a 
function of sex. Much more subtle effects 
may be associated with results as a function 
of the definition of voluntary. 

The categorization of reasons for turnover 
is frequently taken from company records. 
Many personnel practitioners readily admit 
that a variety of factors influence the ad- 
ministratively recorded reason for attrition. 
Lefkowitz and Katz (1969) reported signifi- 
cant differences in administrative and self- 
reported reasons for termination. In agree- 
ment with Forrest et al. (1977), further 
efforts to clarify the implications of different 
operational measurements of turnovers are 
appropriate. A multiple measure approach 
to identifying reasons for turnover would be 
useful. 

Finally, little research has addressed the 
relationship between voluntary and involun- 
tary turnovers. To assume that these are com- 
pletely independent phenomena, especially 
in the case of discipline-related terminations, 
appears simplistic. 

Although turnover is frequently thought
of as a “clean” objective criterion, the issues
raised above suggest the need for greater at-
tention to the criterion problem in turnover
research.


Table 13 (continued)

Variable: External environment (level of employment/opportunity; perceived alternatives); recently studied variables (commitment/attachment; intentions to quit; met expectations).
Columns: Porter & Steers (1973); Price (1977); Present review.
Legible entries, Present review: consistent negative; consistent positive; weak negative; consistent positive; weak positive.
Legible entry, Price (1977): consistent positive.


Measures of Satisfaction


In recent years, the JDI (Smith et al.,
1969) has become the predominant measure
of satisfaction with various facets of the job
setting. The majority of satisfaction-turn-
over studies reviewed in this article used the
JDI. The careful development of the JDI is
well documented, and there is a clear ad-
vantage to using a common satisfaction mea-
sure across a variety of studies. However,
overreliance on any single measure raises the
possibility that method variance has con-
taminated supposedly generalizable relation-
ships. As Gillet and Schwab (1975) sug-
gested, it seems prudent to use multiple mea-
sures of the same construct wherever possible.


Time 


The role of time in turnover research is 
evident in a number of ways. As noted 
earlier, there is a consistent negative rela- 
tionship between tenure (length of time on 
the job) and turnover. Some studies have 
ignored tenure, some have partialed out its 
effect, and others have included it in a multi- 
variate design. Understanding the psychology 
of the tenure effect is probably best facili- 
tated by the latter treatment. 

The time variable is also part of the cri- 
terion problem to the extent that different 
studies measure turnover over different 
lengths of time. Marsh and Mannari (1977) 
collected their turnover data over a 4-year 
period, whereas other studies have looked 
at turnover over a matter of weeks, for ex- 
ample, Newman (1974). The effect of differ- 
ing lengths of time between measurement of 
independent variables and the turnover be-
havior is infrequently studied. Porter et al.
(1976) and Waters et al. (1976) are excep- 
tions. This appears to be a topic in need of 
additional research. 

Finally, the temporal dimension may be 
relevant to the extent that different variables 
or combinations of variables exert a differen- 
tial influence on turnover as a function of 
stages in the organizational socialization pro- 
cess. Graen and Ginsburgh (1977) discussed 
this possibility. 


Primacy of Work 


Turnover is generally conceptualized in 
terms of demographic, organizational, and 
individual affective factors and on infrequent 
occasions in terms of perceived alternatives. 
While such conceptualizations may reflect in- 
dividual values relative to the work setting, 
they do not reflect the importance of work-
related values relative to other life values and
interests. The work of Dubin and his associ-
ates (see, e.g., Dubin, Champoux, & Porter,
1975) has demonstrated that differences in 
central life interests are related to differences 
in evaluations of the work environment and 
in organizational commitment. Marsh and 
Mannari (1977) found a significant negative 
relationship between primacy of work values 
and turnover. It appears that future turn- 
over research should deal not only with the 
work environment and external alternatives 
but also with the centrality of work relative 
to other life values and interests. 


Conceptual Model of Employee Turnover 


Drawing in part on the review and analysis 
presented earlier, this section develops a con- 
ceptual model of the employee turnover pro- 
cess. A simplified schematic representation of 
this model is presented in Figure 1. Among 
the characteristics of this model are the 
following: 

1. It is a model of individual-level turn-
over behavior. Individual differences in per-
ceptions, expectations, and values are ex-
plicitly recognized. Further, individual dif-
ferences in personal and occupational vari-
ables are included.

2. Perception and evaluation of alternative
jobs is given explicit treatment.

3. The probable roles of centrality of work
values and interests relative to other values
and interests, beliefs regarding nonwork con-
sequences of quitting or staying, and con-
tractual constraints are specifically recog-
nized.

4. The possible joint contribution to turn-
over of job satisfaction (present affect), job
attraction (expected future affect), and at-
traction of attainable alternatives is proposed.

5. Intention to quit is considered to be
the immediate precursor of turnover, with
impulsive behavior and the time between
measurement of intentions and behavior at-
tenuating this relationship.


The rationale for the model is described
starting with turnover behavior and working
back through its antecedents.


The immediate precursor of behavior is
thought to be intentions (Dulaney, 1961,
[...]; Fishbein & Ajzen, 1975; Locke, 1968;
[...] Ryan, 1970). Therefore, the best predictor
of turnover should be intention to quit
([...], Kraut, 1975; Mobley et al., [...],
1974; Waters et al., 1976). The relation-
ship between turnover and intentions
should be stronger the more specific
the intention statement and the closer in
time the measurement of the intention and
the behavior. Impulsive behavior attenuates
this relationship.

Graen and Ginsburgh (1977) observed
that the more specific and closer to the act
the intention measure comes, the more trivial
the prediction. Although probably valid, this 
observation should not be interpreted as in- 
dicating that understanding the turnover 
process is not facilitated by including inten- 
tions and evaluating their precursors. Neither 
should the personnel-planning utility of as-
sessing even more distant intentions (see,
e.g., Kraut, 1975) be overlooked.

There are at least two intentions of in- 
terest, intention to search and intention to 
quit. Mobley (1977) suggested that intention 
to search and search behavior should gen- 
erally precede intention to quit and turnover. 
Exceptions include impulsive behavior and 
nonsolicited attractive alternatives. Lack of 
perceived attractive alternatives or an un-
successful search may lead to forms of with-
drawal other than turnover intentions and
behavior. Relations among alternative forms 
of withdrawal and the effects of no alterna- 
tive or an unsuccessful search continue to 
need additional research. The primary deter- 
minants of intentions are thought to be (a) 
satisfaction, (b) attraction expected utility of 
present job, and (c) attraction expected util- 
ity of alternative jobs or roles. 


Satisfaction 


The present review and previous reviews 
(e.g., Locke, 1976; Porter & Steers, 1973;
Herzberg et al., Note 1) have documented 
the consistent and negative, although mod- 
erate, relationship between job satisfaction 
and turnover. Satisfaction is seen as the affec- 
tive response to evaluation of the job. This 
evaluation is considered to be a function of 
perceptions of various aspects of the job 
relative to individual values (Locke, 1969, 
1976). It is important to note that satisfac- 
tion is present rather than future oriented. 
The behavioral implication of satisfaction— 
dissatisfaction is a tendency toward ap- 
proach-avoidance. However, whether this 
approach-avoidance tendency is expressed in 
the form of turnover is thought to be related 
to at least three other classes of variables: 
attraction expected utility of the present role, 
attraction expected utility of attainable al- 
ternative roles, and centrality of work values, 
beliefs regarding nonwork consequences of 
quitting-staying, and contractual constraints. 
Failure to consider these classes of variables 
may explain the absence of a stronger rela- 
tionship between job satisfaction and turn- 
over. 


Attraction and Expected Utility of 
Present Job 


Whereas satisfaction is present oriented, 
attraction is considered to be future oriented. 
Attraction is seen as being based on the ex- 
pectancies that the job will lead to future 
attainment of various positively and nega- 
tively valued outcomes. When combined with 
the expectancy of being able to retain the 
present job, an index can be generated that
is analogous to Vroom’s (1964) force for a
single alternative and the expected utility 
index of Dachler and Mobley (1973) or 
Graen (1976). 
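
The index described above can be sketched as follows, under stated assumptions. The article gives no formula; the multiplicative combination used here (expectancy-weighted outcome values, scaled by the expectancy of retaining the job) is an assumption in the spirit of expectancy-value models, and the outcomes, probabilities, and values are hypothetical.

```python
from typing import Sequence


def attraction(expectancies: Sequence[float], valences: Sequence[float]) -> float:
    """Sum of outcome valences, each weighted by the expectancy of attaining it."""
    if len(expectancies) != len(valences):
        raise ValueError("one expectancy is needed per valued outcome")
    return sum(e * v for e, v in zip(expectancies, valences))


def expected_utility(p_retain: float,
                     expectancies: Sequence[float],
                     valences: Sequence[float]) -> float:
    """Attraction of a job scaled by the expectancy of retaining (or attaining) it."""
    return p_retain * attraction(expectancies, valences)


# Hypothetical outcomes for the present job: promotion, pay raise, long hours.
expectancies = [0.8, 0.5, 0.9]   # perceived chance that the job leads to each outcome
valences = [3.0, 2.0, -1.0]      # positive and negative values placed on each outcome
print(round(expected_utility(0.9, expectancies, valences), 2))
```

The same function could be applied to an alternative job by substituting the expectancy of attaining that alternative for the expectancy of retaining the present one.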

The concept of expected utility is applied 
in a variety of models. For example, econo- 
mists and decision theorists frequently use 
expected utility and expected value, concepts 
analogous to the expected utility previously 
described (see March & Simon, 1958, pp. 
137-138). In one interdisciplinary analysis, 
Blau, Gustad, Jessor, Parnes, and Wilcox 
(1956) conceptualized the evaluation of oc-
cupational alternatives as “[the individual's]
valuation of the rewards offered by different
alternatives and his appraisal of his chances
of being able to realize each of the alterna-
tives” (p. 533).

It is thought that attraction expected util-
ity of the present role, like satisfaction, con-
tributes to the tendency toward approach-
avoidance. Although many studies have ana-
lyzed the satisfaction-turnover relationship,
the dual contribution of satisfaction and at-
traction expected utility to turnover has not
been researched. While there may be some
correlation between satisfaction and attrac-
tion expected utility, these variables are con-
ceptually distinct and should have separate
effects on intentions (to search or to quit)
and turnover. For example, individuals may
be satisfied (or dissatisfied) with their pres-
ent job, but may expect the present job to
be relevant (or irrelevant) to their subse-
quent career. Graen and Ginsburgh (1977)
found the latter belief to be significantly re-
lated to resignation. On the other hand, one
may be dissatisfied with one's work group
but be attracted to it because of expectations
that it will facilitate the future attainment
of valued outcomes or goals. The above re-
lationships can easily be extended to include
a variety of other job factors, for example,
supervision, benefits, and so forth.

Just as there are multiple dimensions of
satisfaction, there are multiple dimensions of
attraction. The salience of these different di-
mensions is a function of individual differ-
ences in values. The values may be related
to occupation, position level, age, tenure, and
other personal variables.

One dimension of satisfaction and attrac-
tion concerns organizational goals and values.
The congruence between individual and or-
ganizational goals and values has been de-
fined by Porter et al. (1974) and Steers
(1977) as one component of organizational
commitment. As noted earlier, these authors
also included willingness to exert effort, de-
sire and intention to remain in the organiza-
tion, and job satisfaction in the definition of
commitment. The model suggested here seeks
to subdivide the complex variable of com-
mitment. Congruence between individual and
organizational goals and values may be an
important variable, but can be seen as dis-
tinct from, and only one of several focuses
of, both satisfaction (present) and attraction
(future), which are in turn related to turn-
over intentions and behavior.

If both satisfaction and attraction expected 
utility contribute to intentions to search and 
quit, and to turnover, then it is necessary to 
analyze the conditions under which one or 
the other makes the most contribution to 
variance explained. It may be that individ- 
ual-level variables such as the need for im- 
mediate versus delayed gratification (see, e.g., 
Mischel, 1976) will aid in predicting whether 
satisfaction (present) or attraction (future)
is most strongly related to turnover inten-
tions and behavior for given individuals.


Attraction and Expected Utility of 
Alternatives 


Including both satisfaction and attrac-
tion expected utility should increase our un-
derstanding and prediction of turnover in-
tentions and behavior. However, the attrac-
tion and attainability of alternative jobs or
roles must also be considered. March and
Simon (1958) presented an organizational
participation model that gave a prominent
role to the visibility of alternatives. The March
and Simon components of “perceived desira-
bility of leaving the organization” and “per-
ceived ease of movement from the organiza-
tion” roughly correspond to “ex-
pectancies of future job outcomes” and “ex-
pectancy of attaining alternative” in Figure
1. As noted earlier, these variables have re-
ceived little attention in turnover research.
The present model suggests that it is not
merely the visibility of alternatives but the 
attraction of alternatives and the expectancy 
of attaining the alternatives that are most 
salient. Forrest et al. (1977), Mobley (1977), 
and Schneider (1976) are among recent 
authors who make a strong argument for 
inclusion of the variable attraction of alterna- 
tives in turnover research. Attraction of al- 
ternatives is defined in terms of expectations 
that the alternative will lead to the future 
attainment of various positively and nega- 
tively valued outcomes. When combined with 
the expectancy of being able to attain the 
alternative, an index can be generated that 
is analogous to Vroom’s (1964) force for a 
single alternative and to the expected utility 
index of Dachler and Mobley (1973) or 
Graen (1976). (See Mobley, 1977, for a 
microanalytic treatment of the possible role 
of alternatives in search and turnover in- 
tentions and behavior.) 

As noted in Figure 1, there may well be 
some covariation among satisfaction, attrac- 
tion expected utility of the present job, and 
attraction expected utility of alternatives. 
This is to be expected, since values are com- 
mon to all three and the presence or absence 
of attractive alternatives may result in the 
revaluation of one’s satisfaction with or the 
attraction of the present role. 


Moderating Variables 


Although satisfaction, attraction expected 
utility of the present job, and attraction ex- 
pected utility of alternatives are considered 
to be the primary determinants of turnover 
intentions and behavior, several other vari- 
ables can be expected to moderate these re- 
lationships. To the extent that nonwork 
values and interests are not central to an 
individual’s life values and interests (Dubin 
et al., 1975) and to the extent that an in- 
dividual associates significant nonwork con- 
sequences with quitting (see, e.g., Newman, 
1974), the relationships among satisfaction, 
attraction, and turnover intentions and be- 
havior will be attenuated. Additionally, to 
the extent that an individual is bound by a 
contract, as for example in professional 
sports, the military, and certain professions,
the relationships will be attenuated during
the term of the contract. Under such circum- 
stances, it can be hypothesized that the in- 
dividual who is dissatisfied, perceives little 
attraction in the present job, or perceives an 
attractive alternative may engage in other 
forms of avoidance and withdrawal behavior.
These suggested moderating influences, es- 
pecially nonwork values and interests and 
nonwork consequences of turnover behavior, 
call attention to the need to look beyond 
the work setting for a complete understand- 
ing of the psychology of the turnover process. 


Antecedents 


The antecedents of satisfaction and attrac- 
tion are considered to be organizational vari- 
ables as perceived by the individual, eco- 
nomic variables related to availability of 
alternatives as perceived by the individual, 
and individual-level occupational and per- 
sonal variables as they influence individual 
values, perceptions, and expectations. Al- 
though a detailed analysis of these antece- 
dents is beyond the scope of this article, it is 
important to emphasize that the influence of 
various organizational, economic or labor 
market, occupational, and personal variables 
is through individual perceptions, expecta- 
tions, and values. 


Research Implications 


The model described here indicates the 
need for multivariate research on the turn- 
over process. As noted in the review section 
of this article, although the negative rela- 
tionships between both age and tenure and 
turnover are well established, the amount of 
variance explained is low and the psychology 
of the relationships is not well understood.
The model proposed here suggests that mul- 
tivariate research that concurrently assesses 
values, job-related perceptions, external per- 
ceptions, and the previously mentioned mod- 
erating variables should facilitate an under- 
standing of the relationship of age and tenure 
to turnover. 

Similarly, multivariate research that con- 
currently assesses individual-level occupa-
tional and personal variables, job-related per-
ceptions, external perceptions, individual
values, and potential moderating variables
provides a framework for integrating and un-
derstanding, at the individual level, the ag-
gregate-level effects of various organizational
and economic or labor market variables sum-
marized by Price (1977).

Graen (1976), among others, noted that
neither individuals nor organizations are fixed
or static; neither is the economy or labor
market. The clear implication is that under-
standing the turnover process will require
longitudinal as well as multivariate research.
Longitudinal research, not simply in terms
of the collection of criterion data over time
but also in terms of repeated measures of
the independent variables, as recently exem-
plified by Graen and Ginsburgh (1977) and
Porter et al. (1976), is needed.


Conclusions 


In 1973 Porter and Steers observed that
the then existing body of research left much
to be understood about the psychology of
the employee withdrawal process. Review of
the subsequent research leads to a similar
observation. The conceptual model suggested
here calls attention to the possible main
effects of satisfaction (present oriented), at-
traction expected utility of the current role
(future oriented), and attraction expected
utility of alternative roles. A number of mod-
erating variables and constraints were sug-
gested. The need for integrative, multivariate,
longitudinal research is evident if significant
progress is to be made in understanding the
psychology of the employee turnover process.


Excitatory Pavlovian conditioning of a discrete conditioned stimulus is attenu-
ated by prior exposure to the unconditioned stimulus. The unconditioned stim-
ulus preexposure phenomenon is observed in a variety of Pavlovian conditioning
procedures as diverse as eyelid conditioning, the conditioned emotional response,
and conditioned taste aversion learning. This article discusses the variables that
affect the unconditioned stimulus preexposure phenomenon and uses this in-
formation in evaluating both associative and nonassociative accounts of the
phenomenon. At least one associative account, based on context blocking, and
at least one nonassociative account, based on central habituation of the emo-
tional response to the unconditioned stimulus, remain viable.


The primary goals of research in Pavlovian
conditioning are to determine the variables
that influence the formation of conditioned
responses and then to specify their mecha-
nisms of action. A number of investigators
are currently examining how one such vari-
able, the organism's experience with the un-
conditioned stimulus, affects the course of
conditioning. It has been shown in a variety
of Pavlovian conditioning paradigms that ex-
posure to the unconditioned stimulus (UCS)
prior to the initiation of pairings of the con-
ditioned stimulus (CS) and the UCS attenu-
ates the formation of the excitatory condi-
tioned response (CR). The UCS preexposure
effect interests many investigators because of
their conviction that a thorough analysis of
this phenomenon will further our understand-
ing of the necessary conditions for Pavlovian
conditioning.

In general, the UCS preexposure effect has
been viewed from two different perspectives,
each of which bears on the possible mecha-
nism(s) of Pavlovian conditioning. Associa-
tive accounts of the UCS preexposure effect
argue that during the preexposure phase, an 
animal learns some relation between the oc- 
currence of the UCS and other events that 
occur in the situation, such as the relation 
between the UCS and contextual stimuli. 
The learning that occurs in the preexposure 
phase somehow interferes with the formation 
of an association between a nominal CS and 
the UCS in the second phase of the experi- 
ment. For example, a context-blocking inter- 
pretation of the UCS preexposure phenome-
non (e.g., Rescorla & Wagner, 1972) posits
that some stimulus aspect (X) of the experi-
mental situation acquires associative strength
during preexposure to an aversive UCS. Con-
ditioning of Stimulus X reduces the amount
of associative strength that a nominal stim-
ulus (A) can acquire during subsequent ex-
citatory conditioning in that environment
(i.e., pairings of AX with the UCS). As a
result, the rate of acquisition of an excitatory 
CR to Stimulus A is attenuated. On the other 
hand, nonassociative accounts generally hold 
that preexposure to the UCS reduces the or- 
ganism's reactivity to subsequent applications 
of that UCS. Such a reduction in reactivity 
may result from either central habituation 
or peripheral sensory adaptation of the re-
sponse to the UCS. In any case, reduced re-
activity to the UCS should result in attenu-
ated excitatory conditioning to a CS paired
with that UCS.

The present article reviews the effects of 
preconditioning exposure to the UCS on the 
acquisition of excitatory CRs and evaluates 
the various hypotheses that have been ad- 
vanced to explain this phenomenon. This re- 
view focuses on Pavlovian conditioning using 
aversive UCSs, since most of the published 
experiments on this phenomenon have used 
such stimuli. Moreover, the review focuses 
only on what Domjan and Best (1977) re- 
ferred to as the durable UCS preexposure 
effect, which occurs even when long delays 
intervene between UCS preexposure and con- 
ditioning. The recent demonstrations of a 
transient or proximal UCS preexposure ef-
fect, which occurs only when the UCS is
preexposed a short time before the condi- 
tioning trial, are not discussed (cf. Domjan 
& Best, 1977; Terry, 1976; Wagner, 1976). 


Preconditioning Exposure to the UCS 


In a typical Pavlovian conditioning ex- 
periment, a neutral stimulus (CS) is pre- 
sented prior to and terminates with the 
presentation of a UCS. After repeated pair- 
ings of the CS and UCS, the previously neu- 
tral stimulus comes to elicit a CR. In this 
section, studies that present the UCS prior 
to the initiation of CS-UCS pairings, or 
preexpose the UCS, are organized according 
to the nature of their experimental paradigm. 
The paradigms include human and rabbit 
eyelid conditioning procedures, conditioned 
emotional response (CER) procedures, and 
conditioned taste aversion procedures. 


Human and Rabbit Eyelid Conditioning 


Taylor (1956) preexposed each of three 
groups of human subjects to 50 presentations 
of an air puff UCS to the cornea of the eye. 
The intensity of the UCS was either 15 mm, 
30 mm, or 80 mm. In each group the ampli- 
tude of the unconditioned eyeblink response 
(UCR) decreased significantly with repeated 
presentations of the UCS, but there were no
significant between-groups differences in the
amplitude of the UCR. Each subject then
received 50 presentations of a 520-msec light
paired with an air puff of 30 mm to the
cornea. The number of eyeblink CRs was
greatest in the group that received no pre- 
exposure to the UCS, whereas the number 
of CRs in the groups preexposed to the UCS 
was an inverse function of the intensity of 
the UCS used during the preexposure phase. 

Taylor considered the possibility that 
groups preexposed to the air puff UCS 
showed a reduced number of eyeblink CRs 
during excitatory conditioning because of 
peripheral sensory adaptation of the UCR. 
However, she rejected peripheral sensory 
adaptation as a complete explanation of these 
effects because there was no evidence of dif- 
ferential UCR magnitudes in the three UCS 
intensity groups at any time during the pre- 
exposure phase. Such differences would be 
expected to accompany differential sensory 
adaptation. 

As an alternative explanation, Taylor sug- 
gested that a more general emotional or fear 
response elicited by the UCS was reduced in 
magnitude as a result of preexposure to the 
UCS. This argument requires two assump-
tions. First, it assumes that acquisition of an ex-
citatory CR depends in part on an associa-
tion formed between the CS and the organ-
ism's emotional reaction to the UCS. Konorski
(1967) made a similar, albeit more general,
point, asserting that in all Pavlovian aversive
conditioning, a fear (preparatory) response
is first conditioned to the CS and that the
presence of this preparatory CR is necessary
for the formation of a consummatory CR,
for example, the eyeblink response. Second,
the argument assumes that preexposure to
the UCS reduces the magnitude of this gen-
eral emotional response elicited by the UCS,
even though the specific UCR (in Konor-
ski's terms, the consummatory response) mea-
sured by an experimenter may be relatively
unaffected. The slower rates of acquisition
of an excitatory CR in subjects preexposed
to the UCS would then reflect a reduced emo-
tional responsiveness relative to that of sub-
jects not preexposed to the UCS.


PREEXPOSURE PHENOMENON 


Kimble and Dufort (1956) also examined 
the effect of preexposure to an air puff UCS 
on subsequent acquisition of the eyeblink 
response in humans. One group of subjects
was preexposed to 20 air puffs (90 mm),
while a second group of subjects received no
preexposure treatment. Both groups then re-
ceived 60 conditioning trials in which a .25-
sec light CS was paired with an air puff
UCS (90 mm). The group preexposed to the
UCS acquired the eyeblink CR more slowly than
the no-preexposure controls. These authors
also offered an explanation based on motiva-
tional factors, but in one respect it is the
opposite of the account suggested by Taylor.
According to their view, preexposure to the
UCS produces an increase in drive, and this
drive energizes the subject's dominant habit
regardless of its nature. If the subject's domi-
nant habit before conditioning were not the
CR and, indeed, if the dominant response to
the CS were antagonistic to the CR, then in-
creasing the subject's drive would interfere
with subsequent excitatory conditioning.

In a similar experiment, Hobson (1968)
preexposed college men and women rated as
high- or low-anxiety subjects on the Taylor
Manifest Anxiety Scale to air puffs (1.5 lb/
sq. in.) delivered to the cornea of the eye for
0, 35, or 70 trials. Following preexposure
to the UCS, all subjects received excitatory
conditioning of the eyelid response. The num-
ber of CRs obtained during excitatory con-
ditioning was inversely related to the num-
ber of preexposures to the UCS. In addition,
high-anxiety subjects acquired the CR sig-
nificantly faster than low-anxiety subjects,
although both classes of subjects showed a
preexposure effect. If manifest anxiety is con-
sidered a source of drive, then Kimble and
Dufort's account predicts less interference
and hence faster eyelid conditioning in low-
anxiety subjects than in high-anxiety sub-
jects. On the other hand, perhaps Hobson's
anxiety variable represents some aspect of
emotional responsiveness. Low-anxi-
ety subjects preexposed to the UCS may
acquire the excitatory CR at an even slower
rate than high-anxiety subjects preexposed to
the UCS because of a reduced emotional re-
sponsiveness to aversive stimuli.

Siegel and Domjan (1971) preexposed one
group of rabbits to 550 100-msec, 200-V 
shocks to the infraorbital region of the eye, 
whereas a second group of rabbits received 
no preexposure to the UCS. Infraorbital 
shock elicits an unconditioned extension of 
the nictitating membrane, or third eyelid, in 
the rabbit. The nictitating membrane re- 
sponse was then conditioned in both groups 
by pairing a 500-msec tone with a UCS of 
the same intensity as that used during the 
preexposure phase. The rabbits preexposed to 
the UCS showed a slower rate of acquisition 
of the nictitating membrane CR than the 
no-preexposure controls. 

In a more extensive study, Mis and Moore 
(1973) preexposed groups of rabbits to elec- 
tric shock delivered to the infraorbital region 
of the eye and varied both the number of 
preexposed UCSs (0, 50, 200, or 350) and 
the interval between the last preexposed UCS 
and subsequent excitatory conditioning of 
the nictitating membrane response (30 sec 
vs. 24 hours). There were no significant 
changes in the amplitude of the UCR over 
trials during the preexposure phase. The rate 
of acquisition of the nictitating membrane 
response was reduced in both the 30-sec- 
delay and the 24-hour-delay groups, relative 
to controls that received no preexposure, and 
was an inverse function of the number of 
shocks administered during the preexposure 
phase. Moreover, rabbits in the 30-sec-delay 
groups acquired the excitatory CR at a 
slower rate than their counterparts in the 24- 
hour-delay groups. In general, there were 
no differences in the asymptotic levels of
excitatory conditioning in the various groups 
after repeated CS-UCS pairings. 

In a second experiment, groups of rabbits 
were preexposed to (a) no shocks, (b) 450 
1-mA shocks, (c) 450 3-mA shocks, or (d) 
450 5-mA shocks to the infraorbital region 
of the eye. There were no decrements in the 
amplitude of the UCR over successive pre- 
exposures to the UCS, and the UCRs to 
shocks of various intensities were compar- 
able. All groups then received immediate ex- 
citatory conditioning of the nictitating mem- 
brane response; the CS was paired with the 
3-mA UCS. The nictitating membrane CR
was acquired fastest in the no-shock and
1-mA groups, and slowest in the 3-mA and 
5-mA groups. However, all groups eventually 
attained approximately the same asymptotic 
level of conditioning. 

Mis and Moore’s experiments show that 
the decremental effect of preexposure to the 
UCS on the rate of acquisition of the nicti- 
tating membrane response is a direct func- 
tion of the number of preexposures to the 
UCS (cf. Hobson, 1968), a direct function 
of the intensity of the prior UCSs (cf. Tay- 
lor, 1956), and an inverse function of the 
time interval between the last preexposure to 
the UCS and the initiation of excitatory con- 
ditioning. Thus, analogous parametric ma- 
nipulations during preexposure to the UCS, 
for example, variations of the number of 
preexposures, produce similar effects on con- 
ditioning of the eyelid response in humans 
and conditioning of the nictitating membrane 
response in rabbits. 

It is unlikely that sensory adaptation to 
the UCS can account for the retarded ac- 
quisition of excitatory conditioning in Mis 
and Moore’s experiments because the am- 
plitude of the UCR did not change during 
preexposure to the UCS. Furthermore, diff- 
erent UCS intensities produced comparable 
UCRs, as was the case in Taylor's study of 
the human eyeblink. 

Several investigators (e.g., Rudy, Iwens,
& Best, 1977; Tomie, 1976) have asserted
that the UCS preexposure effect can be ex-
plained in terms of blocking of conditioning 
to the nominal CS in the second phase of 
the experiment as a result of prior excitatory 
conditioning of some cues that were present 
in both phases (Kamin, 1969). In Kamin's 
original demonstration of blocking, one group 
of rats were exposed to repeated presenta- 
tions of Stimulus A paired with electric shock 
in a CER paradigm (Estes & Skinner, 1941), 
whereas a second group of rats received no 
treatment. Then both groups received pre- 
sentations of a compound stimulus (AB) 
paired with an electric shock of the same in- 
tensity as that used during the first phase 
of the experiment. Following the compound 
conditioning, the conditioned properties of
A and B were assessed. The group that had
received no treatment in the initial phase
showed strong conditioned suppression dur-
ing both Stimulus A and Stimulus B. The
group that had received conditioning of Stim- 
ulus A in the initial phase showed strong con- 
ditioned suppression during test presentations 
of Stimulus A alone but little or no sup- 
pression during presentations of Stimulus B 
alone. Thus, prior conditioning of Stimulus A 
is said to block the conditioning of Stimulus 
B that normally occurs when the compound 
stimulus, AB, is paired with a UCS. 
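
Kamin's two-phase design can be illustrated with a short simulation. The sketch below is not part of Kamin's procedure or analysis. It assumes the learning rule of Rescorla and Wagner (1972), in which every stimulus present on a reinforced trial changes in associative strength in proportion to the difference between an asymptote and the summed strength of all stimuli present. The function name, the learning rate, and the trial counts are illustrative assumptions, not values taken from the experiments.

```python
def rw_trials(strengths, present, n_trials, rate=0.2, asymptote=1.0):
    """Apply n_trials reinforced trials of the Rescorla-Wagner rule.

    strengths: dict mapping stimulus name -> associative strength (updated in place)
    present:   stimuli presented together on every trial
    rate:      learning-rate parameter, assumed equal for all stimuli
    asymptote: maximum total strength the UCS can support (lambda)
    """
    for _ in range(n_trials):
        total = sum(strengths[s] for s in present)
        error = asymptote - total  # shared prediction error for this trial
        for s in present:
            strengths[s] += rate * error
    return strengths


# Blocking group: A alone is paired with shock, then the AB compound.
blocking = {"A": 0.0, "B": 0.0}
rw_trials(blocking, ["A"], n_trials=16)
rw_trials(blocking, ["A", "B"], n_trials=8)

# Control group: no first phase, only the AB compound.
control = {"A": 0.0, "B": 0.0}
rw_trials(control, ["A", "B"], n_trials=8)

print("blocking group:", blocking)
print("control group: ", control)
```

Under these assumptions Stimulus B gains almost no strength in the blocking group, because Stimulus A already predicts the UCS when compound training begins, whereas A and B share the available strength equally in the control group.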

A blocking interpretation of Mis and 
Moore’s UCS preexposure effect would hold 
that some static cues present in the experi- 
mental environment during the preexposure 
phase, call them contextual cues, acquire as-
sociative strength by virtue of their presence
when the UCS occurs. Then when the excita-
tory conditioning phase begins, the nominal
CS conveys less information about the UCS
than in control groups because contextual
cues already predict the occurrence of the
UCS. Hence, acquisition of the excitatory 
CR is retarded. 

Mis and Moore concluded that a blocking 
interpretation of their data is highly specu-
lative because there is no independent evi-
dence that the rabbit nictitating membrane
response can be conditioned to background
cues (cf. Plotkin & Oakley, 1975; Moore,
Note 1). However, this would not preclude
conditioning of some preparatory, emotional
response to the background stimuli. Suppose
that the nictitating membrane CR can only
be evoked by CSs that already evoke an in-
crease in a preparatory, emotional response.
Then prior conditioning of the emotional
response to contextual cues could block con-
ditioning of the emotional response to the
nominal CS in the second phase, thereby
blocking conditioning of the nictitating mem-
brane response. However, if one assumes that
the background stimuli are much less salient
stimuli than the nominal CS, then one should
expect a relatively small blocking effect
(Feldman, 1975; Hall, Mackintosh, Goodall,
& dal Martello, 1977).

Mis and Moore's interpretation of their
results was that preexposure to the UCS
made the rabbits less emotionally reactive
to the UCS and that reduced responsiveness
to the UCS slowed the rate of excitatory
conditioning, an argument identical to that
proposed by Taylor (1956).


Conditioned Emotional Response Paradigms 


Kamin (1961) examined the effect of prior 
exposure to electric shock on the subsequent 
acquisition of a conditioned emotional re- 
sponse (CER). In Experiment 1, two groups 
of rats were trained to bar press for food re-
inforcement on a variable-interval (VI)
schedule. One group of rats then received
10 days of exposure to a .5-sec, .8-mA elec-
tric shock delivered through a grid floor at
the rate of four shocks per day. A second
group of rats received no preexposure treat-
ment, but both groups of rats continued bar
pressing for food reinforcement during this
period. All rats then received CER training
in which pairings of a 3-min white noise CS
and a .5-sec, .8-mA electric shock were super-
imposed on the VI baseline. The group pre-
exposed to the UCS showed a retarded rate 
of acquisition of conditioned suppression rel- 
ative to the no-treatment controls. 

Kamin replicated these findings in a second 
experiment using a 1.0-mA electric shock in 
both the preexposure and the conditioning
phases. However, he observed that the decre-
mental effect of preexposure to the UCS was
reduced relative to the first experiment.
Further, Kamin commented that a group of
rats preexposed to a .72-mA electric shock
readily acquired the CER if the intensity of
the UCS used in the excitatory conditioning
phase was 2.4 mA.

In a third experiment, Kamin assessed the
effect of UCS intensity. Essentially the same
procedure was used as in Experiment 1, ex-
cept that three groups of rats were preex-
posed to shock intensities of .28 mA, .49 mA,
or .85 mA. All groups then received CER
conditioning with a .85-mA UCS. Groups
that were preexposed to the UCS showed a
retarded rate of acquisition of the CER rela-
tive to no-preexposure controls, and the mag-
nitude of the retardation effect was a direct
function of the intensity of the UCS used
during preexposure (cf. Mis & Moore, 1973;
Taylor, 1956).

A final experiment was conducted to de- 
termine whether the delivery of shocks dur- 
ing the preexposure phase retarded the ac- 
quisition of a CER because of contiguous 
pairings of the bar-press response and the 
UCS. One group of rats were preexposed to 
shock with the bar present and the VI contin- 
gency in effect (on-baseline treatment), 
whereas a second group of rats were pre- 
exposed to shock with the bar absent and 
the VI contingency eliminated (off-baseline 
treatment). Both groups showed a slower 
rate of acquisition of the CER than did the 
no-preexposure controls, but the rats that 
received shocks in the off-baseline treatment 
acquired the CER faster than those that re- 
ceived the on-baseline treatment. Kamin re- 
ported that on- versus off-baseline preex- 
posure treatments did not produce differences 
in the baseline rates of responding on the VI 
schedule during excitatory conditioning. 

Kamin suggested that some central habitu- 
ation of emotional reactivity to the UCS re- 
sulted from preexposure to the UCS and 
produced retarded acquisition of the CER. 
This account is similar to the general emo- 
tional reactivity account of the preexposure 
phenomenon suggested by Taylor and by Mis 
and Moore. Note that such an explanation 
can also account for the differences between 
the on- versus off-baseline preexposure ef- 
fects. Suppose that central habituation of 
emotional reactivity is relatively specific to 
the situation in which shock is administered. 
Then one would expect more stimulus gen- 
eralization decrement of habituation, and 
hence faster excitatory conditioning, when 
the stimulus conditions are different in the 
preexposure and conditioning phases. 

Kamin's experiments extend the UCS pre- 
exposure phenomenon to a new species, rats, 
and to another widely studied Pavlovian con- 
ditioning paradigm, CER. Further, they sug- 
gest that the magnitude of the retardation of 
the acquisition of a CER is a direct function 
of the intensity of the UCS used during 
preexposure. However, Kamin included no 
group preexposed to a UCS intensity that 
exceeded the intensity used in the subse-
quent excitatory conditioning phase. Thus,
it is possible that the decremental effect of 
preexposure to a UCS on the acquisition of 
a CER simply reflects similarities between 
the intensity of the preexposure UCS and the 
intensity of the UCS used during excitatory 
conditioning, rather than an effect of UCS 
intensity during preexposure per se. Such an 
account suggests that the magnitude of the 
UCS preexposure effect would also be dimin- 
ished when the UCS is much more intense 
during preexposure than during excitatory 
conditioning. Indeed, Randich (1978) ob- 
tained this outcome using electric shock as 
a UCS and the CER as a measure of excita- 
tory conditioning. Groups of rats were pre- 
exposed to 0-mA, .3-mA, .5-mA, .8-mA, or 
1.3-mA unsignaled electric shocks for 10 
days. All groups then received pairings of a 
3-min white noise CS with a .8-mA electric 
shock. All groups preexposed to electric 
shock showed retarded acquisition of the 
CER relative to the no-preexposure control 
group (0 mA), but the greatest attenuation 
of CER acquisition occurred in the group 
both preexposed and conditioned with the 
.8-mA electric shock. The groups that re- 
ceived a shift in UCS intensity between the 
preexposure and CER conditioning phases 
acquired the CER faster than did the non- 
shifted group. 

Thus, if groups of subjects are preexposed 
to a wide range of UCS intensities and all 
are conditioned at an intermediate UCS in- 
tensity, then the magnitude of the UCS pre- 
exposure effect is a direct function of the 
intensity of the preexposed UCS in both 
human eyelid (Taylor, 1956) and rabbit 
nictitating membrane (Mis & Moore, 1973) 
paradigms, but is an inverted-U-shaped func- 
tion of the intensity of the preexposed UCS 
in the CER paradigm (Randich, 1978). The 
different functions yielded by different re- 
sponse systems have important implications 
for understanding the mechanism(s) of the 
UCS preexposure phenomenon. 

For example, a context-blocking account 
of the UCS preexposure phenomenon, at least 
in the interpretation offered by the model of 
Rescorla and Wagner (1972), predicts that 
the magnitude of the UCS preexposure phe-
nomenon should be an increasing function
of the intensity of the UCS used during pre- 
exposure. However, such a view would have 
difficulty explaining the inverted-U function 
obtained by Randich, unless one assumes 
that shifts in UCS intensity between the 
preexposure and conditioning phases result 
in stimulus generalization decrement of the 
associative strength of contextual cues and 
thus in a decrease in context blocking. Grant- 
ing that possibility, however, why would 
stimulus generalization decrement not occur 
in the eyelid and nictitating membrane pro- 
cedures? 
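
The monotonic prediction attributed to the Rescorla and Wagner (1972) model can be made concrete with a short simulation. This is a sketch under stated assumptions, not an analysis reported in the studies reviewed: the asymptote of conditioning is assumed to equal the shock intensity, the context (X) and the nominal CS (A) are given equal salience, unreinforced exposure to the context between shocks is ignored, and the function name, learning rate, and trial counts are illustrative.

```python
def strength_of_cs_after_preexposure(pre_intensity, cond_intensity,
                                     pre_trials=30, cond_trials=5, rate=0.1):
    """Return the associative strength of nominal CS A after conditioning.

    Phase 1: context X alone is paired with a UCS of pre_intensity.
    Phase 2: the compound AX is paired with a UCS of cond_intensity.
    Assumption: the asymptote (lambda) equals the UCS intensity.
    """
    v_x = 0.0  # associative strength of the context
    v_a = 0.0  # associative strength of the nominal CS
    for _ in range(pre_trials):
        v_x += rate * (pre_intensity - v_x)
    for _ in range(cond_trials):
        error = cond_intensity - (v_a + v_x)  # shared prediction error
        v_a += rate * error
        v_x += rate * error
    return v_a


# Preexposure intensities follow Randich (1978); conditioning is at .8 mA.
for pre in (0.0, 0.3, 0.5, 0.8, 1.3):
    v_a = strength_of_cs_after_preexposure(pre, cond_intensity=0.8)
    print(f"preexposed at {pre:.1f} mA -> V_A = {v_a:.3f}")
```

Under these assumptions the strength acquired by A falls steadily as the preexposure intensity rises, and it becomes negative when the preexposed shock is stronger than the conditioning shock. The simulated rule therefore yields no inverted-U function unless some additional process, such as stimulus generalization decrement, is assumed.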

Brimer and Kamin (1963), Kremer 
(1971), and Baker (1974) have also re- 
ported similar attenuation of a CER follow-
ing off-baseline preexposure to shocks. On
the other hand, Mackintosh (1973) failed to
observe a significant preexposure effect in a
CER procedure, although preexposed rats on
the average acquired the CER more slowly than
no-preexposure controls (see also Pearce &
Dickinson, 1975, Experiment 1). 

Rescorla (1973) preexposed one group of
rats to 72 2-sec loud noise presentations
(112 dB SPL), whereas a second group of
rats received no preexposure treatment. Both
groups then received pairings of a 30-sec
visual stimulus with the loud noise UCS in
a CER procedure. The group preexposed to
the loud noise UCS acquired conditioned
suppression at the same rate as no-preexpo-
sure controls; that is, there was no preex-
posure effect on acquisition of the CER.
However, during a 3-day extinction proce-
dure, in which the CS was presented alone,
the CER of the group preexposed to loud
noise extinguished more rapidly than did the
CER of the no-preexposure controls. It is
difficult to explain the failure of this pre-
exposure treatment to retard the formation of
a CER. Further investigations of this dis-
crepancy should focus on the distal nature
of the UCS, the nature of the UCR to loud
noise (cf. Bolles & Seelbach, 1964), and the
parametric effects of intensity with a loud
noise UCS.

In another experiment, Rescorla (1974)
preexposed three groups of rats to a total of
eight .5-sec shocks of .5-mA, 1.0-mA, or
[illegible]-mA intensity. Each preexposed shock was
preceded by a 2-min flashing light in an
effort to prevent fear conditioning to con-
textual stimuli. Following a 72-hour delay in
which the food-reinforced baseline response
was reestablished, a 2-min tone CS was re-
peatedly paired with a .5-sec, .5-mA electric
shock. Rescorla found that the rate at which
the CER was acquired was a direct function
of the intensity of the UCS used during the
preexposure phase. However, a control group
that received no preexposure treatment was
not included in this experiment, making it
impossible to determine whether the groups
preexposed to a UCS showed an accelerated
or retarded rate of acquisition of the CER.
Randich (1978) provided a more complete
investigation of the effects of preexposure to
a signaled UCS on acquisition of a CER by
including a no-preexposure control group as
well as a group preexposed to a weaker shock
than the shock used during CER condition-
ing. Groups of rats were preexposed to a total
of 30 .5-sec shocks of .5-mA, .8-mA, or 1.3-
mA intensity, distributed over 10 days. Each
preexposed shock was preceded by a 3-min
light stimulus, and a no-preexposure control
group (0 mA) received exposure to the light
stimulus alone. All groups then received
pairings of a 3-min white noise CS with the
.8-mA electric shock. All groups preexposed
to electric shock acquired the CER at a
slower rate than the group preexposed to the
light stimulus alone, and the greatest atten-
uation of CER conditioning occurred in the
group both preexposed and conditioned with
the .8-mA electric shock. This pattern of re-
sults includes a replication of those obtained
by Rescorla (1974). Moreover, the pattern
of results obtained with preexposure to sig-
naled shocks duplicates the pattern of re-
sults obtained by Randich (1978) with pre-
exposure to unsignaled shocks. Since one
would assume that signaling the preexposed
UCS would minimize conditioning of con-
textual stimuli, the similar pattern of results
obtained with preexposure to signaled versus
unsignaled electric shock suggests that, at
a minimum, context blocking is not the only
mechanism responsible for the UCS preexposure
phenomenon with a CER procedure.


Conditioned Taste Aversion Paradigms


Within UCS Comparisons


Elkins (1974) preexposed groups of rats 
to injections of the emetic cyclophosphamide 
(12.5 mg/kg) for 0, 1, 3, or 6 days over the 
course of a 2-week period. All subjects were 
then deprived of water. They were then pre- 
sented with a .1% saccharin solution for a 
minimum of 10 minutes and for 5 minutes 
after the onset of drinking. Five minutes after 
the removal of saccharin, each group was in- 
jected with cyclophosphamide (12.5 mg/kg). 
On the day following the conditioning trial, 
and for 60 days thereafter, each subject was 
given free access to both saccharin solution 
and tap water presented in separate bottles, 
that is, extinction testing. In general, Elkins 
found the initial attenuation of the aversion 
to saccharin to be a direct function of the 
number of preexposures to cyclophosphamide 
injections. Moreover, the rate at which the 
aversion to saccharin extinguished over the 
60-day period was a direct function of the 
number of preexposures to the UCS. 

In a similar design, Vogel (Note 2) showed 
that preexposure to amobarbital attenuated 
the aversion induced by pairings of a novel 
taste with amobarbital as a direct function 
of the number of preexposed UCSs (0, 1, 3, 
or 5). 

Cannon, Berman, Baker, and Atkinson 
(1975) preexposed three groups of rats to 
ethanol intubation (4 g/kg) for 1, 3, or 5 
successive days, whereas two other groups 
of rats received intubation with equivalent 
volumes of water. All of the groups preex- 
posed to ethanol and one group preexposed 
to water then received saccharin-ethanol (4 
g/kg) pairings for 3 days, while the second 
group preexposed to water received saccha- 
rin-water pairings. Conditioning trials were 
spaced 3 days apart. None of the groups 
preexposed to ethanol showed an aversion to 
saccharin following the first pairing of sac- 
charin and ethanol, that is, they drank as 
much saccharin solution as the group that 
received no preexposure treatment and a sac- 
charin-water pairing. On the other hand, the 
control group given no preexposure followed 
by a pairing of saccharin and ethanol showed 
a strong aversion to saccharin after a single
conditioning trial. By the third conditioning 
trial, animals that had been preexposed to 
ethanol did show an aversion to saccharin, 
but it was attenuated relative to controls. 
The magnitude of the attenuation effect was 
a direct function of the number of preexpo- 
sures to ethanol. 

Elsmore (1972) preexposed groups of rats
to delta-9-tetrahydrocannabinol (THC; 10 
mg/kg) for 0, 1, 2, 4, or 8 days. All groups 
then received a single pairing of saccharin 
and delta-9-THC (10 mg/kg) followed by a 
two-bottle test in which they could drink 
either saccharin or water. All groups showed 
approximately the same aversion to saccha- 
rin, although there was a nonsignificant trend 
indicating that the aversion was weakest in 
the groups that received the greatest number 
of preexposures to delta-9-THC. It is im- 
portant to note that a two-bottle test is a 
good method for detecting very weak aver- 
sions, but is less effective than a one-bottle 
test in demonstrating between-groups differ- 
ences in the strength of an aversion when all 
groups have a moderate to strong aversion. 

In summary, the data on the effect of the 
number of preexposures to the UCS on sub- 
sequent conditioning of a taste aversion gen- 
erally complement the findings reported with 
human and rabbit eyelid conditioning (Hob- 
son, 1968; Mis & Moore, 1973). The data 
of Cannon et al. (1975) also suggest that re- 
peated conditioning trials may sometimes be 
required to reveal the effect of some parametric
variation of a UCS preexposure treatment
that is not detected following a single
conditioning trial. Elsmore’s (1972) failure
to obtain a preexposure effect with delta-9- 
THC may reflect either the use of a single 
conditioning trial or, as noted above, the use 
of a two-bottle test. 

The effect of variations in the interval be- 
tween preexposure to a drug UCS and sub- 
sequent conditioning of an aversion to a 
taste paired with that drug has been investi- 
gated in several experiments by Cappell and 
LeBlanc. They showed that the magnitude of 
the attenuating effect of prior exposure to 
either D-amphetamine (Cappell & LeBlanc, 
1975) or morphine (Cappell & LeBlanc, 
1977) on subsequent taste aversion learning
with the same UCS decreases as the time 
interval increases between the last preexpo- 
sure to the UCS and the start of condition- 
ing (cf. Mis & Moore, 1973). Moreover, the 
dissipation of the UCS preexposure effect 
with delays to the start of conditioning was 
more pronounced with amphetamine than 
with morphine. 


Several studies have examined the effect
of drug dosage during preexposure on the
magnitude of the UCS preexposure effect. 
Cannon et al. (1975) preexposed groups of 
rats to a single saline injection or an injec- 
tion of either .12-M or .36-M lithium chlo- 
ride (LiCl; an emetic). Saccharin was then 
paired with injections of saline, .12-M LiCl, 
or .36-M LiCl in a factorial design. The 
decremental effect of preexposure to LiCl on 
subsequent taste aversion learning was a 
direct function of the preconditioning dosage 
of LiCl. The magnitude of the decremental 
effect of preexposure to a given dosage 
of LiCl was an inverse function of the conditioning
dosage; that is, the greater the
conditioning dosage, the greater the aversion,
given a constant preconditioning dosage. 
Thus, there was no interaction between pre- 
conditioning dosage and conditioning dosage; 
these variables appear to combine in a simple
linear fashion. The pattern of results obtained
in this study is similar to that obtained
in human eyeblink and rabbit nictitating
membrane conditioning procedures but
not in the CER procedure with rats. 

Parker, Failor, and Weidman (1973) pre- 
exposed two groups of rats to chronic mor- 
phine treatment over a 25-day period, while 
a third group of rats received no preexposure 
treatment. The dosage of morphine was grad- 
ually increased over the preexposure period, 
and reached a final level of 140 mg/kg per 
day, a large dose. A preconditioning prefer- 
ence test between sucrose-octa-acetate (SOA) 
solution and water was then administered to 
all rats during a period in which the addicted 
rats were deprived of morphine for 96 hours. 
Subsequently, one group of rats that had 
been preexposed to morphine were repeatedly 
injected with morphine after consuming 
SOA; a second group of rats preexposed to 
morphine were injected with morphine after
no-liquid sessions; and a third group of rats
that received no preexposure treatment were
injected with morphine after consuming SOA.
These injections were spaced 72 hours apart
and thus occurred when the previously addicted
rats were suffering withdrawal stress.
The rats preexposed to morphine prior to
pairings of SOA and morphine showed an
attenuated aversion to SOA relative to the
rats that received no preexposure treatment
prior to SOA-morphine pairings. In addition,
rats preexposed to morphine prior to
SOA-morphine pairings showed a preference
for SOA when compared with rats preexposed
to morphine and then given morphine injections
after no-liquid sessions. Perhaps even
stronger testimony to the effect of chronic
pretreatment with morphine on the impact
of SOA paired with morphine is that seven
rats that were not addicted to morphine but
were conditioned with morphine died. No
rats addicted to morphine and conditioned
with morphine died.

The authors concluded that chronic preexposure
to morphine creates an unnatural
need state for the drug. By this view, SOA-morphine
pairings do not induce an aversion
in rats pretreated with morphine because
SOA comes to signal alleviation of withdrawal
symptoms induced by the need for
morphine. Indeed, SOA that has been paired
with morphine might even be expected to 
elicit a stronger approach response in rats 
pretreated with morphine than in appropriate 
controls. 

LeBlanc and Cappell (1974) attempted to
test the unnatural need state hypothesis in
two experiments. In Experiment 1, chronic
preexposure to either a large (200 mg/kg)
or moderate (40 mg/kg) daily dose of morphine
eliminated the formation of an aversion
to saccharin induced by saccharin-morphine
(20 mg/kg) pairings in the morning,
and there was no evidence of any conditioning
with repeated taste-drug pairings. However,
this outcome may reflect more than a
UCS preexposure effect because the groups
that were preexposed to morphine received
a supplemental dose of morphine each afternoon
to maintain the normal daily level of
morphine. For example, the group preexposed
to 200 mg/kg of morphine per day and
conditioned with 20 mg/kg of morphine per 
day received a supplement of 180 mg/kg of 
morphine in the afternoon, following a taste- 
drug pairing in the morning. The implica- 
tions of providing supplemental doses of 
morphine are unclear because little is known 
about the effects of interpolated presentations 
of the UCS between conditioning trials. How- 
ever, it is possible that since these supple- 
mental doses of morphine constituted from 
50%-90% of the animals’ normal daily morphine
intake, saccharin-morphine pairings
signaled an event of little metabolic conse- 
quence in satisfying a need state for mor- 
phine. Alternatively, this procedure could be 
viewed within the context of contingency 
theory, according to which the administration 
of morphine in the afternoon degrades the 
positive contingency between saccharin and 
morphine. This should render the CS a less 
effective signal (cf. Rescorla, 1967), resulting 
in little conditioning. 
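
The dose arithmetic and the contingency argument above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the contingency index is taken to be the simple difference P(UCS | CS) - P(UCS | no CS), and the probabilities passed to it are idealized values rather than data from LeBlanc and Cappell (1974).

```python
def supplement_fraction(daily_dose: float, conditioning_dose: float) -> float:
    """Share of the normal daily dose delivered as the afternoon supplement."""
    return (daily_dose - conditioning_dose) / daily_dose


def delta_p(p_ucs_given_cs: float, p_ucs_given_no_cs: float) -> float:
    """Contingency index (an assumed formalization): P(UCS|CS) - P(UCS|no CS)."""
    return p_ucs_given_cs - p_ucs_given_no_cs


# Doses reported in the text: 200 or 40 mg/kg per day, conditioned with 20 mg/kg.
for daily in (200, 40):
    print(daily, supplement_fraction(daily, 20))

# Idealized schedules (assumed values, not measured probabilities):
# drug follows the taste only, versus drug also given every afternoon.
print(delta_p(1.0, 0.0))  # taste is the sole predictor of the drug
print(delta_p(1.0, 1.0))  # afternoon supplement removes the contingency
```

The two fractions reproduce the 50%-90% range cited above, and the second call to `delta_p` shows why a drug delivered regardless of the taste would be expected to leave the taste a poor signal.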

In a second experiment, chronic preex- 
posure to either a large (20 mg/kg) or small 
(4 mg/kg) daily dose of D-amphetamine attenuated
an aversion induced by pairing saccharin
with D-amphetamine (1 mg/kg). The
magnitude of the decremental effect of pre- 
exposure to D-amphetamine on the aversion 
to saccharin was larger in the high-depend- 
ence group than in the low-dependence group 
(see also, Goudie, Thornton, & Wheeler, 
1976). 

LeBlanc and Cappell argued that since 
there is no convincing evidence for a need 
state artificially induced by D-amphetamine 
withdrawal, there is no need to postulate a 
need state to account for the drug preex- 
posure effect, even for so-called drugs of 
abuse. Instead, LeBlanc and Cappell argued 
that drug tolerance, or a diminished respon- 
siveness to repeated administration of a con- 
stant dose of a drug, develops to presenta- 
tions of either morphine or D-amphetamine 
during the preexposure phase and retards the 
development of an aversion to a taste paired 
with those drugs. 

Cannon, Baker, and Berman (1977) have 
obtained evidence that is compatible with 
the claim that tolerance for the effects of a 
drug contributes to the drug UCS preexposure
effect. Rats that were repeatedly preexposed
to ethanol and then given six pairings
of saccharin and ethanol showed no
tendency to acquire a saccharin aversion, 
whereas nonpreexposed controls formed a 
strong aversion. On the day following the 
last conditioning trial, the rats were intu- 
bated with ethanol and 30 minutes later were 
placed on a rotarod (a test of balance). 
Saccharin consumption and time on the rota- 
rod, that is, maintenance of balance, were 
significantly positively correlated, r(15) =
.54, leading Cannon et al. to conclude that 
tolerance for the effects of ethanol contrib- 
uted a drug-specific component to their UCS 
preexposure effect. 
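
As a rough check on the correlation reported above (.54 with 15 degrees of freedom), the following sketch converts it to a t statistic. It assumes a Pearson correlation, so that the degrees of freedom equal the number of animals minus 2; the function name is hypothetical and the calculation is not taken from Cannon et al.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """t statistic for a Pearson correlation r with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


r, df = 0.54, 15
n = df + 2            # implied number of animals under the Pearson assumption
t = t_from_r(r, df)
print(n, round(t, 2))
```

Under these assumptions the correlation implies 17 animals and a t of roughly 2.48, which exceeds the conventional two-tailed .05 critical value of 2.131 for 15 degrees of freedom and so agrees with the description of the correlation as significant.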

On the other hand, Vogel (Note 2) as- 
serted that drug tolerance is not a necessary 
condition of the drug preexposure effect. One 
group of rats received daily injections of 
amobarbital (80 mg/kg) for 5 days, whereas 
a second group of rats received injections of 
the vehicle. Both groups then received sweet- 
ened milk paired with amobarbital (80 mg/ 
kg). The rats preexposed to amobarbital 
showed an attenuated aversion to sweetened 
milk relative to the no-preexposure control 
group. Vogel suggested that drug tolerance 
cannot entirely explain these results because 
the duration of sleep in rats preexposed to 
amobarbital was not significantly reduced 
relative to that of rats preexposed to the 
vehicle. Reduced sleeping time is often used 
as a measure of tolerance for barbiturates. 
In addition, if the attenuated taste aversion 
in animals preexposed to amobarbital reflects 
drug tolerance alone, there should be a corre- 
lation between reduced sleeping times and 
weaker taste aversions. Vogel found that this
correlation was only .41. Perhaps stronger 
testimony against a drug tolerance hypothesis 
is Vogel's demonstration that a single pre- 
exposure to amobarbital given 2 days prior 
to a taste-drug pairing attenuated the aver- 
sion. Vogel argued that a single preexposure 
to amobarbital is unlikely to produce drug 
tolerance, although he presented no independent
evidence to show that drug tolerance
did not occur.

The rats in Vogel's experiments were deprived
of food and water prior to each amobarbital
injection during both the preexposure
phase and the conditioning phase. Vogel
recognized the possibility that during pre- 
exposure an aversion may have been condi- 
tioned to drive cues associated with depriva- 
tion states and that this aversion may have 
blocked subsequent taste aversion learning. 
Vogel tested this possibility by preexposing
a group of rats to a total of three amobar- 
bital injections (120 mg/kg) while the ani- 
mals were maintained on ad libitum food and 
water. When tested while food deprived, so 
that any previously conditioned drive cues 
were absent, these animals still showed an 
attenuated aversion relative to no-preexposure
controls. This experiment shows that
conditioning an aversion to a drive state 
(Peck & Ader, 1974) is not a necessary con- 
dition for obtaining the UCS preexposure 
effect (cf. Braveman, 1975; Elkins, 1974). 

Several studies using the conditioned taste 
aversion (CTA) paradigm not only have obtained
a preexposure effect but also have
found no evidence of conditioning even with 
repeated taste-drug pairings (Berman & Can- 
non, 1974; Cappell & LeBlanc, 1977; Le- 
Blanc & Cappell, 1974). Brookshire and 
Brackbill (1976), for example, preexposed 
one group of rats to apomorphine injections 
(15 mg/kg) for 10 consecutive days, whereas
a second group of rats received saline injec- 
tions. The group preexposed to apomorphine 
formed no aversion to saccharin paired with 
an injection of apomorphine (15 mg/kg), 
even after repeated taste-drug pairings. Simi- 
larly, Cappell and LeBlanc (1975) showed 
that rats that had received 20 prior exposures 
to D-amphetamine (7.5 mg/kg) subsequently
failed to develop an aversion to saccharin
that was repeatedly paired with D-amphetamine
(1.0 mg/kg). On the other hand, one
or five prior exposures to the drug retarded, 
but did not prevent, the acquisition of a taste 
aversion (cf. Goudie et al., 1976). These
findings contrast with experiments on human 
and rabbit eyelid conditioning and the CER, 
in which repeated CS-UCS pairings administered
after preexposure to the UCS eventually
yield a level of conditioning approximately
equal to that of no-preexposure controls
(Hobson, 1968; Kamin, 1961; Mis &
Moore, 1973). 

Riley, Jacobs, and LoLordo (1976) sug- 
gested that failures to obtain any conditioning 
with repeated taste-drug pairings following 
preexposure to the UCS may be restricted 
to addictive drugs (e.g., apomorphine, ethanol,
and morphine) or to those drugs that
animals will self-administer (e.g., amphetamine).
They demonstrated that preexposure
to six injections of LiCl (.15 M) attenuated 
an aversion established by a few pairings of 
saccharin and LiCl (.15 M), relative to no- 
preexposure controls. However, repeated saccharin-LiCl
pairings eventually produced a
substantial aversion to saccharin even in rats
preexposed to LiCl. In a similar experiment,
Cannon et al. (1975) preexposed groups of
rats to a single LiCl intubation (.12 M) 1,
4, or 8 days prior to seven conditioning trials
in which saccharin was paired with LiCl
(.12 M). A single dose of LiCl given 1 day,
but not 4 or 8 days, prior to conditioning
attenuated the aversion to saccharin estab- 
lished by a single taste-drug pairing (cf. 
Cappell & LeBlanc, 1975, 1977). However, 
repeated taste-drug pairings induced a substantial
aversion to saccharin even in the
group given exposure to LiCl 1 day prior to
conditioning.

Holman (1976) preexposed groups of rats
to eight injections of either LiCl (.15 M)
or saline and then conditioned an aversion
by pairing the taste of sodium chloride 
(NaCl) with LiCl (.15 M). The group pre- 
exposed to LiCl showed a retarded acquisi- 
tion of the aversion to NaCl, but substantial 
conditioning occurred with repeated taste- 
drug pairings. 

Thus, the results of experiments with LiCl,
a nonaddictive drug that animals will not
self-administer, lend some support to the contention
of Riley et al. that failures to obtain
conditioning with repeated pairings of
tastes and amphetamine, morphine, apomor- 
phine, or ethanol following preexposure to 
these drugs reflect their addictive qualities, 
or animals’ tendency to self-administer them. 
It is unfortunate, however, that none of the 
above experiments performed a direct comparison
between addictive and nonaddictive
drugs. 

Several experiments have evaluated the 
effect of preexposure to signaled versus un- 
signaled drug UCSs on the subsequent forma- 
tion of a taste aversion based on these drug 
UCSs. Cannon et al. (1975) showed that a 
single, signaled preexposure to LiCl does re- 
tard the acquisition of an aversion, although 
the decremental effect is smaller than that 
produced by a single, unsignaled preexposure 
to LiCl. In this experiment, one group of 
rats received a single sustacal-LiCl (.12 M) 
pairing, a second group of rats received a 
water-LiCl (.12 M) pairing, and a third 
group of rats received a sustacal-saline pairing.
All groups then received a saccharin-LiCl
(.12 M) pairing on the following day
and on 3 additional days. Signaled and unsignaled
preexposure to LiCl equally attenuated
the aversion induced by saccharin-LiCl
pairings on the first conditioning trial.
However, repeated saccharin-LiCl pairings 
produced a stronger aversion in the signaled 
preexposure group than in the unsignaled 
preexposure group. 

Mikulka, Leard, and Klein (1977) found 
an even more striking difference between the 
effects of signaled and unsignaled preexpo- 
sure conditions. They observed that unsig- 
naled preexposure to LiCl retarded subse- 
quent conditioning even when signaled pre- 
exposure had no such effect. Saline-preex- 
posed controls and a group that received 
four prior pairings of sucrose and LiCl ac- 
quired an aversion to almond-flavored water 
paired with LiCl equally rapidly, whereas a 
group that received four unsignaled exposures
to LiCl showed a somewhat attenuated
aversion to almond-flavored water. 

Not all studies have obtained differences 
between signaled and unsignaled preexposure 
conditions. Goudie, Thornton, and Marsh 
(Note 3) found that unsignaled preexposure 
to methamphetamine and preexposures sig- 
naled by access to a saline solution had equal 
attenuating effects on subsequent condition- 
ing of an aversion to saccharin paired with 
methamphetamine. Zellner and Riley (Note 
4) obtained similar results with the stimulant 
drug methylphenidate. Goudie et al.’s failure 
to obtain a difference between signaled and
unsignaled preexposure conditions led them 
to consider the importance of the relation- 
ship between responding and the UCS dur- 
ing the preexposure phase. They suggested 
that during the preexposure phase, rats learn 
that the occurrence of the UCS is independent
of their behavior. This learning, called
"learned helplessness" by Maier and Seligman
(1976), interferes with subsequent
learning to withhold drinking in the presence 
of a taste cue that reliably precedes some 
UCS. Goudie et al. tested this notion by 
allowing groups of rats ad libitum access to 
only water or only methamphetamine during 
the preexposure phase. When rats were given 
some control over the preexposed UCS, that 
is, when the presentation of the UCS de- 
pended on responding, the attenuating effect 
of preexposure to the UCS was abolished 
(cf. Deutsch & Eisner, 1977). Recently, how- 
ever, Randich (1978) reported that rats per- 
mitted to escape electric shock presentation 
during a preexposure phase and thereby to 
control the UCS showed attenuation of subse- 
quent CER conditioning, even greater atten- 
uation than yoked controls that could not 
escape electric shock during the preexposure 
phase. 

Rudy, Iwens, and Best (1977) performed 
a series of experiments that showed that 
pairing a novel, exteroceptive background 
stimulus with poison during preexposure at- 
tenuates the development of a CTA regard- 
less of whether the background stimulus is 
present or absent when excitatory condition- 
ing is performed. 

In Experiment 1, three groups of rats were 
preexposed to injections of LiCl (.15 M) for 
4 days. Two of these groups received LiCl 
injections in the presence of a novel extero- 
ceptive stimulus (a black chamber). The rat 
was first placed in the black chamber for 5 
minutes, was then removed and injected with 
LiCl, and was finally replaced in the black 
chamber for an additional 25 minutes. A 
third group of rats received LiCl injections 
in their home cages. A fourth group of rats 
were simply placed in the black chamber 
and not injected with LiCl. Following the 
preexposure phase, one of the groups preexposed
to poison in the black chamber received
a single pairing of a saccharin infusion
and a LiCl injection in the home cage. The
other three groups received a single pairing
of a saccharin infusion and a LiCl injection
in the black chamber and were then removed
to the home cage. A two-bottle test between
saccharin and water in the home cage revealed
that animals preexposed to LiCl in
the home cage and conditioned in the black
chamber did not significantly differ from the
group that received no preexposure treatment;
that is, the normal UCS preexposure
effect was not obtained. However, preexposure
to poison in the black chamber did attenuate
the aversion to saccharin whether
the saccharin-LiCl pairing occurred in the
home cage or in the black chamber.

In a second experiment, Rudy, Iwens, and 
Best ruled out the possibility that the group 
preexposed to LiCl in the home cage and 
conditioned in the black chamber failed to 
show a preexposure effect because the novel 
stimulation produced by placement in the
black chamber during conditioning in some 
way potentiated the effect of LiCl. 

A third experiment showed that the 5-min- 
ute preinjection placement, rather than the 
25-minute postinjection placement, in the 
black chamber during preexposure was re- 
sponsible for the attenuation effect obtained 
in Experiments 1 and 2.

In a final experiment, two groups of rats 
received the preexposure to LiCl in the 
presence of either the novel black chamber 
or a novel light stimulus. A third group of 
rats received the preexposure treatment with 
LiCl in the presence of the black chamber
after being familiarized with the black cham- 
ber by four prior placements in it. A fourth 
group of rats received LiCl injections in the
home cage. All groups then received a single 
saccharin infusion paired with LiCl in the 
home cage. The results indicated that only 
the groups preexposed to LiCl in the presence 
of the unfamiliar, novel stimuli, that is, the 
black chamber or the light stimulus, showed 
the attenuation effect. 

Rudy, Iwens, and Best found that rats pre- 
exposed to LiCl in the presence of a novel 
exteroceptive stimulus subsequently displayed 
an attenuated taste aversion relative either
to rats that received no preexposure treatment
or to rats preexposed to LiCl in a familiar 
environment. They concluded that this at- 
tenuation effect was not a result of blocking 
by prior conditioning of the novel exterocep- 
tive stimulus during preexposure to LiCl, be- 
cause the attenuation occurred in groups 
preexposed to LiCl in the presence of a novel 
stimulus whether that stimulus was present
or absent during subsequent conditioning of
the aversion. If the novel exteroceptive stimu- 
lation was not present during saccharin-LiCl 
pairings, it could not have blocked condition- 
ing to saccharin. 
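
The logic of this argument can be illustrated with a minimal simulation. The sketch below assumes that blocking is formalized by the Rescorla-Wagner learning rule, which the text does not itself invoke; the cue names, learning rate, and trial counts are hypothetical choices made only for illustration.

```python
def rw_trial(strengths, present, lam=1.0, alpha=0.3):
    """One Rescorla-Wagner trial: all cues present share a common error term."""
    error = lam - sum(strengths[cue] for cue in present)
    for cue in present:
        strengths[cue] += alpha * error


def taste_strength(novel_cue_at_conditioning, n_pre=20, n_cond=5):
    """Associative strength of the taste after preexposure and conditioning."""
    strengths = {"novel_cue": 0.0, "taste": 0.0}
    # Preexposure: the novel cue alone is paired with the UCS.
    for _ in range(n_pre):
        rw_trial(strengths, ["novel_cue"])
    # Conditioning: the taste is paired with the UCS, with or without the cue.
    cues = ["taste"] + (["novel_cue"] if novel_cue_at_conditioning else [])
    for _ in range(n_cond):
        rw_trial(strengths, cues)
    return strengths["taste"]


with_cue = taste_strength(novel_cue_at_conditioning=True)
without_cue = taste_strength(novel_cue_at_conditioning=False)
print(with_cue, without_cue)
```

In this sketch the taste gains almost no strength when the preconditioned cue accompanies it, but conditions normally when the cue is absent. A pure blocking account therefore predicts attenuation only when the novel stimulus is present during conditioning, which is why attenuation in its absence counts against that account.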

Instead, Rudy, Iwens, and Best (1977) 
argued that the handling and infusion procedure
acquired associative strength during
preexposure to LiCl and attenuated subsequent
taste aversion learning. This explanation
demands that the groups preexposed to
LiCl in the presence of the novel exteroceptive
stimulus more strongly associated the
handling cues with poison than the groups 
preexposed to LiCl in the home cages, which 
were handled in the same way. For this to 
occur, Rudy, Iwens, and Best first assumed
that handling cues were latently inhibited for
all groups during the initial stages of the experiment,
that repeated occurrence of the
handling and infusion procedure prior to the 
preexposure phase rendered these cues less 
associable with UCSs. Placement in the novel 
chamber disrupted the latent inhibition such 
that the handling cues that would not otherwise
have been associated with poison did acquire
the associative strength necessary to attenuate
the formation of a taste aversion.

In another series of experiments, Rudy,
Rosenberg, and Sandell (1977) obtained evidence
that supports this sort of account.
Presentation of a novel exteroceptive cue 
just prior to the pairing of a familiar taste 
with LiCl injection enhanced the condition- 
ing of aversion to the taste, an outcome 
formally similar to that postulated by Rudy, 
Iwens, and Best for familiar handling (rather
than taste) cues.

Gamzu (Note 5) preexposed two groups of
rats to either D-amphetamine (2 mg/kg) or
chlordiazepoxide (15 mg/kg) and two groups
of rats to saline. The two drug-preexposed
groups and one saline-preexposed group then
received pairings of saccharin and D-amphetamine,
while the other saline-preexposed group
received pairings of saccharin and saline.
The group preexposed to D-amphetamine
showed an attenuated aversion to saccharin
when compared with the group preexposed
to saline and conditioned with D-amphetamine.
However, the group preexposed to
chlordiazepoxide and conditioned with D-amphetamine
showed an aversion comparable
with that of the no-preexposure controls.
Thus, preexposure to D-amphetamine attenuates
a conditioned aversion when this drug is
used as the UCS, but preexposure to chlordiazepoxide
does not affect the aversion induced
by D-amphetamine.

Goudie and Thornton (1975) preexposed
two groups of rats to either D-amphetamine
(2 mg/kg) or dl-fenfluramine (6 mg/kg) for
9 consecutive days. Five days elapsed between
the last UCS preexposure and the start
of excitatory conditioning. During conditioning,
half the rats in each group received
pairings of saccharin and D-amphetamine,
and the other half received pairings of saccharin
and dl-fenfluramine. Preexposure to D-amphetamine
attenuated the aversion induced
by pairings of saccharin and D-amphetamine
relative to the appropriate controls, but did
not attenuate the aversion induced by pairings
of saccharin and dl-fenfluramine. However,
preexposure to dl-fenfluramine attenuated
the aversion induced by either pairings
of saccharin and dl-fenfluramine or pairings
of saccharin and D-amphetamine.

Vogel (Note 2) preexposed groups of rats
to either D-amphetamine (2 mg/kg), amobarbital
(120 mg/kg), or the vehicle for 3
days. Then all groups were given taste-drug
pairings in a factorial design. Preliminary
work indicated that these dosages of D-amphetamine
and amobarbital produced comparable
aversions. Vogel found that preexposure
to D-amphetamine did not attenuate
the aversion induced by pairing a taste and
D-amphetamine, a result that runs counter to
the results of Gamzu (Note 5) and Goudie
and Thornton (1975). However, preexposure
to D-amphetamine did attenuate the aversion
induced by taste-amobarbital pairings.
Furthermore, preexposure to amobarbital attenuated
the aversion induced by taste-amobarbital
pairings, but did not attenuate the
aversion induced by taste-D-amphetamine
pairings.

Cappell, LeBlanc, and Herling (1975)
showed that chronic preexposure to D-amphetamine
(20 mg/kg) administered over a
20-day period attenuated the aversion induced
by pairing saccharin and morphine (6
mg/kg). Further, there was no evidence of
conditioning even with repeated taste-drug
pairings. However, preexposure to morphine
facilitated an aversion induced by pairing
saccharin and D-amphetamine (1 mg/kg). In
a third experiment, chronic preexposure to
chlordiazepoxide (25 mg/kg) for 22 days had
no effect on the aversion induced either by
pairing saccharin and D-amphetamine (1 mg/kg),
a result consistent with that reported by
Gamzu (Note 5), or by pairing saccharin
and morphine (6 mg/kg). However, pretreatment
with chlordiazepoxide did attenuate
an aversion induced by pairing saccharin
and chlordiazepoxide (cf. Gamzu, 1977).

Whaley, Scarborough, and Reichard (1966) 
preexposed two groups of rats to 1,000 rota- 
tions in a tumbling apparatus, whereas two 
other groups of rats were simply placed in 
the tumbling apparatus without being ro- 
tated. Twenty-four hours later, half the rats 
in each of these groups were irradiated with 
X rays (6 r./min.) for 10 minutes, while the 
other half in each group were sham irradi- 
ated. Immediately following this treatment, 
all rats received free access to tap water and 
saccharin for 20 minutes, so that saccharin 
consumption preceded some of the delayed 
effects of x-irradiation for rats in the experi- 
mental groups. A subsequent test found that 
the experimental group that had been tum- 
bled showed an attenuated saccharin aver- 
sion relative to the irradiated group that had 
not been tumbled.

Braveman (1975) examined cross-UCS effects
by preexposing groups of rats to injections
of scopolamine methyl nitrate (1 mg/
kg) for 0, 1, 3, 5, or 7 days and then giving 
them a single pairing of saccharin and LiCl 
(3 M). Groups that were preexposed to 
scopolamine five or seven times consumed 
reliably more saccharin than the no-preex- 
posure control group in a test following the 
saccharin-LiCl pairing. The amount of sac- 
charin consumed in the test was a direct 
function of the number of preexposures to 
scopolamine. In a later study, Braveman 
(1977) also observed that preexposure to 
LiCl attenuated the magnitude of an aver- 
sion to a taste paired with scopolamine 
methyl nitrate. 

Braveman (1975) considered the possibility
that these drugs acted on the same physiological
substrate and did not truly represent
a cross-UCS effect. He attempted to eliminate 
this possibility by preexposing rats to either 
D-amphetamine (2 mg/kg) or scopolamine 
methyl nitrate (1 mg/kg), two drugs that 
induce conditioned taste aversions via dif- 
ferent physiological substrates according to 
the ablation data of Berger, Wise, and Stein 
(1973). Rats that were preexposed to D-amphetamine
then received a pairing of saccharin
and scopolamine methyl nitrate,
whereas rats that were preexposed to scopolamine
methyl nitrate received a pairing of
saccharin and D-amphetamine. In general,
groups that were preexposed to a drug
showed attenuated saccharin aversions rela- 
tive to nonpreexposed controls. On the basis 
of this cross-UCS effect, Braveman concluded 
that drug preexposure did not modify the 
physiological mechanism that underlies conditioned
food aversions.

Braveman also considered the possibility
that cross-drug tolerance effects produced
these results. He sought to eliminate this
possibility by using mechanical rotation, 
rather than a drug, as a UCS. Groups of rats 
were either preexposed to rotation at 60 rpm 
for 15 minutes/day or injected with (a) 
scopolamine methyl nitrate, (b) p-ampheta- 
mine, or (c) LiCl for 5 days. All groups then 
received repeated pairings of saccharin and 
mechanical rotation. All of the groups pre- 
exposed to a UCS failed to show any aversion 
to saccharin, that is, they drank as much as 
controls that received neither preexposure nof 
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*conditioning with the UCS. In a later study 
using a factorial design, Braveman (1977) 

preexposed rats to either LiCl injections or a 

severe electric shock treatment and a week 

later presented them with a taste followed 

by either LiCl or electric shock. A symmetri- 

cal cross-UCS preexposure effect was ob- 

tained. Braveman concluded that preexposure 

to any aversive UCS results in diminution 

* of a general stress response to aversive stim- 
uli, thereby reducing their impact during sub- 

sequent conditioning. Braveman's hypothesis 

is very similar to the emotional reactivity 

explanation of the preexposure effect sug- 

gested by other investigators (Kamin, 1961; 

Mis & Moore, 1973; Taylor, 1956). 
Cannon et al. (1977) repeatedly preexposed
rats to LiCl, ethanol, or NaCl and then
presented half the rats in each group with
pairings of saccharin and LiCl, whereas the
others received saccharin followed by ethanol.
Relative to saline-preexposed controls,
rats that had been preexposed to one drug
and then conditioned with another showed
a strong UCS preexposure effect. However,
the cross-UCS preexposure effect was signifi- 
cantly smaller than the within-UCS effect, 
leading Cannon et al. to conclude that in 
addition to some general mechanism that 
mediates cross-UCS preexposure effects, drug- 
specific components also contribute to the 
effect when the same drug is used during 
both preexposure and conditioning. As noted 
earlier, Cannon et al. suggested that toler- 
ance contributed to the drug-specific effect, 
at least in the case of ethanol. 


Summary of Studies Assessing
Preconditioning Exposure to a UCS


The results of studies assessing the effect
of preconditioning exposure to the UCS on
human and rabbit eyelid conditioning, CER
conditioning, and conditioning of taste aversions
are briefly summarized below. The
decremental effect of preconditioning exposure
to the UCS on the formation of an excitatory
CR is

1. In the human eyeblink, rabbit nictitating
membrane, and taste aversion procedures,
a direct function of the intensity or concentration
of an unsignaled preexposed UCS
(Cannon et al., 1975; Mis & Moore, 1973;
Taylor, 1956) and an inverse function of
the intensity or concentration of the UCS
used in excitatory conditioning (Cannon et
al., 1975).

2. In the CER procedure, an inverted-U- 
shaped function of the intensity of an un- 
signaled (or signaled) preexposed UCS 
(Randich, 1978). 

3. A direct function of the number of pre- 
exposed UCSs (Cannon et al., 1975; Elkins, 
1974; Goudie et al., 1976; Hobson, 1968; 
Mis & Moore, 1973; Randich, 1978; Vogel, 
Note 2), even when cross-UCS comparisons 
are made (Braveman, 1975).

4. An inverse function of the time interval 
between the last preexposure to a UCS and 
the start of excitatory conditioning (Cannon 
et al., 1975; Cappell & LeBlanc, 1975, 1977; 
Mikulka et al., 1977; Mis & Moore, 1973). 

5. Sometimes reduced if the preexposed 
UCS is signaled (Cannon et al., 1975; Mi- 
kulka et al., 1977; see also, Revusky, Parker, 
Coombes, & Coombes, 1976), although Gou- 
die et al. (Note 3) and Zellner and Riley 
(Note 4) found no differences between sig- 
naled and unsignaled preexposure to the 
UCS on subsequent taste aversion learning 
(cf. Randich, 1978). 

6. Not overcome by repeated CS-UCS 
pairings when some addictive drugs and am- 
phetamine are used (Berman & Cannon, 
1974; Brookshire & Brackbill, 1976; Cap- 
pell & LeBlanc, 1977; Cappell et al., 1975; 
LeBlanc & Cappell, 1974), but is overcome 
when nonaddictive drugs are used (Cannon 
et al., 1975; Holman, 1976; Riley et al.,
1976). 

7. Often independent of the type of aver- 
sive UCS used in the preexposure and condi- 
tioning phases (Braveman, 1975, 1977; Cap- 
pell et al., 1975; Goudie & Thornton, 1975; 
Whaley et al., 1966; Vogel, Note 2; Gamzu, 
Note 5). 


Theoretical Considerations 


In general, the studies examined in this 
article show that preconditioning exposure 
to aversive UCSs retards the formation of 
an excitatory CR. Further, this effect often 
occurs when the UCS used during the preexposure
phase is different from the UCS
used during excitatory conditioning (cf. 
Braveman, 1975). The basis of this phe- 
nomenon remains an enigma, although both 
associative and nonassociative theories have 
been advanced. Many of these theories are 
of limited scope and confine their predictive 
value to particular situations, for example, 
those in which the UCS is an addictive drug 
(cf. Parker et al., 1973). This is not to say 
that such theories are incorrect; rather, the 
generality of the preexposure phenomenon 
across Pavlovian conditioning paradigms, 
species, and aversive UCSs invites a more
general interpretation. It is toward this end 
that theories of the UCS preexposure effect 
are evaluated in the following sections. 


Associative Theories 


Blocking. As was noted earlier, a block- 
ing interpretation of the UCS preexposure 
phenomenon posits that some stimulus aspect 
(X) of the experimental situation acquires 
associative strength during preexposure to an 
aversive UCS. Conditioning of Stimulus X 
reduces the amount of associative strength 
that a nominal stimulus (A) can acquire 
during subsequent excitatory conditioning in 
that same environment (AX). As a result, 
the rate of acquisition of an excitatory CR to 
Stimulus A is attenuated. 
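
The blocking account can be illustrated with a minimal sketch of the Rescorla-Wagner (1972) learning rule, in which the change in associative strength of each stimulus present on a trial is proportional to the difference between the asymptote supported by the UCS and the summed strength of all stimuli present. The salience values, trial counts, and function names below are assumptions chosen only for illustration; they are not taken from any of the studies reviewed here.

```python
# Minimal Rescorla-Wagner sketch of context blocking.
# Salience values and trial counts are illustrative assumptions.

def rw_trial(strengths, present, lam, salience):
    """Apply one Rescorla-Wagner update to every stimulus present on a trial.

    lam is the asymptote of conditioning supported by the UCS on this trial.
    """
    error = lam - sum(strengths[s] for s in present)  # shared prediction error
    for s in present:
        strengths[s] += salience[s] * error
    return strengths


def strength_of_a(n_preexposures, n_pairings=1, lam=1.0):
    """Return the strength of Stimulus A after UCS preexposure in context X
    followed by AX-UCS pairings."""
    salience = {"X": 0.2, "A": 0.3}    # assumed saliences of X and A
    strengths = {"X": 0.0, "A": 0.0}
    for _ in range(n_preexposures):     # UCS alone: only context X is present
        rw_trial(strengths, ["X"], lam, salience)
    for _ in range(n_pairings):         # Stimulus A paired with the UCS in X
        rw_trial(strengths, ["A", "X"], lam, salience)
    return strengths["A"]


for n in (0, 1, 5, 10):
    print(n, round(strength_of_a(n), 3))
```

Under these assumptions, the strength gained by Stimulus A on the first pairing shrinks as the number of preexposures grows, because Stimulus X has already absorbed most of the associative strength that the UCS can support.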

There are three critical assumptions of a 
blocking interpretation of the UCS preex- 
posure effect. First, Stimulus X can be any 
aspect of the experimental situation. For
instance, Stimulus X may be static cues pro- 
vided by the characteristics of the experi- 
mental environment, cues provided by the 
handling procedure, or cues provided by the 
injection procedure in CTA experiments. 
Second, Stimulus X must be present during 
both the preexposure and the excitatory con- 
ditioning phases of an experiment for block- 
ing to occur. Third, the presence of the pre- 
viously conditioned Stimulus X interferes 
with conditioning of Stimulus A because the 
UCS can support only a limited amount of 
associative strength (Rescorla & Wagner, 
1972) or because there is little conditioning 
to redundant predictors of reinforcement
(Mackintosh, 1975). There are several predictions
that can be derived from a blocking
account of the UCS preexposure phenomenon:

1. Blocking predicts retarded acquisition
of an excitatory CR to Stimulus A if conditioning
of Stimulus X occurs during preexposure
to the UCS and Stimulus X is present
during excitatory conditioning of Stimulus A.
Virtually all the studies evaluated in
this article obtained an attenuation of excitatory
conditioning following prior exposure
to the UCS. These studies typically included
some stimulus (X) during both the preex- 
posure and the conditioning phases. This 
stimulus could block conditioning of Stimulus 
A. The questions at hand are whether such 
stimuli (X) are in fact conditionable and 
whether such conditioning is capable of block- 
ing conditioning of a discrete Stimulus A. 
Many of the aforementioned stimuli that
can be identified as Stimulus X are conditionable.
Tomie (1976) demonstrated that
prior presentations of free food attenuate the
rate of acquisition of an autoshaped key-peck
response. Tomie's data suggest that conditioning
of contextual stimuli during free-food
presentations blocks conditioning of a
nominal CS subsequently presented in the
same context. Similarly, Willner (1978) demonstrated
that rats preexposed to injections
of LiCl in a distinctive environment develop 
a place aversion and that this conditioning 
attenuates the formation of an aversion to 
saccharin that is later paired with LiCl in 
the same place. In two separate experiments 
using D-amphetamine and LiCl as UCSs, 
Braveman (Note 6) preexposed and conditioned
rats in either the same or different
environments; the environments were distinguished
by the level of auditory-visual stimulation.
Braveman obtained a UCS preexposure
effect only when preexposure and conditioning
occurred in the same environment,
supporting an associative explanation of the 
phenomenon. 

However, it is not clear whether condi- 
tioning of contextual cues occurs in other 
preparations, such as in human and rabbit 
eyelid conditioning. Mis and Moore (1973) 
argued against a context-blocking interpretation
of their rabbit eyelid conditioning
study, citing the lack of evidence that the 
nictitating membrane response in rabbits can 
be conditioned to contextual stimuli. Their 
claim may not bear directly on a general 
blocking view of the UCS preexposure phe- 
nomenon. Preexposure to a UCS may neither 
alter the specific UCR that an experimenter 
measures nor condition that UCR to context- 
ual cues, but may instead condition a fear 
response elicited by the UCS to contextual 
cues. If this fear response is important for 
excitatory conditioning of the nictitating 
membrane response (cf. Konorski, 1967), 
then a blocking interpretation may still be 
useful. In fact, Hinson and Siegel (Note 7) 
have recently presented evidence in support 
of a context-blocking interpretation of the 
UCS preexposure effect in nictitating mem- 
brane conditioning. 

Researchers have attempted to reduce the 
likelihood of context blocking by providing 
a nominal signal (CS) for the UCS during 
the preexposure phase. Insofar as conditioning
occurs to the salient, nominal CS, it will
not occur to less salient contextual cues (cf. 
Kamin, 1969). Hence, no blocking should 
occur when a novel CS is subsequently paired 
with the UCS in the same context. Cannon 
et al. (1975) and Mikulka et al. (1977) 
showed that signaled preexposure either attenuates
or eliminates completely the retardation
of excitatory conditioning that is
typically obtained with unsignaled preexposure
conditions. These results are compatible
with a blocking interpretation because con- 
textual stimuli should acquire little or no 
associative strength during preexposure to 
signaled UCSs. However, Goudie et al. (Note 
3) and Zellner and Riley (Note 4) reported 
that rats given signaled or unsignaled pre- 
exposure to stimulants showed equal attenua- 
tion of the acquisition of a taste aversion, 
relative to controls. Randich (1978) also re- 
ported that rats given signaled preexposure 
to an electric shock UCS showed greatly re- 
tarded acquisition of a CER, relative to ap- 
propriate controls. 

A second source of stimulation that could 
potentially block conditioning of a nominal 
CS is the handling procedure. Rudy, Iwens,
and Best (1977) specifically attempted to
condition contextual cues by preexposing 
their rats to illness in the presence of a novel, 
exteroceptive stimulus, namely, a black cham- 
ber. This preexposure treatment attenuated 
the formation of a CTA to a greater extent 
than did preexposure to illness in the home 
cage. However, the effect of preexposure to 
the UCS in a novel environment occurred 
regardless of whether the novel exteroceptive 
stimulus was present or absent when taste- 
drug pairings were administered. This result 
seems incompatible with a blocking interpre- 
tation, which asserts that Stimulus X must 
be present during excitatory conditioning to 
block acquisition of conditioning to Stimulus 
A. Rudy, Iwens, and Best performed subse- 
quent experiments that forced them to con- 
clude that blocking was indeed responsible 
for their results, although not as a result of 
associative strength acquired by the novel, 
exteroceptive stimulus. They argued that the 
novel exteroceptive stimulus acted only as a 
disinhibitor of other latently inhibited back- 
ground cues (the handling cues in particu- 
lar), thus allowing these cues to acquire as- 
sociative strength. Since the handling cues 
were present during pairings of Stimulus A 
with the UCS, they blocked conditioning of 
an aversion to Stimulus A. 

Braveman (1978) reasoned that if condi- 
tioning to handling cues during drug pre- 
exposure were responsible for the UCS pre- 
exposure effect, then manipulations that 
should attenuate conditioning to handling 
cues should also attenuate the UCS pre- 
exposure effect. He varied the amount of 
handling received by groups of rats prior to 
preexposure to LiCl and found no evidence 
that this variable influenced the UCS pre- 
exposure effect, which was strong even in rats 
that had been handled for 21 days prior to 
preexposure. However, Braveman did not administer
any saline injections prior to drug
preexposure, leaving open the possibility that 
conditioning to injection cues was respon- 
sible for the UCS preexposure effect. Recent 
evidence (Willner, 1978; Poulos & Cappell, 
Note 8) indicates that cues associated with 
the injection of a drug can acquire associative
strength and are capable of blocking
conditioning of an aversion to a distinctive
taste. This conclusion is based on the finding 
that the normal UCS preexposure effect ob- 
tained by prior injections of a drug can be 
reduced by degrading the correlation between 
injection cues and the drug during the pre- 
exposure phase. 

2. Blocking predicts that the effect of pre- 
exposure to the UCS should dissipate with 
the time to the start of excitatory condition- 
ing, as long as the animal is maintained in the 
presence of Stimulus X during this period. 
This prediction is based on the view that the 
conditioned response to Stimulus X will ex- 
tinguish during a delay period in which it is 
not reinforced. If the CR to Stimulus X is 
near zero at the start of excitatory condi- 
tioning, then blocking should not occur. 

Mis and Moore (1973), Cappell and Le- 
Blanc (1975, 1977), and Cannon et al. 
(1975) have shown that the magnitude of 
the UCS preexposure effect decreases as the 
time interval increases between the last pre- 
exposure to a UCS and the start of excitatory 
conditioning. In the CTA studies, the animals 
spent the delay period in the preexposure 
environment, that is, in the presence of Stim- 
ulus X. It is unclear whether this outcome 
reflects extinction of Stimulus X or some 
effect that is independent of context, since 
the experiments included no conditions in 
which the animals spent the delay interval in 
the absence of Stimulus X. A critical test of 
this prediction would involve removing half 
the preexposed animals from the preexposure 
environment. The group removed from the 
preexposure environment during the delay 
period should show a greater attenuation of
excitatory conditioning than the group maintained
in the preexposure environment.

In a procedure that is formally similar to 
that just described, Batson and Best (Note 
9) preexposed rats to injections of LiCl after 
they were placed in a distinctive black box.
Then over the next 8 days, half the pre- 
exposed rats received 16 trials in which they 
were placed in the black box and then in- 
jected with physiological saline. These trials 
should have extinguished any conditioned re- 
sponse to the black box and associated cues. 
The other rats remained in their home cages
during this period. Then both groups received
a single saccharin-LiCl pairing following
placement in the black box. Finally, both
groups and the nonpreexposed controls were 
permitted to drink saccharin in the home 
cage. Rats that had received extinction trials 
were as averse to saccharin as were nonpre- 
exposed controls, but rats that had spent the 
interval between preexposure and condition- 
ing in the home cage showed a strong UCS 
preexposure effect. Batson and Best con- 
cluded that associative blocking formed the 
basis of their UCS preexposure effect (cf. 
Hinson & Siegel, Note 7, for a similar ma- 
nipulation and outcome). 

3. Blocking predicts that an animal pre- 
exposed to the UCS should eventually attain 
the same level of excitatory conditioning as 
an animal not preexposed to the UCS, al- 
though this level of conditioning should be 
attained at a slower rate. This should occur 
because the UCS is presented only in the 
presence of the nominal CS during excitatory 
conditioning and never in its absence. According
to the Rescorla-Wagner (1972)
model, this should result in more nonreinforced
than reinforced presentations of Stimulus X,
and the associative strength of Stimulus X
should decline to zero. The nominal
Stimulus A will acquire as much associative
strength as Stimulus X loses. Thus, for an
animal preexposed to the UCS, Stimulus A
is permitted to acquire as much associative
strength as the Stimulus A for an animal not
preexposed to the UCS. Mackintosh's (1975)
model makes the same prediction, asserting
that Stimulus X should lose associative
strength because it is a poorer predictor of
the UCS than is Stimulus A. Thus, the salience
of Stimulus A will increase, and Stimulus A
will gain associative strength. Preexposed
and control groups have attained
the same level of conditioning in several 
studies that continued to present CS-UCS 
pairings during excitatory conditioning (Hol- 
man, 1976; Kremer, 1971; Mis & Moore, 
1973). However, a few studies failed to ob- 
tain any conditioning following preexposure 
to a drug, even with repeated CS-UCS pairings
(Berman & Cannon, 1974; Braveman,
1975; Brookshire & Brackbill, 1976; Cappell
& LeBlanc, 1977; Cappell et al., 1975).
These are notable exceptions, however, be- 
cause all but Braveman's study involved the 
use of UCSs that an animal will self-admin- 
ister, namely, D-amphetamine, morphine, and 
ethanol. 
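
Predictions 2 and 3 of the blocking account can be sketched with the same Rescorla-Wagner rule. The simulation below is only a toy illustration: the saliences, the number of preexposures, the number of context-alone exposures during the delay, and the assumption that three nonreinforced context exposures separate successive pairings are all hypothetical choices, and extinction is assumed to proceed at the same rate as acquisition.

```python
# Toy Rescorla-Wagner simulation of two blocking predictions.
# Saliences, trial counts, and the number of context-alone exposures per
# cycle are illustrative assumptions, not values from any cited study.

ALPHA_X = 0.2   # assumed salience of contextual Stimulus X
ALPHA_A = 0.3   # assumed salience of nominal Stimulus A
LAMBDA = 1.0    # asymptote of conditioning supported by the UCS


def preexpose(n_ucs, n_context_alone=0):
    """Return the strength of X after n_ucs UCS presentations in context X,
    followed by n_context_alone nonreinforced exposures to X (the delay)."""
    v_x = 0.0
    for _ in range(n_ucs):
        v_x += ALPHA_X * (LAMBDA - v_x)
    for _ in range(n_context_alone):
        v_x += ALPHA_X * (0.0 - v_x)
    return v_x


def acquire(v_x, n_cycles, iti_trials=3):
    """Return the strength of A after each cycle of one AX-UCS pairing
    followed by iti_trials nonreinforced exposures to the context alone."""
    v_a, curve = 0.0, []
    for _ in range(n_cycles):
        error = LAMBDA - (v_a + v_x)   # shared prediction error on the AX trial
        v_a += ALPHA_A * error
        v_x += ALPHA_X * error
        for _ in range(iti_trials):    # X without the UCS between pairings
            v_x += ALPHA_X * (0.0 - v_x)
        curve.append(v_a)
    return curve


# Prediction 2: extinguishing X during the delay removes most of the deficit.
no_delay = acquire(preexpose(10), 1)[0]
extinguished = acquire(preexpose(10, n_context_alone=16), 1)[0]

# Prediction 3: preexposed and control groups converge with repeated pairings.
control_curve = acquire(preexpose(0), 40)
preexposed_curve = acquire(preexpose(10), 40)
print(no_delay, extinguished, control_curve[-1], preexposed_curve[-1])
```

In this sketch the preexposed group conditions more slowly at first but approaches the same asymptote as the control group, which is the pattern prediction 3 describes. Whether real contextual cues extinguish in this way is, of course, the empirical question at issue.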

UCS controllability. Vogel (Note 2) and
Goudie et al. (Note 3) proposed that an
organism that is preexposed to an aversive
UCS learns that the aversive state is uncontrollable.
This learning produces an associative
deficit that transfers to the excitatory
conditioning phase and interferes with the
learning of the relationship between the CS
and the UCS. This hypothesis is an extension
of the learned helplessness notion (Maier &
Seligman, 1976), which essentially argues
that the organism perceives the occurrence of
the UCS during the preexposure phase as uncorrelated
with anything it does or attempts
to do. Thus, it is possible that the associative
deficit produced by learning that either stimuli
or responses are ineffective in predicting
or controlling the UCS interferes with the
formation of an association between the stimulus
and the reinforcer during excitatory conditioning.

Data showing that signaled preexposure to
a UCS reduces the preexposure effect (Cannon
et al., 1975) could be interpreted to
support this contention. This, of course, assumes
that an organism can learn some form
of preparatory response during signaled preexposure
to a UCS that minimizes and thereby
controls the impact of the UCS.

On the other hand, Goudie et al. (Note
3) found equal attenuating effects of signaled
and unsignaled preexposure to methamphetamine
on the formation of a subsequent
taste aversion. They suggested that
since the UCS is uncontrollable in both unsignaled
and signaled preexposure conditions,
controllability of the UCS may be important.
Subsequently these authors demonstrated
that rats allowed ad libitum access to methamphetamine
during the preexposure phase
do not show attenuated saccharin aversions
when saccharin is paired with methamphetamine
injections. These data are compatible
with the hypothesis that controllability of
UCS onset is the important determinant of
the UCS preexposure effect. A test of this
hypothesis can only be accomplished by using 
drugs that an animal will self-administer, for 
example, methamphetamine. The results of 
such a test may be applicable only to ex- 
periments that use drugs of abuse. On the 
other hand, a related hypothesis that focuses 
on the controllability of UCS termination 
can be tested with a variety of aversive 
UCSs. In this regard, Randich (1978) found 
that rats permitted to terminate electric 
shocks during a preexposure phase showed 
even greater attenuation of subsequent CER 
conditioning than yoked controls not per- 
mitted to terminate electric shocks during 
the preexposure phase. Thus, giving an ani- 
mal control over the termination of an aver- 
sive UCS does not eliminate the UCS pre- 
exposure phenomenon. 

The UCS onset-controllability hypothesis 
would account for symmetrical cross-UCS 
effects (cf. Goudie & Thornton, 1975) if one 
assumed that the aversive states induced by 
drugs were similar in being equally uncon- 
trollable. Of course, given this assumption, 
asymmetrical cross-UCS effects pose a prob- 
lem, although they primarily involve the use 
of D-amphetamine, a drug that has both positive
and negative reinforcing characteristics
(Wise, Yokel, & DeWit, 1976). 


Nonassociative Theories 


Artificial need states. Parker et al. (1973) 
suggested that preexposure to addictive drugs 
may induce an artificial need state for these 
drugs. The failure to obtain a CTA would 
then reflect the fact that the CS predicts 
alleviation of withdrawal symptoms corre- 
lated with the need state. This hypothesis is 
of limited general applicability. In addition, 
LeBlanc and Cappell (1974) have shown 
that preexposure to amphetamine attenuates 
the formation of a CTA but does not induce 
a need state for amphetamine. 

UCS novelty. Amit and Baum (1970) 
and Gamzu (1977; Gamzu, Note 5) sug- 
gested that the novelty of the UCS is an 
important determinant of associative learn- 
ing. Any treatment, such as preexposure to 
the UCS, that reduces the novelty of the UCS
should retard the acquisition of an excitatory
CR. The usefulness of this hypothesis is
directly questioned by both symmetrical and 
asymmetrical cross-UCS effects. If one as- 
sumes that this hypothesis can be restated 
as "the novelty of the aversive state is im- 
portant for associative learning," then it may 
be possible to account for symmetrical cross- 
UCS effects in terms of a common aversive 
reaction to the UCSs. However, in doing so 
it becomes difficult to account for asymmetri- 
cal cross-UCS effects. In this regard, Gamzu 
(1977) argued that asymmetrical cross-UCS 
effects may reflect quantitative rather than 
qualitative effects of the two drugs involved. 
For example, he argued that the dose of dl-fenfluramine
used by Goudie and Thornton
(1975) to attenuate conditioning of a taste
aversion to dl-fenfluramine was more aversive
than the dose of D-amphetamine (see also,
Braveman, 1977; Cannon et al., 1977). 

Although such an argument has some force, 
the main problem with the UCS-novelty ac- 
count is that it has not been specified in 
enough detail to be easily tested.

Central habituation. Perhaps the most 
widely accepted nonassociative explanation 
of the UCS preexposure phenomenon is that 
some central habituation process occurs in 
response to repeated applications of the UCS 
during the preexposure phase and reduces the 
organism’s responsiveness to subsequent ap- 
plications of the UCS (Kamin, 1961; Mis & 
Moore, 1973; Taylor, 1956). This hypothesis
has been stated in a variety of forms. For
example, Braveman (1975) stated that an 
organism may habituate to the stress induced 
by the UCS. Similarly, it has been suggested 
that an organism may develop a tolerance for 
the UCS when drugs are used as UCSs (Cap- 
pell et al., 1975; LeBlanc & Cappell, 1974; 
Riley et al., 1976). Central habituation to
the UCS, habituation to the stress induced by 
the UCS, and tolerance for the UCS are 
formally quite similar accounts. 

It is possible that some physiological sub- 
strate, which is activated by an aversive UCS 
and is important for excitatory conditioning, 
is modified by repeated exposure to the UCS. 
Riley et al. (1976) suggested that this physiological
substrate may involve the pituitary-adrenal
axis and, in particular, the adrenocorticotrophic
hormone (ACTH). For example,
ACTH has been shown to be critical
for normal acquisition of conditioned active
and passive avoidance responses (cf. DeWied,
1964), and the release of ACTH is
conditioned in CER paradigms (Bassett,
Cairncross, & King, 1973; Brady, 1967) and
CTA paradigms (Ader, 1977). It is also
known that exposure to an unsignaled, aversive
UCS often inhibits the normal release
of ACTH from the adenohypophysis in response
to subsequent applications of that
UCS or to other aversive stimuli (Mikulaj
& Mitro, 1972; Munson, 1973). It is possible,
therefore, that inhibition of stress-induced
release of ACTH by prior exposure to
unsignaled, aversive stimuli would reduce
the potentiating effect that ACTH has on
the acquisition and maintenance of fear-motivated
behaviors. The possibility that ACTH
plays a role in the preexposure phenomenon
is intriguing for the following reasons. First,
it would provide a common final substrate
through which many aversive stimuli might
act and thus a basis for the notion of central
habituation. Second, some investigators (Rescorla,
1973) have obtained evidence of a
preexposure effect only during extinction testing,
a situation in which Weiss, McEwen,
Silva, and Kalkut (1970) have shown ACTH
to have maximal effects on behavior. However,
the hypothesis that ACTH plays a
critical role in the UCS preexposure phenomenon
should be viewed with some caution,
considering the lack of data.


Memorial representation. Rescorla (1974)
proposed that first-order excitatory conditioning
involves the construction of memories
for individual events and a formation of
associations (stimulus-stimulus) between
such memories. Preconditioning exposure to
a UCS produces a memorial representation
of that UCS that may either augment or
diminish the representation of the UCS used
during excitatory conditioning. An unsupplemented
memorial-representation model predicts
no UCS preexposure effect unless UCS
intensity is changed between the preexposure
and conditioning phases and is thus untenable.
Consequently, it seems worthwhile to
consider how central habituation and changes
in memorial representation of the UCS might 
interact. It may be the case, for example, 
that animals preexposed to any intensity 
UCS will show an attenuation of excitatory 
conditioning because of central habituation 
but that the amount of attenuation will also 
be a function of changes in the memorial 
representation of the UCS. In general, Rescorla's
view would predict that the attenuating
effect of UCS preexposure should be
diminished when the UCS is more intense
during preexposure than during excitatory
conditioning, but enhanced when the UCS is
less intense during preexposure than during
conditioning. In studies that used unsignaled
preexposure conditions (Cannon et al., 1975;
Mis & Moore, 1973) and signaled preexposure
conditions (Randich, 1978), these results
were not obtained. Thus, changes in
the memorial representation of the UCS are
unlikely to contribute to the UCS preexposure
effect.

Opponent-process theory. Another nonassociative
account of the decremental effect
of preexposure to the UCS comes from the
opponent-process theory of acquired motivation
(Solomon, 1977; Solomon & Corbit,
1974), a theory designed to explain the affective
dynamics of responses to strong stimuli.
The opponent-process theory holds that
the relative strengths of two opposing processes,
the a process and the b process, determine
the affective state of an organism in
response to a strong stimulus. The a process,
or primary affective process, is postulated to
be the emotional UCR to the UCS, for example,
fear elicited by a strong electric shock.
The a process is said to be stimulus locked,
showing little habituation or sensitization
with successive presentations of the UCS.
The b process, or the opponent process, is
said to be aroused by occurrence of the a
process and to have an affective sign opposite
to that of the a process, for example, inhibition
of fear following strong shock. The b
process, unlike the a process, is postulated to
increase in both intensity and duration with
repeated evocation, as long as the time interval
between successive evocations is less than
that required for the complete decay of the
b process (critical decay duration; Starr,
1976). The algebraic summation of these
two opposing affective processes results in a
standard pattern of affective dynamics. When
the quantity a - b is positive, the animal is
said to be in the A state; when the quantity
a - b is negative, the animal is said to be
in the B state.

This model bears on the UCS preexpo- 
sure phenomenon because it predicts that 
repeated presentations of the UCS will 
strengthen the b process when these pre- 
sentations are separated by less than the 
critical decay duration. As this occurs alge- 
braic summation of the opponent processes 
should result in a greatly diminished A state 
and in an augmented B state with a long 
decay time. If excitatory classical conditioning
is the conditioning of an A state to a nominal
CS, then such conditioning will be attenuated
by prior exposure to the UCS because
this treatment diminishes the A state. 
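
The qualitative prediction just described can be illustrated with a deliberately simplified sketch. It is not the formal opponent-process model: the growth rule for the b process, its exponential decay between presentations, and every parameter value below are hypothetical assumptions introduced only to show how the spacing of UCS presentations might determine the size of the A state.

```python
import math

# Toy illustration of the opponent-process prediction. The growth and decay
# rules and all parameter values are hypothetical assumptions, not part of
# the theory as published.

def a_state_after_preexposure(n_prior, interval, a=1.0, b_max=0.8,
                              growth=0.3, tau=5.0):
    """Return a - b at the UCS presentation that follows n_prior exposures.

    The a process is held constant (stimulus locked). Each prior exposure
    strengthens the b process toward b_max, and b then decays exponentially
    with time constant tau during the interval before the next UCS.
    """
    b = 0.0
    for _ in range(n_prior):
        b += growth * (b_max - b)          # b process strengthened by evocation
        b *= math.exp(-interval / tau)     # b process decays between exposures
    return a - b


for n in (0, 1, 10):
    print(n, round(a_state_after_preexposure(n, interval=1.0), 3))
print(round(a_state_after_preexposure(10, interval=100.0), 3))
```

With closely spaced presentations the A state shrinks as exposures accumulate, whereas widely spaced presentations leave it essentially intact, which parallels the role the theory assigns to the critical decay duration.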

The opponent-process model makes a pre- 
diction that is not made by other nonassocia- 
tive accounts of the UCS preexposure phe- 
nomenon. If the b process grows with repeated 
UCS presentations, then the conditioning of 
the B state should be facilitated by prior ex- 
posure to the UCS. Thus, if a nominal CS is 
paired with the peak of the B state in a 
backward conditioning procedure, condition- 
ing of this B state should be facilitated by 
prior exposure to the UCS. Other nonassocia- 
tive accounts of the UCS preexposure phe- 
nomenon that have been discussed predict 
attenuating effects of UCS preexposure on 
both excitatory (A state) and inhibitory (B 
state) conditioning. These differential pre- 
dictions have not yet been tested. 


Summary 


The first researchers to demonstrate the 
decremental effect of prior exposure to the 
UCS on the acquisition of an excitatory CR 
explained this decrement in nonassociative 
terms (Kamin, 1961; Kimble & Dufort, 
1956; Taylor, 1956). However, with the 
marked surge in the development of associa- 
tive theories of Pavlovian conditioning in the 
late 1960s and early 1970s (Kamin, 1969; 
Rescorla & Wagner, 1972) came an attempt
to encompass a greater variety of phenomena,
including the UCS preexposure effect, in associative
terms.

The most viable associative explanation of 
the UCS preexposure phenomenon obtained 
in human eyelid conditioning, rabbit eyelid 
conditioning, CER, and CTA learning is con- 
text blocking. According to this view, prior 
exposure to the UCS conditions some stimu- 
lus aspect of the experimental environment, 
thereby blocking subsequent conditioning of 
a nominal CS in that same environment. A
question that naturally arises is, what can be 
gained by the study of the UCS preexposure 
effect if context blocking forms the basis of 
this phenomenon? In other words, what im- 
plications does a context-blocking explanation 
have for learning theory in general? 

First, a context-blocking analysis should
delineate the nature of contextual stimuli 
that can be associated with UCSs and should 
determine whether responding to these stim- 
uli can be described by the same laws that 
govern responding to discrete CSs. Although 
an influential formal model of Pavlovian con- 
ditioning (Rescorla & Wagner, 1972) has 
emphasized the importance of the condition- 
ing context in an abstract sense, there have 
been few attempts to identify the physical 
aspects of the context that do become condi- 
tioned. In the absence of such attempts, po- 
tentially important differences between con- 
ditioning paradigms remain obscure. For ex- 
ample, interest in context blocking has called 
attention to the importance of handling and 
injection cues in the analysis of learning es- 
tablished in the CTA paradigm (Braveman, 
1978; Rudy, Iwens, & Best, 1977; Willner, 
1978). Such stimuli, for example, the han- 
dling and injection procedure that occurs be- 
tween the presentation of the nominal CS 
and the occurrence of the UCR, have no 
obvious analogues in the CER or eyelid con- 
ditioning paradigms. 

A context-blocking view may also provide 
important information concerning common 
elements of the UCRs elicited by different 
UCSs and the role of such responses in the 
conditioning process. Such a notion was anticipated
in Konorski's (1967) discussion of
the role of preparatory CRs in the conditioning
process. According to Konorski, preparatory
or diffuse emotional responses elicited
by UCSs must first be conditioned to a
nominal CS before a consummatory or specific
CR can be established. Information
about the similarities among preparatory responses
based on different UCSs could be derived
from outcomes of experiments on cross-UCS
preexposure effects.


Other associative accounts of the UCS
preexposure phenomenon, such as controllability
of the UCS (Vogel, Note 2), remain
to be more fully explored. Research based
on the UCS-controllability hypothesis may
establish vital links between the learned
helplessness phenomenon and the UCS preexposure
effect.


Nonassociative accounts of the UCS preexposure
phenomenon also warrant thorough
investigation because they too have important
implications for learning theory. The
most viable nonassociative hypothesis of the
UCS preexposure effect is that some central
adaptation or habituation process occurs during
prior exposure to the UCS and reduces
the impact of the UCS during subsequent
excitatory conditioning. If central habituation
proves to form the basis of the UCS preexposure
effect in a given paradigm, then one
should consider the possibility that a substantial
amount of habituation occurs in any
Pavlovian conditioning procedure that repeatedly
presents the UCS. The opponent-process
model (Solomon & Corbit, 1974)
makes an analogous prediction, although on
the basis of cancellation of a constant a
process by a gradually increasing b process.
In any case, the role of b processes in habituation
of various responses is a worthwhile
area for further study.


One widely studied paradigm in Pavlovian
conditioning, called blocking (Kamin, 1969),
refers to the case in which prior conditioning
to one stimulus (A) markedly attenuates
conditioning to an added stimulus (B) in
the compound stimulus AB, as long as the
UCS remains unchanged. A common assumption
in all explanations of blocking (Mackintosh,
1975; Rescorla & Wagner, 1972) is
that to the extent that the added stimulus,
B, provides no new information about the
occurrence of the UCS (beyond that provided
by Stimulus A), Stimulus B will not condition.
However, if it is assumed that habituation
occurs to the UCS during Stimulus-A
training, then this reduced emotional responsiveness
may in part be responsible for the
failure to condition Stimulus B, independent
of the presence of Stimulus A in either phase.
This notion suggests that a group of animals
receiving reinforced presentations of Stimulus
C followed by reinforced presentations of the
compound stimulus AB (C+/AB+) would
show blocking relative to a group receiving
no treatment and then AB+ training. A
C+/AB+ control group is not typically incorporated
in studies of the blocking phenomenon.

A habituation account of the UCS pre- 
exposure phenomenon also bears on the un- 
blocking effect. It is known that unblocking, 
or conditioning of the added stimulus (B) 
in the compound AB, occurs following prior 
conditioning of Stimulus A when the intensity
of the UCS used to reinforce the compound
is increased (Kamin, 1969) relative
to the intensity of the UCS used to condition
Stimulus A. Similarly, Dickinson, Hall, and
Mackintosh (1976) showed that unblocking
occurs when Stimulus A is reinforced with
two shocks and the compound stimulus AB
is reinforced with only one shock, that is, a
relative decrease in the intensity of the UCS.
The unblocking effect is typically attributed
to the surprise value of the novel UCS used
to condition Stimulus AB, although the mechanism
of surprise remains unstated. In this
regard, habituation to the UCS may provide
a mechanism of surprise. If one assumes that
some habituation of the UCR to the UCS
occurs during conditioning of Stimulus A,
the use of a novel UCS during conditioning
of Stimulus AB, whether the UCS represents
a relative increase or a relative decrease in
intensity, may act to temporarily dishabituate
the UCR to the UCS (Thompson & Spencer,
1966). This treatment would effectively
act to restore the former UCR to the UCS
and, depending on how much habituation has
occurred and the magnitude of the dishabituation
effect, would permit some conditioning
of Stimulus B.

A dishabituation account of the unblocking
effect also revitalizes the explanation of
blocking and unblocking that can be derived
from the Rescorla-Wagner (1972) model of
conditioning. The occurrence of unblocking
when Stimulus A is reinforced with two 
shocks but Stimulus AB is followed by only 
one shock (Dickinson et al., 1976) is prob- 
lematical for this model, which assumes that 
only a manipulation that increases the over- 
all level of associative strength that the UCS 
can support permits Stimulus B to condition. 
Reducing the number of shocks during AB 
training is not expected to produce an increase 
in overall level of associative strength. How- 
ever, if unblocking is attributable to dis- 
habituation, then restoration of a previously 
habituated UCR to the UCS should act to 
temporarily increase the overall level of as- 
sociative strength that the UCS can support 
and permit Stimulus B to condition accord- 
ing to the general framework specified by 
the Rescorla-Wagner model. 
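
The argument can be made concrete with a small simulation of the Rescorla-Wagner learning rule, in which each stimulus present on a trial changes by alpha * beta * (lambda - summed strength). This is only a sketch under loudly stated assumptions: habituation is represented as a decline in lambda (the associative strength the UCS can support) during Stimulus-A training, dishabituation as a return of lambda to its original value during AB training, and all numerical values are hypothetical.

```python
def rescorla_wagner(phases, alpha=0.3, beta=1.0):
    """Apply the Rescorla-Wagner rule over successive training phases.

    phases: list of (stimuli, lambdas) pairs, where stimuli is a tuple of
    stimulus names presented together and lambdas lists the asymptote
    supported by the UCS on each trial of that phase.
    """
    strengths = {}
    for stimuli, lambdas in phases:
        for s in stimuli:
            strengths.setdefault(s, 0.0)
        for lam in lambdas:
            total = sum(strengths[s] for s in stimuli)
            delta = alpha * beta * (lam - total)   # shared prediction error
            for s in stimuli:
                strengths[s] += delta
    return strengths


FULL, HABITUATED = 1.0, 0.6   # hypothetical values of lambda

# Assumption: lambda decays toward the habituated level during A+ training.
a_training = [HABITUATED + (FULL - HABITUATED) * 0.8 ** t for t in range(40)]

# Same UCS in the compound phase: lambda stays habituated.
blocked = rescorla_wagner([(("A",), a_training),
                           (("A", "B"), [HABITUATED] * 10)])

# Changed UCS in the compound phase: lambda assumed restored by dishabituation.
unblocked = rescorla_wagner([(("A",), a_training),
                             (("A", "B"), [FULL] * 10)])

print(f"V_B with unchanged UCS: {blocked['B']:.3f}")
print(f"V_B after dishabituation: {unblocked['B']:.3f}")
```

In this sketch Stimulus B gains essentially no strength when lambda stays at its habituated level and gains appreciable strength when lambda is restored, regardless of whether the physical change in the UCS was an increase or a decrease.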

The evidence presented in this review con- 
clusively supports neither a context-blocking 
nor a habituation account of the UCS preex- 
posure phenomenon; nor is it necessary to as- 
sume that a single mechanism of action is re- 
sponsible for the phenomenon. Associative 
and nonassociative factors may both play a 
role in any given situation, and, indeed, the 
relative importance of the two may vary 
across conditioning paradigms. At the present 
time, we suffer from both a lack of under- 
standing of basic processes and the absence 
of theoretical models that provide rules for 
combining associative and nonassociative con- 
tributions to performance established by 
Pavlovian conditioning procedures. 
Symbolic Interactionist View of Self-Concept:
Through the Looking Glass Darkly 


J. Sidney Shrauger and Thomas J. Schoeneman 
State University of New York at Buffalo 


Research on the relationship between self-perceptions and evaluations from 
other people is reviewed. Studies of naturalistic interactions indicate that 
people’s self-perceptions agree substantially with the way they perceive them- 
selves as being viewed by others. However, there is no consistent agreement 
between people’s self-perceptions and how they are actually viewed by others. 
There is no clear indication that self-evaluations are influenced by the feedback 
received from others in naturally occurring situations. When feedback from 
others is manipulated experimentally, self-perceptions are usually changed. 
However, methodological limitations such as the questionable external validity 
and strong demand characteristics of the experimental situations employed make 
the significance of these findings unclear. The available evidence is examined 
within a framework that considers the transmission, processing, and evaluation 
of judgments from others. Other means by which interaction may influence
self-perceptions aside from direct evaluative feedback are considered.


O wad some power the giftie gie us
To see oursels as others see us!
Robert Burns, To a Louse


Burns's couplet expresses a concern about
self-knowledge and its origins that is ancient
and contemporary. Recently, a resurgence of
interest in the self has flourished in many
areas of psychology, especially in psychotherapeutic
formulations that view cognitions
about oneself as vital mediators in the maintenance
and modification of behavior and in
social psychological theories involving attribution,
cognitive dissonance, and self-awareness.
Understanding how attitudes
about the self are developed and maintained
has thus become increasingly important.

When people are asked how they know
that they possess certain characteristics, a
typical answer is that they have learned
about them from other people. A more formal
theoretical statement of this view has
been articulated by the influential school of
thought known as symbolic interactionism.
This theory proffers the idea of a "looking
glass self" and asserts that one's self-concept
is a reflection of one's perceptions about how
one appears to others. This assertion has received
widespread professional acceptance
and is intoned with catechistic regularity in
many leading texts on social behavior (e.g.,
Raven & Rubin, 1976; D. J. Schneider, 1976;
Secord & Backman, 1974).

Social philosophers and psychologists of 
the late 19th century such as Peirce (1868), 
James (1890), and Baldwin (1897) were 
precursors of symbolic interactionism in their 
emphasis on the self as a product and re- 
flection of social life (Gordon & Gergen, 
1968; Ziller, 1973). Cooley (1902), generally 
credited as the first interactionist, developed 
the idea of the looking glass self. He posited 
that the self is inseparable from social life 
and necessarily involves some reference to 
others. This process of social reference results
in the looking glass self: “A self idea
of this sort seems to have three principal elements:
ments: the imagination of our appearance to 
the other person; the imagination of his 
judgment of that appearance, and some sort
of self-feeling, such as pride or mortification” 
(Cooley, 1902, p. 152). According to Cooley, 
from early childhood our concepts of self de- 
velop from seeing how others respond to us: 
"In the presence of one whom we feel to be 
of importance, there is a tendency to enter 
into and adopt, by sympathy, his judgment 
of ourself” (p. 175). Mead (1934), the ma- 
jor theorist of symbolic interactionism, am- 
plified and expanded the view of the self as 
a product of social interaction: “The in- 
dividual experiences himself as such, not di- 
rectly, but only indirectly, from the particu- 
lar standpoints of other individuals of the 
same social group, or from the generalized 
standpoint of the social group as a whole to 
which he belongs" (p. 138). Essential to the 
genesis of the self is the development of the 
ability to take the role of the other and par- 
ticularly to perceive the attitude of the other 
toward the perceiver. Mead’s looking glass 
self is reflective not only of significant others, 
as Cooley suggested, but of a generalized 
other, that is, one’s whole sociocultural environment.
More recently, Kinch (1963) has
summarized and systematized symbolic interactionist
self theory by noting that it
basically involves an interrelation of four
components: our self-concept, our perception
of others' attitudes and responses to us,
the actual attitudes and responses of others
to us, and our behavior.

In recent years, self theories have been
proposed that do not insist on the primacy
of social others as sources of information
about the self. Bem (1967, 1972) has asserted
that self-perception is a special case
of person perception:

Self-descriptive attitude statements can be based on
the individual's observations of his own overt behavior
and the external stimulus conditions under
which it occurs. . . . As such, his statements are
functionally similar to those that any outside observer
could make about him. (1967, pp. 185-186)


Jones and Nisbett (1971) have qualified 
Bem's analysis somewhat by proposing that 
"actors tend to attribute the causes of their 
behavior to stimuli inherent in the situation, 
while observers tend to attribute behavior to 
stable dispositions of the actor" (p. 93). 
Duval and Wicklund's (1972) objective self-awareness
theory also emphasizes the potential
of the individual for active self-appraisal.
Objective self-awareness is a state of consciousness
in which attention is focused inward
on the self, making the individual an
object to his or her consciousness. The assumption
that self-awareness is dependent
on the imagination of another’s views is minimized.
Although these self-perception theories
have stimulated considerable research,
the initial justification for each view was 
mainly on theoretical rather than empirical 
grounds. Thus, some attention is given to 
the relevance of the data presented here to 
self-perception theories, although the main 
objective is to evaluate the evidence relevant 
to the looking glass self. 

Information concerning the looking glass
self derives from several lines of inquiry, not 
all of them explicitly related to this theory. 
Even work that has been done within the 
framework of symbolic interactionism suffers 
from a severe case of “ahistoricity,” so that 
there is little sense of cumulative develop- 
ment of information. This article attempts 
to examine thoroughly the studies done under 
the auspices of symbolic interactionism. An 
exhaustive review of relevant studies outside 
of this framework cannot be claimed, however,
since these come from many divergent
bodies of literature. 

The research presented is divided into
two sections. First, studies are reported that
examine feedback given in uncontrolled, naturally
occurring interactions. Next, investigations
of the effects of controlled feedback in
structured situations are considered, with attention
given to work in which feedback is
purportedly based either on objective information
or on more subjective judgments.

Some restrictions on the types of research
reviewed here should be noted. The main dependent
variable examined is expressed self-perceptions.
Studies exploring the impact of
self-relevant feedback on other aspects of
behavior are typically not covered, since it is
debatable whether such changes are necessarily
mediated by changes in self-perceptions.
Also, although it may be argued that
studies of attitude change on any topic involve
some implied reappraisal of self-evaluations,
the focus here is limited to changes in
attitudes about the self, since there is evidence
that reactions to feedback about the
self differ from those about other attitudes
(e.g., Eagly, 1967). A final restriction involves
the area of self-presentation. Expressing
one's self-perceptions in any public
fashion inevitably has some potential instrumental
value, and numerous investigations
have focused on the functional impact of
such self-statements. These studies, however,
address issues that are not central to our discussion.
The focus of this article is on investigations
in which self-statements are perceived
as fairly accurate estimates of the
individual’s actual attitudes and external incentives
to a particular type of self-presentation
are minimized.


Naturalistic Studies 


Many investigations have sought support 
for the idea of the looking glass self in natu- 
rally occurring interactions. One group of 
studies has focused on the proposition that 
individuals’ self-perceptions should be highly 
congruent with the way they see themselves 
as being perceived by others. Table 1 shows 
that these studies vary widely along a num- 
ber of different dimensions. Most analyses 
were correlational, some involved statistical 
comparisons, and some of the earlier studies 
relied on nonstatistical “eyeballing” of the 
data (e.g., Miyamoto & Dornbusch, 1956;
Quarantelli & Cooper, 1966; Reeder, Dono- 
hue, & Biblarz, 1960). Samples have been 
drawn from all levels of the educational sys- 
tem and from a variety of other populations. 
Evaluations by self and others have most 
often centered on global measures of self- 
concept, although some investigations have 
examined more specific aspects of person- 
ality and behavior. Overall, these studies 
show modest to strong correlations between
individuals’ perceptions of themselves and 
the way they assume others perceive them. 
Nonsignificant relationships have occurred 
in situations in which deviant groups, such as 
delinquents (Teichman, 1972), learning dis- 
abled students (Swanson, 1969) and socio- 
metrically rejected students (Goslin, 1962), 
have been studied. The only exception to this
pattern is Swanson's finding that for 11 emotionally
disturbed children there was congruence
between self-acceptance and perceived
parental acceptance and that for 35
normal children this congruence was absent.

In addition to postulating concordance be- 
tween self-evaluation and the perceived eval- 
uations of significant others, Mead (1934) 
contended that self-concept is reflective of 
the perceived evaluation of a generalized
other. Relatively few studies have examined 
this facet of symbolic interactionism. There 
is some evidence that individuals’ self-per- 
ceptions are similar to their perceptions of 
how they are viewed by others in general 
(Miyamoto & Dornbusch, 1956; Quarantelli 
& Cooper, 1966; Reeder et al., 1960). The
evidence on whether self-perceptions are more 
strongly related to the perceived impressions
of specific others or to the perceived impres- 
sions of the generalized other, however, is 
contradictory (Miyamoto & Dornbusch, 
1956; Quarantelli & Cooper, 1966). 

The demonstration of a relationship between
people's self-perceptions and how they
feel others see them is not sufficient in validating
the symbolic interactionist position.
It is necessary, in addition, to demonstrate
congruence between (a) self-perceptions and
others’ actual perceptions of the person and
(b) perceived other-evaluations and actual
other-evaluations. A large number of studies
have examined the former relationship; they
are summarized in Table 2. Although many
of these studies are of questionable statistical
and conceptual significance (Wylie, 1974),
the overall pattern of the conclusions drawn
by these investigations suggests much less
agreement between self-judgments and actual
judgments by others than between self-judgments
and perceived judgments. Approximately
half the studies reviewed show no
significant correlations between self-perceptions
and others’ actual evaluations. The majority
of the remaining investigations have
reported either significant but low correlations
or ambiguous results. There are no
easily distinguishable factors that account
for the presence or absence of positive associations.
A wide range of subjects and evaluators
were used, and comparisons were made
on many attributes, most frequently self-esteem
or task competence. Also, a number
of studies have shown that perceived reactions
of others are closer to self-concept than
are actual reactions (Miyamoto & Dornbusch,
1956; Orpen & Bush, 1974; Quarantelli
& Cooper, 1966; Sherwood, 1965; Walhood
& Klopfer, 1971). The minimal associations
between self-perceptions and others’
actual evaluations suggest that people do not
accurately perceive others’ opinions of them,
that these opinions minimally influence self-judgments,
or, as indicated by a study by
Reese (1961), that these two variables may
be curvilinearly related, thus explaining why
significant linear correlations do not often
emerge (Hartup, 1970). Studies assessing degree
of influence are infrequent and are discussed
below.
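
The point about curvilinear relations can be shown with a few lines of arithmetic. The sketch below uses invented ratings, not data from Reese (1961) or any other study cited here, and assumes an inverted-U relation purely for illustration: a linear correlation coefficient can be close to zero even when one variable is perfectly determined by the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical ratings by others on a 1-9 scale.
others_ratings = list(range(1, 10))
# Hypothetical self-ratings that peak at the scale midpoint (inverted U).
self_ratings = [20 - (x - 5) ** 2 for x in others_ratings]

r_linear = pearson_r(others_ratings, self_ratings)
# Correlating with squared distance from the midpoint recovers the relation.
r_curved = pearson_r([(x - 5) ** 2 for x in others_ratings], self_ratings)

print(f"linear r = {r_linear:.3f}, r with squared deviation = {r_curved:.3f}")
```

A nonsignificant linear correlation therefore does not by itself show that the two sets of judgments are unrelated.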

The issue of accuracy in perceiving others’ 
opinions has also been examined by the con- 
sideration of the relationship between in- 
dividuals’ perceptions of others’ views of 
them and others’ actual views. Of the studies 
assessing this relationship, some show con- 
gruence (Ausubel & Schiff, 1955; Ausubel, 
Schiff, & Gasser, 1952; De Jung & Gardner, 
1962), some indicate partial or ambiguous 
relationships (Goslin, 1962; Israel, 1958; 
Reeder et al., 1960; Tagiuri, Blake, & 
Bruner, 1953; Walhood & Klopfer, 1971), 
and others demonstrate no association (Au- 
subel, 1955; Fey, 1955; Kelman & Parloff, 
1957; Orpen & Bush, 1974). Most of the 
studies showing congruence have involved 
judgments of highly evaluative characteristics 
such as liking by the other person, whereas 
those showing minimal associations have typically
involved more content-specific judgments.
Ability to predict peers’ liking increases
with age, at least from the lower
grades through high school (Ausubel & Schiff,
1955; Ausubel et al., 1952; De Jung & Gardner,
1962), reflecting perhaps a more extensive
interaction with those judged, more frequent
expression of interpersonal preferences,
or greater sensitivity to interpersonal cues.
Also, whether one is predicting positive or
negative feelings may be important; people
seem to be better able to predict who likes
them best as opposed to who likes them least
(Tagiuri et al., 1953). That self-perceptions
are consistently more strongly correlated with
people's perceptions of how they think others
view them than with how others actually
view them suggests that the tendency to assume
greater similarity between one's own
and others’ attitudes than actually exists
(e.g., Newcomb, 1961) extends into the area
of attitudes toward oneself. Thus, subjects'
self-evaluations may be weakly related to
others’ opinions of them because they frequently
do not know what others’ opinions
are.

Since the studies reported thus far show 
no direction of causality or change over time, 
it is impossible to decide whether the actual 
or perceived evaluations of people by others 
are a cause or effect of how they perceive 
themselves. If one is to infer that others' 
judgments influence self-perception, assess- 
ments must be made at different times to see 
if self-perceptions change in the direction 
of others’ earlier evaluations. Almost all of
the relevant investigations have examined
short-term changes in self-evaluation in relation
to actual or perceived evaluations by
others. Sherwood (1965) had sensitivity
training participants rate themselves on a
set of bipolar trait scales during the second
day of a 2-week program. At the end of the
program they rated themselves again, rated
how they felt other group members would
rate them, and rated other group members on
the same set of dimensions. The ratings of a
person by others were more similar to his or
her self-ratings at the end of the program
than to initial self-ratings. Since others’ ratings
were not obtained at the outset, one
cannot infer that they actually influenced
self-ratings. Instead, both the subject and
other group members may have observed and
responded to changes in subjects’ presentation
of themselves as the sessions continued.
Rosengren (1961), in a study of 10 institutionalized
preadolescent boys with emotional
disturbances, obtained self-ratings and
ratings by peers over a 1-year interval. He
found that for the post- as compared with
the preratings, self-ratings were more similar
to both subjects’ perceptions of others’ ratings
of them and others' actual ratings of
them. Although these subjects did see themselves
more similarly to the way they were
seen by others, the critical comparison showing
that self-ratings in the second evaluation
became more similar to others' initial evaluations
of them was not made.

The most sophisticated naturalistic investigation
to date remains an early study by
Manis (1955). Male undergraduates assigned
as dormitory roommates rated themselves,
their ideal selves, and their roommates at the
beginning of a semester and after 6 weeks.
Based on sociometric choices at the beginning
of the first sessions, a friend and a nonfriend
were designated for each subject. Subjects’
self-perceptions and their friends’ perceptions
of them were more similar after their final
rating than after their first. The most important
finding was that subjects’ final self-ratings
were more similar to others’ initial
judgments of them than were their initial
self-ratings. Furthermore, others’ second ratings
of a subject were no more similar to
the subject’s initial self-perceptions than were
their first ratings, suggesting that others’
impressions were not substantially influenced
by the subject’s initial self-evaluation.

Although these data suggest that individuals
do change their self-perceptions in the
direction of others’ opinions about them,
methodological limitations make this conclusion
equivocal. Most significantly, subjects’
self-perceptions changed in the direction of
friends’ initial judgments of them only when
the designated friend had initially described
them more favorably than subjects had described
themselves. When their designated
friend described them less positively than
their own self-perceptions, there were no increases
in the similarity of their self-descriptions.
A friend who views subjects more
positively than the subjects view themselves
would be likely to reciprocate the subjects’
friendship more than someone who views
them less favorably than they view themselves.
Learning that they have chosen as a
friend someone who also likes them may enhance
people's feelings of interpersonal perceptiveness
and social competence and cause
them to raise their self-evaluation.

Even if Manis’s results indicate that a 
friend who describes a peer positively influ- 
ences the peer’s self-perceptions, the nature 
of the changes generated remains unclear. 
Subjects may either change the overall favor- 
ableness of their self-ratings to more closely 
match that of their evaluators, or they may 
change their assessments on specific dimen- 
sions so that the pattern of their self-de- 
scriptions across dimensions becomes more 
similar to that of their evaluators. This dis- 
tinction raises the issue of whether the in- 
fluence of others’ assessments extends beyond 
the general evaluative level to more specific 
elements of the dimension being assessed. 
Perhaps when people are reacting to others’ 
evaluations of them, the principal or even 
exclusive information that they process is 
whether they are being perceived in some 
globally positive, negative, or neutral way. 

The only long-term longitudinal study that 
has been reported involved self-ratings and 
ratings by peers and teachers of children in 
the first and second grades who were later re- 
assessed in the fifth and sixth grades (Trick- 
ett, 1969). Neither peer nor teacher ratings 
from the initial assessment were significantly 
correlated with self-ratings in the second as- 
sessment. Children’s perceptions of how peers 
saw them in the initial assessment were un- 
correlated with their self-perceptions in the 
second measurement. Although the author 
implied some causal influence of others’ rat- 
ings (particularly those of teachers) on later 
self-perception, this is difficult to detect in 
the data. The absence of such an effect is not 
surprising in light of the fact that by the 
time subjects were in the later grades they 
had been exposed to a number of different 
peers and teachers, whose influence was im- 
possible to gauge. 

The numerous naturalistic studies that 
have been undertaken have not, by and large, 
contributed substantially to an understanding 
of the extent to which others’ perceptions 
influence self-judgments. Currently, there is 
little evidence that in their ongoing social 
interactions people’s views of themselves are 
shaped by the opinions of others. This is due
primarily to the lack of repeated assessments
of self-perceptions and others’ perceptions 
whereby movements of one toward the posi- 
tion of the other could be determined. 
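
The comparison that repeated assessments make possible can be sketched as follows. The function name, the use of mean absolute difference as the similarity index, and the ratings themselves are all hypothetical; the studies reviewed used their own indices. The sketch only shows the logic of testing movement in each direction separately.

```python
def mean_abs_diff(ratings_a, ratings_b):
    """Average absolute discrepancy between two rating profiles."""
    return sum(abs(a - b) for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def movement_toward(target_time1, mover_time1, mover_time2):
    """Positive values mean the mover's ratings shifted toward the target's
    earlier position between the first and second assessment."""
    before = mean_abs_diff(mover_time1, target_time1)
    after = mean_abs_diff(mover_time2, target_time1)
    return before - after


# Hypothetical ratings of one subject on four trait scales.
self_t1 = [3, 5, 2, 6]
self_t2 = [4, 5, 3, 6]
others_t1 = [5, 5, 4, 7]
others_t2 = [5, 5, 4, 7]

# Did self-ratings move toward others' initial judgments?
self_shift = movement_toward(others_t1, self_t1, self_t2)
# Did others' ratings move toward the subject's initial self-ratings?
others_shift = movement_toward(self_t1, others_t1, others_t2)

print(f"self toward others: {self_shift}, others toward self: {others_shift}")
```

Only when the first index is positive and the second is not does the pattern favor influence running from others' evaluations to self-perception rather than the reverse.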

Other issues are also important in evalu- 
ating the naturalistic data. Many investiga- 
tions may not have examined situations in 
which the input of other people was maximal. 
For instance, most studies have used late 
adolescents and adults as subjects. If these 
individuals are in stable life situations, they 
may be more likely to maintain relatively 
solidified self-images. The impact of others’ 
opinions could possibly be enhanced and 
more pronounced if adults were studied in 
unfamiliar situations in which their norms 
for self-evaluation and the behavior patterns 
that they displayed were in a state of flux, 
as in Manis’s (1955) study of incoming col- 
lege freshmen in dormitories. It also seems 
likely that younger people are more sus- 
ceptible to external influence in developing 
their self-concept than are older individuals. 

A final consideration in assessing the work 
reviewed above concerns the individuals who 
are sources of feedback and their relationship 
to the subjects studied. Although peers are 
the most commonly used and are, in many 
cases, perhaps the most appropriate sources 
of evaluations, more attention should be 
given to the actual degree of interaction be- 
tween them and the people whose self-per- 
ceptions are being assessed. Membership as 
a peer in a group of students or workers does 
not necessarily demand that colleagues offer 
appraisals to one another. For both children 
and adults, a relatively small number of 
people may serve as significant sources of 
evaluative feedback. In most studies it is the 
researcher who decides who the subjects’ 
significant others are, and in many cases this 
designation may be off the mark. Investiga- 
tions that attempt to identify the significant 
others of a given population (e.g., Denzin, 
1966) would be useful preliminary steps in 
future naturalistic investigations.


Studies of Controlled Feedback From Others 


Although researchers have employed a 
wide range of specific procedures for as-
sessing the role of controlled feedback on
judgments of others, most studies have fol-
lowed one of two paradigms, which differ
mainly in the extent to which the evaluator's
judgments are based on objective data. In the
first type of study the feedback received is
purportedly based on tests of personality or
competence. Typically, subjects describe 
themselves on the attributes assessed by the 
tests, then take the tests, receive feedback 
about their performance either immediately 
or within a week or two, and finally re- 
appraise themselves. This procedure has been 
employed not only in specific efforts to as- 
sess the symbolic interactionist position but 
also in studies examining the effects of change 
in self-evaluation on other aspects of be- 
havior, with change in self-evaluation often 
examined principally as a manipulation
check. In the second type of study, feedback 
is based on the subjective impressions of 
other individuals who have no specific knowl- 
edge of objective assessment results. These 
studies have varied in the extent to which 
the other person is presented as having ex-
pertise in the topics considered.

The most elementary question typically
asked in this research is, Will individuals
modify their self-descriptions in the direction
of the feedback they receive? The most ele-
mentary answer is usually yes. Such changes have
been shown for numerous populations and
for many different attributes, from compe-
tence in public speaking (Videbeck, 1960)
and physical skills (Haas & Maehr, 1965) to 
a variety of personality traits (e.g., Back- 
man, Secord, & Pierce, 1963; Binderman, 
Fretz, Scott, & Abrams, 1972; Cooper & 
Duncan, 1971; Eagly, 1967; Evans, 1962; 
Harvey & Clapp, 1965; Harvey, Kelley, & 
Shapiro, 1957; Regan, Gosselink, Hubsch, & 
Ulsh, 1975; Shrauger & Lund, 1975; Snyder
& Shenkel, 1976; Steiner, 1968). In almost
all cases changes in self-perception have been
judged by modifications in verbal self-de-
scriptions made immediately following others’
evaluations and in the presence of the evalu- 
ator. 

Although controlled feedback from others
typically produces some changes in people's
self-descriptions, several factors influence the
extent of such changes. These include the
discrepancy of feedback from subjects’ self-
perceptions, favorableness of feedback, char-
acteristics of the evaluator, consensual vali- 
dation of the judgments given, and attributes 
of those evaluated. After these factors have 
been examined, some general observations on 
the significance and limitations of studies 
employing manipulated feedback are con- 
sidered. 


Discrepancy of Feedback From 
Self-Perceptions 


The amount of discrepancy between others’ 
evaluations and one’s own self-perceptions 
has been examined in several studies. Bergin 
(1962) found that the credibility of feedback 
influenced the relationship between discrep- 
ancy and self-perception changes. With a 
high-credibility source, increases in discrep- 
ancy resulted in greater changes in self-rele- 
vant attitudes, whereas for a low-credibility 
source the tendency was for greater credi-
bility to produce less change. Although not
wholly consistent, other results have sug- 
gested that when others’ evaluations are pur- 
portedly based on objective test data, self- 
perceptions change more as the discrepancy 
from initial perceptions increases (Binder-
man et al., 1972; Eagly, 1967; Gerard, 1961;
Johnson, 1966). However, Gerard found that
this occurred only when subjects felt that the
feedback they had received would be made
public, and Eagly found that changes in-
creased from low to moderate but not from
moderate to high levels of discrepancy. John-
son found a curvilinear trend, with attitude
change first increasing with increased dis- 
crepancy and then decreasing. In contrast 
with the findings based on objective test data, 
when feedback was based on subjective rat- 
ings of a personality dimension made by sub- 
jects’ classmates, changes in self-evaluations 
were not enhanced by increased discrepancy 
between their judgments and subjects’ initial 
self-perceptions (Harvey & Clapp, 1965; 
Harvey et al., 1957). Although many factors 
may differentiate these studies from one an- 
other, they are generally consistent with 
Bergin’s argument about the role of credi-
bility and suggest that for feedback that
diverges substantially from one’s views to
have a strong effect on self-evaluations, it 
must be perceived as being based on clear 
objective information. 


Favorableness 


Several studies have examined amount of 
change in self-perceptions as a function of 
feedback favorableness. Some of these studies 
involved the “Barnum effect,” that is, the 
acceptance of bogus personality feedback 
(Meehl, 1956). Most such investigations in- 
dicate that favorable information is more 
readily accepted than unfavorable informa- 
tion (Sundberg, 1955; Halperin, Snyder, 
Shenkel, & Houston, in press; Mosher, 1965; 
Weisberg, 1970), with a few showing no dif- 
ferential acceptance (Dmitruk, Collins, & 
Clinger, 1973; Evans, 1962). These studies’ 
significance is questionable, however, since 
they involved no preassessments of subjects' 
self-evaluations and may have reflected the 
greater comparability between positive in- 
formation and initial self-perception than be- 
tween negative information and initial self- 
perceptions. 

A few studies have attempted to control 
for the discrepancy between feedback and 
initial impressions. Steiner (1968) examined 
changes in self-ratings on bipolar traits and 
found that positive feedback produced greater 
changes than negative feedback, when feed- 
back was based on upper level undergradu- 
ates’ interpretations of self-report tests. An- 
other study (Snyder & Shenkel, 1976) 
attempted to control for the “initial truthful- 
ness” of the information evaluated and found 
no differences in the acceptance of positive 
or negative feedback given by the graduate 
student and based on projective tests. 

Turning to studies in which feedback was 
not based on personality test results, we find 
that most were not designed to assess dif- 
ferences in reactions to equally discrepant 
positive and negative evaluations (Haas & 
Maehr, 1965; Jones, Gergen, & Davis, 1962; 
Maehr, Mensing, & Nafzger, 1962; Papa- 
georgis & McCann, 1965; Videbeck, 1960). 
Two careful investigations that did examine 
initial self-perceptions produced inconsistent 
findings similar to those of Steiner and Sny-
der and Shenkel just discussed. Eagly (1967)
found no differential acceptance of feedback
from a trained rater with regard to subjects’
assertiveness or submissiveness. Harvey and 
Clapp (1965), however, found that students 
changed their self-ratings on a set of bipolar 
adjectives more when they had received posi- 
tive feedback than when they had received 
negative feedback from classmates. The eval-
uators in Eagly's study may have had more
legitimized expertise than those in Harvey
and Clapp's study, and the same may have
been true in the Barnum effect study of Sny- 
der and Shenkel (1976) versus that of Steiner 
(1968). These results may suggest that sub- 
jects are reluctant to accept unflattering in- 
formation about themselves unless they feel 
that the source of that information has a par- 
ticularly strong basis for judgment. The in- 
consistency of these findings, however, sug- 
gests that the differential acceptance of posi- 
tive versus negative information may depend 
on a variety of parameters. Eagly and her
colleagues have shown, for example, that 
positive information is readily accepted if 
the recipients of the information do not ex- 
pect to be evaluated again (Eagly & Acksen, 
1971) and if they have no choice over the 
information they have received (Eagly & 
Whitehead, 1972). Other factors such as 
the strength of the subject's initial self-per- 
ceptions and the attributes on which feed- 
back was given may also be relevant here. 


Evaluator Characteristics 


The most systematic investigation of fac- 
tors affecting the influence of an information 
source involved several studies by Webster 
and Sobieszek (1974), who examined sub- 
jects’ responses to evaluations of their ability 
on a perceptual task. Each subject worked 
with a partner, and both subjects’ initial 
performance was judged by an evaluator 
whose apparent competence on the task was 
varied. The impact of the evaluator’s assess- 
ment on subjects’ self-perceptions was not 
measured directly, but was inferred from the 
extent to which subjects acquiesced to their 
partners’ judgments on a subsequent set of 
items. The evaluator’s judgments had more
effect on rate of acquiescence when the evalu-
ator was presented as very competent as op-
posed to moderately competent and had no
effect when he was presented as incompetent.
Manipulation of more general aspects of the
evaluator’s competence by presenting him to
high school subjects as either a college junior
or an eighth grader produced no differential
changes in acquiescence level.


effects of manipulating general competence. 
Whether a test evaluator was a PhD or a
counseling practicum student influenced the
degree of acceptance of bogus personality
feedback if that information was highly dis-
crepant from the subject’s initial self-percep-
tion but not if it was less discrepant (Binder-
man et al., 1972). Whether a person received
ratings on adjective dimensions from an ac-
quaintance or from a stranger in his or her
class had no effect on the degree of change
in subsequent self-ratings (Harvey et al.,
1957). Although it is difficult to develop
generalizations from such scattered findings,
these data suggest that others’ expertise or
competence has an impact on the acceptance
of their evaluations only when that compe-
tence is specifically relevant to the judgment
being made. 


Consensual Validation 


Another aspect of the credibility of in-
formation received involves the extent to
which it is validated by others. Presumably,
as a larger number of individuals reflect a
particular perception to the subject, the likeli-
hood that the subject will incorporate that
perception is increased. Following this as-
sumption, Backman et al. (1963) found that
bogus personality feedback had less effect on
college students’ self-ratings as a greater num-
ber of significant others were viewed as agree-
ing with the subject’s initial self-perception.
The specific relevance of the number of others
who hold an opinion is unclear, however,
since neither the salience of the dimensions
to the subjects themselves nor the strength
of their own self-perceptions was assessed.
In another study, junior high school boys 
were given feedback about their physical 
skills by either one or two experts (Haas &
Maehr, 1965). Initial postfeedback ratings
did not differ as a function of the number
of raters, but self-ratings made 6 weeks after
the experts’ judgments showed greater
changes on the attributes evaluated for the
group judged by two experts. Since there was 
no condition in which consistent feedback 
was repeated by a single judge, it is not cer- 
tain that it was a second person as opposed 
to a repetition of the communication that was 
the critical factor in enhancing feedback. 
This is an important issue, since there is some 
evidence that the repetition of an evaluation 
by the same evaluators enhances changes in 
self-evaluations (Kinch, 1968). Thus, there 
is no clear evidence that increasing the num- 
ber of people who make an evaluation en-
hances the likelihood that it will be accepted.

The consistency of feedback across differ- 
ent evaluators has also been examined. Al- 
though there has been some suggestion that 
people respond more strongly to feedback 
that is consistent than to feedback that varies 
from evaluator to evaluator (Sherwood, 
1967), other findings offer little support for 
this view (Kinch, 1968; Sobieszek & Web- 
ster, 1973). Because of the wide variation in 
the methodology of these studies, it is im- 
possible to determine which differences among 
them account for the inconsistency in find-
ings. However, given the ambiguous nature
of these results, plus the fact that multiple
and inconsistent evaluations may be frequent
in real-life interactions, more careful exami-
nation of how evaluative information is com-
bined and integrated seems warranted.




Self-Evaluator Characteristics 


There is some evidence that individuals 
differ in their receptivity to information 
about themselves. The main characteristic 
that has been examined in this regard is level 
of self-esteem, perhaps because most of the 
work on response to others' feedback has
focused on highly evaluative information. 
There is some consistency in the finding that 
individuals who have generally low self- 
esteem are more influenced by negative feed- 
back from others and less by positive feed-
back than are individuals with high self-
esteem. This has been shown even when 
subjects’ initial self-perceptions about the 
specific attributes evaluated were comparable, 
and it has been demonstrated for judg- 
ments of assertiveness-submissiveness (Eagly, 
1967), social sensitivity (Shrauger & Rosen- 
berg, 1970), and several other personality 
traits (Harvey & Clapp, 1965). The only 
instance in which such a differential accept- 
ance was not demonstrated was for self- 
awareness (Shrauger & Lund, 1975). 

Studies of other individual differences in 
recipients have been more episodic. Gerard 
(1961) found that a self-report measure of 
susceptibility to social influence predicted de- 
gree of change in self-perception, but only 
when the evaluation from others was sup- 
posed to be made public. People who had a 
less well-developed sense of self or a lower 
level of ego identity (Erikson, 1956) changed 
their self-evaluations more following success 
or failure feedback on an intellectual task 
than did those at higher levels of ego iden- 
tity (Marcia, 1967). Harvey, Hunt, and 
Schroder (1961) reported that levels of con- 
creteness-abstractness in cognitive processes 
predicted the extent of changes in self-de- 
scriptions following personality test feedback. 
These data indicate that in the future, precise 
appraisals of the impact of others’ judgments 
on self-perceptions will require acknowledg- 
ing the association between subject character- 
istics and the nature of the judgments given. 


Significance of Manipulated Feedback 


Having considered some factors that can 
affect the impact of others’ ratings, we turn 
to an examination of the broader significance 
of feedback manipulation studies, particu- 
larly the degree of influence that feedback in 
such studies has been shown to have. Issues 
to be considered here are how long feedback 
effects last, their situational specificity, and 
the influence of feedback about a specific 
attribute for self-appraisal on other attri- 
butes. 

One important but relatively neglected 
issue is the longevity of the impact of others’ 
evaluations. Only two studies have examined
the effect of others’ appraisals over time. In
one investigation, subjects were given posi- 
tive or negative feedback about physical 
skills by an expert, and their self-perceptions 
were reassessed immediately and after 1 day, 
6 days, and 6 weeks (Haas & Maehr, 1965). 
Both positive and negative evaluations af- 
fected self-perceptions, and these effects were 
maintained over the 6-week period, although 
they appeared to diminish over time. Changes 
in dimensions not specifically evaluated were 
evident immediately after the evaluation, but 
were insignificant thereafter. Hicks (1962) 
gave subjects feedback that classmates judged 
them more favorably than their own self- 
perceptions on a group of personality traits.
Two days after the initial evaluation, sub- 
jects were more likely to have raised their 
self-judgments on the elevated traits than on 
the control traits, although this difference 
did not hold after a week. Thus, the minimal 
evidence available on this issue suggests that 
the impact of others’ judgments on self-per- 
ceptions holds over short periods of time but 
tends to diminish as time passes. 

Also relevant in assessing the importance 
of feedback from others is the extent to 
which the effect of feedback generalizes from 
focal attributes to other characteristics. The 
three studies that have examined this effect 
used expert sources and systematically varied 
the relatedness of secondary attributes to the 
focal dimension (Haas & Maehr, 1965; 
Maehr et al., 1962; Videbeck, 1960). They 
found, not surprisingly, that judgments 
changed more on the dimension that was 
evaluated than on the one that was not 
(Maehr et al., 1962) and that those changes 
that did occur in other dimensions dissipated 
over time (Haas & Maehr, 1965). There- 
fore, relatively little information exists re- 
garding the manner and extent to which con- 
tent-focused evaluations are generalized to 
other characteristics of oneself. 

Situational factors may also influence the 
degree of acceptance of others’ self-evalua- 
tions, since the functional utility of accept- 
ing or rejecting others’ impressions may vary 
from situation to situation. When college 
students feel that evaluations of their per- 
formance on a test are going to be made
public, for example, they change their self-
perceptions regarding that attitude more than
do subjects who feel their responses will be 
known only to themselves (Gerard, 1961). 
Eagly and Acksen (1971) found that individ- 
uals changed their self-perceptions more in
the direction of negative information and
less in the direction of positive information
when they felt that they would be retested
on the attribute on which their performance
was assessed, as compared with when they 
felt no retesting would occur. Positive at- 
tributes may be accepted and negative at- 
tributes may be fended off if there is no im- 
mediate prospect that the accuracy of these 
self-enhancing beliefs will be challenged. 
Other potential costs and gains of accepting 
or rejecting others’ evaluations might also 
be envisioned. For instance, acknowledg- 
ment of certain positive attributes might be 
accompanied by the anticipation of favorable 
future outcomes or of increased demands 
from others. Similarly, the endorsement of 
negative attributes might lead to the antici- 
pation of social rejection or loss of other
favorable outcomes. In examining such prob- 
lems it is important to distinguish between 
self-presentation and self-perception, since 
certain external factors might influence the 
manner in which people present themselves 
without affecting their actual self-perceptions. 

The factors that most limit the interpreta- 
tion of these manipulated feedback studies 
are the demand characteristics of the situa- 
tion in which changes in self-perception are 
assessed. Invariably the appraisal of changes 
in self-evaluation was made in the presence of
the evaluator or experimenter. When the
evaluator is present, subjects who do not
change their self-perceptions directly dis-
credit the evaluator's appraisal, which may
be difficult, particularly if the evaluator is 
presented as an expert. Even when evalua- 
tors are absent, experimenters may be per- 
ceived as being likely to communicate with 
them. Very rarely are there clearly reported 
efforts to disguise the postmanipulation self- 
appraisal process (Shrauger & Rosenberg, 
1970). 

One major way that the significance of 
manipulated feedback studies might be en-
hanced would involve making the assessment
of change less reactive and more subtle. For
example, the appraisal might be woven into
some other aspect of the experiment sup- 
posedly unrelated to the portion in which 
feedback was given, as has been done in coun- 
terattitudinal advocacy studies (e.g., Rosen- 
berg, 1965; Hendrick & Seyfried, 1974). 
Another possibility is to have the final self- 
evaluation made after an initial “debriefing,” 
with the evaluation presented in the context 
of an appraisal of the effects of psychologi- 
cal experiments on individuals’ attitudes and
feelings. 

A final issue in manipulated feedback stud-
ies is whether changes in self-evaluation are
specific only to the self or can reflect modi-
fications in judgments of others as well. There
is evidence (Bramel, 1962; Edlow & Kiesler, 
1966; Steiner, 1968) that when people are 
confronted with information discrepant from 
their self-evaluations, they not only change 
their self-evaluations but also modify their 
evaluations of others on the attribute judged. 
This may reflect a process of defensive pro- 
jection or simply a change in the criteria 
they use for evaluating the attribute in ques-
tion. Unfortunately, most studies have looked
only at shifts in the absolute level of self- 
judgments and not at changes in judgments 
of self relative to others. Such relative ap- 
praisals may be at least as significant as ab- 
solute judgments. Therefore, the effect of 
feedback on judgments of others as well as 
of oneself should be evaluated. 


Discussion and Conclusions 


The numerous studies of naturalistic and 
manipulated feedback that we have reviewed 
have had much to say about the relationship 
between others’ judgments and self-apprais- 
als; it is unfortunate that the flaws and 
limitations of these investigations have ren- 
dered the significance and validity of their 
findings questionable. Although there is evi-
dence that individuals’ self-perceptions and 
their views of others’ perceptions of them 
are quite congruent, there is less evidence 
that self-perceptions are related to or influ- 
enced by others’ actual perceptions. None of
the studies of naturally occurring interac-
tions were designed so that they would dem-
onstrate unequivocally that receiving con-
tent-focused feedback from others leads to
corresponding changes in one's own self-
perceptions. In contrast, there is ample evi-
dence of changes in self-perceptions follow- 
ing controlled feedback in laboratory settings. 
However, the importance of these findings is 
unclear because of the short-term nature of 
most assessments and the potential effects 
of demand characteristics. In evaluating the 
contributions and limitations of the avail- 
able research, we give some attention to how 
information from others about the self is 
transmitted, received, interpreted, and acted 
upon. These are aspects of social self-percep- 
tion that have for the most part been ne- 
glected by researchers in this area. 


Availability of Evaluative Information 


That there is minimal agreement between 
individuals’ judgments of others’ perceptions 
of them and their actual perceptions suggests 
that the communication of feedback to others 
may often be infrequent or ambiguous. Al-
though norms regarding the evaluation of
other people’s behavior probably vary widely 
across different subcultures and situations, 
strong sanctions are often maintained against 
making direct appraisals, particularly when 
they are negative. In some of the only re- 
search on the communication of evaluations, 
Blumberg (1972) found that people report 
inhibiting the direct communication of all 
types of evaluations to others, particularly 
if it is negative or if the recipient is not 
known well. Barriers to direct expression can 
be found in intimate relationships as well as 
in more impersonal social interactions. This 
“not-even-your-best-friend-will-tell-you” phe- 
nomenon has been noted by Goffman (1955), 
who pointed out that unfavorable evaluations 
of close associates are typically given only 
when directly solicited and that in such a 
situation, chances are that the asker has al- 
ready made some negative self-appraisal. 
Perhaps this accounts in part for the popu- 
larity of sensitivity training, in which people 
have the privilege of finding out what others
really think of them, and of assertiveness
training, in which they can learn to com- 
municate their true feelings about others. 
To understand the real impact of others' 
opinions, one must determine how frequently 
such opinions are communicated directly in 
people's everyday social interactions. Who 
gives evaluations? On what dimensions? Un- 
der what circumstances? How often and how 
explicitly? The answers to such questions 
would facilitate an assessment of the relative 
influence of others’ judgments on self-per-
ceptions, as opposed to the opposite influence
of self-perceptions on the perception of 
others’ judgments. When information from 
others is not explicit, its interpretation may 
depend substantially on one's own self-per- 
ception on the attribute being assessed. In 
clinical contexts, for example, if people have 
concerns about what others think of them, 
it is frequently assumed that their inferences 
about others' feelings reflect a projection of 
their own self-evaluations. 
It is quite likely that direct feedback 
occurs extensively in the socialization of 
young children by parents and other adults. 
During the process of language development, 
for instance, it seems certain that children 
come to model the construct system of those 
around them and to apply these constructs 
to themselves. Symbolic interactionists (Coo- 
ley, 1902; Mead, 1934) and self-perception 
theorists (Bem, 1967, 1972; Duval & Wick- 
lund, 1972) alike have discussed the impor- 
tance of preschool interactions in the devel- 
opment of a concept of self. It is surprising, 
however, that an empirical literature sub- 
stantiating these arguments is nonexistent. 
Naturalistic studies of self-concept and per- 
ceived or actual assessments by others pick 
up developing selves as they enter the cap- 
tive environment of elementary school. The 
subjects in these studies are typically in at 
least third or fourth grade (see Tables 1 and 
2); only two studies have used first graders 
(Alberti, 1971; Trickett, 1969). Studies of 
controlled feedback almost exclusively use 
undergraduates. Since the preschool years are 
so vital to theories of the development of 
self-concept, it seems imperative that this 
period be attended to empirically. However,
this may be easier to recommend than to
implement. Trickett, for example, has noted
the difficulties encountered in assessing the 
self-concept of first graders, which means 
that new and imaginative methods are neces- 
sary in this regard. Furthermore, recent work 
raises questions about whether young chil- 
dren possess the abstract concepts necessary 
to process information from others and use 
it in forming perceptions of themselves 
(Herzberger, Note 1). A naturalistic study 
of parent-child evaluative interactions might 
be a desirable first step in determining just 
what kind of feedback is given in the earliest 
stages of life. 

Finally, in considering the availability of 
information from others it is important to 
recognize that people who are evaluated may 
help to determine how much evaluative in- 
formation they receive. People’s frequency 
of social interaction, how directly they ask 
for information, and how much they behave 
in ways that might elicit others’ comments 
may all affect the amount of evaluative feed-
back received.


Interpretation of Information From Others 


Although it is likely that people differ in 
their interpretations of others’ feedback, par- 
ticularly if that feedback is not explicit, these
differences have not been explored exten- 
sively. People may disagree about what cues 
from others constitute an evaluation. And 
even when cues have been identified, people 
may differ in the inferences or conclusions 
they draw from these cues about others’ judg-
ments of them. For instance, it might be im-
portant to examine the extent to which in- 
formation is considered principally for its 
specific content or for its evaluative meaning. 
To date the evidence suggests that content- 
specific feedback changes self-descriptions 
principally for those attributes on which feed- 
back is given and only minimally on other 
attributes (Haas & Maehr, 1965; Videbeck, 
1960). However, the nature of the situation 
in which these data were obtained may have 
maximized the impersonal, objective quality 
of evaluations and minimized the generaliza-
tion that can occur in other contexts.

Characteristics of the evaluator may also
be significant in determining the extent to 
which information is accepted. To date ex- 
aminations of evaluator competence (Web- 
ster & Sobieszek, 1974) imply that only 
competence relevant to the attribute being 
judged has real impact on the acceptance of 
information. Expertise of the evaluator may 
be more complex, however, when the attri-
butes judged do not involve specific, clearly
defined skills. In these more subjective judg- 
ments, evaluators' competence may be judged 
more on global indices of status or on the 
extent to which they are perceived to hold 
norms similar to one's own on the dimensions 
in question. 

A more situational aspect of the evaluator's 
competence involves whether or not the eval- 
uator has a sufficient sample of one's be- 
havior to make an adequate appraisal. Even 
if an appraiser is viewed as a good judge, 
his or her evaluation may be discounted if 
it is based on a limited or unrepresentative 
sample of behavior. Wyer, Henninger, and
Wolfson (1975) showed, for example, that
observers were much more likely to base
their judgments on the limited behavior sam-
ple that they observed than were actors,
whose self-appraisals were based less on that
specific behavior sample and more on pre-
vious experiences.

Finally, the interpretation of the evalua- 
tion may depend on a perception of how can- 
did other people are being. If one believes 
that there is some ulterior motive in making 
the evaluation (e.g., ingratiation or one-up-
manship), it may not have as much effect
on one's self-perception as a communication
interpreted as more genuine. 




Comparison With Self-Evaluations 


An important aspect of others’ judgments 
is how closely they agree with one's initial 
self-appraisal. Although judgments that 
match an initial self-perception may do little 
more than fortify this perception, judgments 
that are at variance frequently set up some 
dissonance or tension that requires cognitive 
reappraisal. There is an implicit disagreement
between symbolic interactionist and self-at-
tribution theories as to how such discrepan-
cies are resolved. The symbolic interactionist 
view implies that such discrepancies are typi- 
cally dealt with by changing one's self-per- 
ceptions, whereas self-attribution theories 
suggest that people have a reasonably clear 
and stable picture of themselves and may 
not readily conform to the discrepant ap- 
praisal of another individual. 

The extent to which self-perception is 
maintained in the face of contradicting in- 
formation from others presumably depends 
on the certainty of an individual’s initial 
self-perceptions. Several factors may influ- 
ence people’s assuredness about their self- 
perceptions, all of which are related to oppor- 
tunities for examining their own behavior. 
One factor is the salience of the dimension 
on which a judgment is made. Individuals are 
expected to have more clearly developed 
opinions about themselves on dimensions that 
are more important to them. A second aspect 
regarding the opportunity for observation 
may be the degree to which the person can 
compare his or her behavior with that of 
other people (cf. Festinger, 1954). Impres- 
sions may be more firmly established if peo- 
ple have the chance to compare themselves 
with other individuals. However, the oppor- 
tunity for such comparisons may vary de- 
pending on the dimension being judged. A 
final determinant of assuredness may be the 
clarity of the criteria against which attributes 
are judged. A person is more likely to have 
a firmly established self-appraisal on an at- 
tribute that has a very clear public definition. 
One reason for children’s potential suscepti- 
bility to self-concept molding may be their 
lack of clear criteria for defining particular 
characteristics. This may also account for 
the clinical observation that negative global 
self-perceptions (e.g., “I am rotten” or “I 
am a total failure”) are resistant to change 
without exploration of what those attributes 
actually entail. 

One complication in assessing the impact 
of others’ feedback is that some changes in 
self-perception might be attributed to input 
from others when in fact they really reflect 
changes in individuals’ independent apprais-
als of themselves. In the naturalistic studies
cited previously, changes toward others’ per-
ceptions could be accounted for by the in-
dividuals having changed or reappraised their 
own behavior. Certainly there is little in this 
literature that would negate the potential 
significance of the claim of self-perception 
theories that most self-knowledge comes from 
direct observation of one's own actions. 


Maintenance of Changes 


As previously mentioned, there is little 
evidence of the long-term effects of others’
judgments on self-appraisals, and more ade- 
quate investigations of these effects are 
clearly required. Although these investiga- 
tions would ideally involve naturally occur- 
ring interactions, manipulated feedback de- 
signs could also be employed. The use of 
negative feedback in such studies would, of 
course, be unacceptable ethically, but the 
effects of positive feedback could feasibly be 
investigated. 

Long-term investigations are particularly 
important, since at least three processes may 
mitigate the impact of others’ evaluations
over time. First, discrepant feedback tends
to be distorted so that it becomes more con- 
gruent with one's own initial self-perceptions 
(Harvey et al., 1957; Steiner, 1968; Suinn, 
Osborne, & Page, 1962). This tendency to- 
ward distortion has been demonstrated in ex- 
perimental situations, although it is unclear 
how extensively such distortions occur in 
real-life settings. 

A second mitigating factor may be that 
evaluations from another person may some- 
times induce people to change their behavior 
in an opposite direction. If, for instance, an 
individual were evaluated as being self-cen- 
tered but did not like that attribute, he or 
she might expend a special effort to be more 
altruistic and accordingly strengthen this 
perception of altruism. It has been shown 
that when subjects are told that they are 
making shorter or slower responses than those
of other individuals, they lengthen and speed 
up their subsequent responses (Burnstein & 
Zajonc, 1965; Kleinke, 1975). Thus, the 
long-range impact of others’ judgments may
sometimes be to produce either no change
in self-ratings or even changes in the opposite
direction.

A third long-term effect of others’ feed- 
back may be that people change their social 
interactions so that they minimize their ex- 
posure to evaluators or to situations in which 
such feedback is likely to occur. Conceivably 
these mitigating long-term effects could be 
offset by an opposing tendency for people to
change their behavior and also their self-
perceptions to conform to others’ role ex-
pectations. Unfortunately there are as yet no
investigations that have sorted out these po-
tential outcomes.


Some Neglected Aspects of Others’ Influence 


It should be noted that empirical investiga-
tions of Mead and Cooley's looking-glass-self 
hypothesis have explored almost exclusively 
the impact of direct feedback from others. 
There may, however, be several less direct 
but equally important effects of others’ judg-
ments on self-perception. Simply being in the
presence of others may influence the manner 
in which people behave (Goffman, 1959) and 
presumably come to evaluate their own be- 
havior. At a conscious level one might de- 
liberately enhance socially desirable and min- 
imize socially undesirable behaviors when in 
the presence of others, and such changes 
could influence how one saw oneself. Less de- 
liberately controlled aspects of behavior may 
also be affected by others’ presence, as sug-
gested in studies of audience effects on per-
formance (e.g., Zajonc, 1965) and on self- 
evaluations of competence (Shrauger, 1972). 
Also, as Mead’s (1934) notion of the gen- 
eralized other implies, the physical presence 
of others is not imperative, so long as the 
perceiver can manage a mental impression 
of them. 

Other individuals may also influence one’s 
self-judgments by the manner in which they 
interact with people. Whether or not one 
receives help from a co-worker, for example, 
has been shown to affect one’s subsequent 
self-esteem (Fisher & Nadler, 1974, 1976).
In certain role relationships, such as that be- 
tween a boss and subordinate, many inter-
personal behaviors become quite clearly pre-
scribed. The nature of these interactions may
convey to the individuals involved a certain 
degree of competence, or self-worth, without 
any explicit communication of these qualities 
ever occurring. Although such processes have
been described often in role theory (Goff- 
man, 1955, 1959; Scheff, 1966), they have 
much less frequently been explored empiri-
cally, particularly with reference to their
effects on people's self-perceptions.

A third indirect way that social inter-
action may influence self-perceptions is by
affording the opportunity for people to com- 
pare their behavior with that of other people. 
Social comparison obviously requires the 
presence of other people at some point. It 
does not, however, prevent people from being 
active, reflective observers of their own be-
havior. The observance of others’ behavior
provides relative standards against which 
one’s own actions and attributes may be 
judged. Although the significance of such 
comparison processes has long been recog- 
nized, few studies have explored how attri- 
butes of those against whom one compares 
oneself influence self-evaluation (Fontaine, 
1974; Morse & Gergen, 1970; Strong & 
Gray, 1972). Morse and Gergen’s investiga- 
tion found that job applicants’ judgments of 
themselves were substantially influenced by 
the apparent competence and appearance of 
other potential applicants. Perhaps even in 
situations that do not pull so explicitly for 
social comparisons, the observation of others’ 
actions affects one’s self-perceptions. 

Finally, other people may indirectly affect 
one’s self-perceptions when they are observed 
making evaluations of other individuals. Even 
if people do not receive feedback directly, 
observing someone make a judgment of an- 
other individual may provide indirect in- 
formation about how they are viewed by the 
evaluator. How much this actually occurs
depends of course on how explicit the criteria 
for evaluating the other person's behavior 
are and on the degree to which one sees simi- 
larity or dissimilarity between one's own be- 
havior and that of the person being evalu- 
ated. 

In sum, it may be that the aspect of the 
looking-glass-self hypothesis that has been
most frequently examined, the effect of di-
rect feedback from other people, reflects only
one of the ways that interaction with others
has an impact on self-judgments. Further-
more, this means of influence may well be
of no greater importance than the others. 
The relative ease with which direct evalua- 
tion can be explored ought not to preclude 
the examination of other viable aspects of 
social interaction that may also lead to the 
modification of self-evaluations.


Sex differences in the major categories of childhood behavior disorders most
relevant to the issue of continuity between child and adult disorders are re- 
viewed. Explanations for these differences are explored with attention given to 
both the different experiences and the different endowments of the sexes. These 
differences are then compared and contrasted with sex differences in adult
psychopathology.


Although the sex differences in disturbed 
adults, both treated and untreated, have been 
extensively examined (Chesler, 1972; Doh- 
renwend & Dohrenwend, 1969; Garai, 1970; 
Gove & Tudor, 1973), sex differences in chil- 
dren appear to have been accorded only one 
such examination (Gove & Herb, 1974). 
Furthermore, this review is beset by the same 
limitation as part of a similar review of the 
adult literature by Gove and Tudor (1973), 
namely, as Dohrenwend and Dohrenwend 
(1975) suggested, the authors relied almost 
entirely on role theory to explain the sex 
differences, arguing that at some time or place 
one or the other sex is under greater stress 
and hence more prone to psychiatric dis- 
order. It should also be noted that the Gove 
and Herb review was without the benefit of 
Maccoby and Jacklin's (1974) encyclopedic 
coverage of the psychology of sex differences 
and its subsequent critique by Block (1976, 
1978). Hence, it is the purpose of the present 
review to extend Gove and Herb's earlier 
review in the hope of remediating its limi- 
tation. 

In the absence of a generally accepted 
taxonomy (Achenbach & Edelbrock, 1978), 
the categories of child behavior disorders to 
be reviewed are those that have a special 
relevance to the issue of continuity between
child and adult psychopathology (Kohlberg,
LaCrosse, & Ricks, 1972; Rutter, 1972b). 
Childhood is considered as that period of the 
life span from birth to adolescence. The rea- 
son for this demarcation is that since adoles- 
cent sex differences in psychopathology 
closely resemble adult sex differences (Gove 
& Herb, 1974; Graham & Rutter, 1977; Rut- 
ter, 1974), it is the period prior to adoles- 
cence that requires attention. The findings, 
in accordance with Dohrenwend and Dohren- 
wend’s suggestion, are analyzed not only in 
terms of the different experiences of the sexes 
but also in terms of the sexes’ different en- 
dowments. These findings are then compared 
with those of the adult literature. 

Finally, it should be noted that although 
expository convenience dictates a considera- 
tion of factors influencing sex differences un- 
der the separate rubrics of endowment or ex- 
perience, in no way does the author construe 
these influences as polarized. Sex differences 
are construed here, as they are by virtually 
all theorists, to be multidetermined products 
of an interaction between biological and 
learning processes. This interaction has been 
variously considered by behavioral geneticists 
in terms of norm of reaction (McClearn & 
DeFries, 1973), by psychopathologists in 
terms of diathesis-stress (Rosenthal, 1970),
by developmentalists in terms of a greater
readiness to learn (Maccoby & Jacklin, 1974),
and by psychosexual developmentalists in 
terms of an imprimatur (Money & Ehrhardt, 
1972) that is shaped by a social script (Gag-
non & Simon, 1973). The aptness of each
metaphor varies with the idiomatic conven-
tions of one's discipline, and it is left to the
reader to decide which if any of the images 
is the most suitable. 


Methodological Problems 


The methodological problems in making 
a diagnosis of psychological disorder in epi-
demiological studies of adult populations
have been cogently detailed by Dohrenwend 
and Dohrenwend (1974). These same prob- 
lems obviously exist in similar studies of child 
populations, as Conger and Cole (1975) 
noted in a recent example. Although it is 
beyond the scope of this review to engage 
in a detailed analysis of these problems, 
there is a major difficulty that must be con- 
sidered. 

In their review of the adult literature 
Dohrenwend and Dohrenwend scored the in- 
adequacy of studies based on treated rates. 
Hence their review focused for the most part 
on studies of untreated cases of psycho- 
pathology, of which they uncovered 70 done 
since the turn of the century. This same 
problem of focusing on treated rates exists 
in the studies of child psychopathology, but 
unfortunately the paucity of studies of un- 
treated cases renders similar recourse im-
possible. For example, Gove and Herb (1974)
relied exclusively on treated cases, while
noting that scattered results for studies of 
non-clinic-attending populations were gen- 
erally consistent with their formulations. 
Although the scarcity of prevalence studies 
of untreated psychological disorders in chil- 
dren makes any conclusions about sex differ- 
ences somewhat tenuous, it does not render 
such an attempt futile. Rather, following the
lead of Gove and Herb, such studies can be
employed in a supportive fashion to confirm
or disconfirm findings based on treated cases. 
Furthermore, there is the comforting finding 
of past researchers that prevalence rates
based on untreated cases generally confirm 
the sex differences established by the data 
based on treated cases (Anthony, 1970; 
Gove & Herb, 1974). Hence, although the 
data base used to investigate the existence of
sex differences in child psychological dis-
orders is limited in comparison with that of
the adult literature, it seems sufficiently 
robust to warrant such an examination. 


Adjustment Reaction of Childhood 


The most common child diagnosis is that 
of adjustment reaction of childhood (An- 
thony, 1970). The determination of the diag- 
nosis can include virtually any symptom or 
set of symptoms that appear to be precipi- 
tated by acute situational stress. 

In this regard it is interesting to note the 
general agreement among reviewers (Kanner, 
1960; Kohlberg et al., 1972; Rutter, 1972b) 
that there is little continuity between these 
characteristics and adult psychopathology. 
Hence, the most common interpretation given 
to these symptoms is that they represent 
manifestations of developmental stress oc- 
curring in essentially well-adjusted children. 

Studies of both treated and untreated 
cases of adjustment reaction of childhood 
indicate greater prevalence among males 
than among females (Anthony, 1970; Gove & 
Herb, 1974). However, this greater preva- 
lence does not manifest itself until the school- 
age years (Richman, Stevenson, & Graham, 
1975). The typical explanation given for this 
disparity is the greater stress on males due 
to their biological immaturity, their tempera- 
ment, expectations for their behavior, and the 
feminine environment in which they live 
(Bardwick, 1971; Gove & Herb, 1974). Thus 
Bardwick maintained that cultural pressure 
on girls is much less because their general 
predispositions and the cultural interpreta- 
tions of what is acceptable are more nearly 
matched. This lesser pressure enables a girl 
to experience a less stormy childhood. Ex- 
amples of greater cultural pressure cited by 
Bardwick are the greater expectations for 
the male to achieve; the greater censure of 
the male who is passive, withdrawn, and 
overdependent; and the message given the 
male to be active and aggressive and, at the 
same time, the threat of punishment when 
these actions result in nonconformity. Gove 
and Herb added that this greater cultural 
pressure impacts on a male who is less in-
tellectually and physically mature and more
aggressive, thus engendering even greater
stress. 

It should be noted that at the time these 
hypotheses were offered, research substantia- 
tion was minimal. More recently, however, 
empirical support for some of them has been 
accumulating. For example, Rutter (1977b) 
concluded his review of the literature on 
temperament by stating that there is good 
evidence that individual differences in this vari-
able play an important role in the develop-
ment of psychological problems. He scored 
the following type of child as being especially 
at risk: a child who is emotionally tense, 
who is slow to adapt to new situations, whose 
behavior is difficult to change, who has ir- 
regular eating, sleeping, and bowel habits, 
who tends to be irritable and negative in 
mood, and who is unusually tolerant of messi- 
ness and disorder. Though Rutter himself 
did not mention sex differences, Moss (1974) 
summarized 10 years of his work with in-
fants, consisting of several independent stud-
ies, and concluded that males were generally
more irritable than females. This conclusion 
was confirmed in a recent study by Phillips, 
King, and DuBois (1978) on newborns that 
controlled for the methodological limitations 
of prior studies, not the least of which was 
the variable of male circumcision. Thus it 
seems that at least one of the temperamental 
attributes listed by Rutter shows a sex dif- 
ference. 

Another example is in the area of differ- 
ential pressure for achievement. In a recent 
review Hoffman (1977) indicated that al- 
though boys and girls are both encouraged 
to do well in school, some important sex dif- 
ferences in achievement pressures may exist. 
She cited observational and interview studies 
in which parental interactions with sons em- 
phasized greater achievement and competi- 
tion than did interactions with daughters. 
Parents are also more likely to want their 
sons to be hardworking and ambitious and 
are more disappointed if they do not achieve 
their academic and occupational goals. 

Other research pertinent to the traditional
explanations is taken up later. At this junc-
ture, however, there are two other explana-
tions that have generally been overlooked in
the literature but that merit equal considera-
tion in the delineation of the divergent en-
dowments and experiences of the sexes.

The first is simply that the annoyance 
threshold for male deviance is less than that 
for female deviance. For example, Shepard, 
Oppenheim, and Mitchell (1966), in a study 
that compared 50 referrals with matching 
nonreferral controls, concluded that it was 
primarily parental reaction, not severity of 
disturbance, that dictated a clinical referral.
It appeared that clinic-attending mothers 
felt more puzzled and hopeless in coping with 
their children's problems than did control 
mothers. They further reported that the most 
obvious difference lay in the number of non- 
clinic mothers who accepted their children's 
behavior as a temporary difficulty and that 
this increased tolerance was particularly ob- 
vious among the mothers of girls. Similarly, 
Chess and Thomas (1972), in their study of
temperamental differences in children, indi-
cated that parents were less tolerant of a
lack of persistence and of distractibility in
males than in females. Battle and Lacey
(1972), in a study of hyperactivity in 74. 
subjects drawn from the Fels longitudinal 
study, reported that mothers of highly active 
males were critical, disapproving, unaffec- 
tionate, and severe in their punishment. No 
correlation of these maternal behaviors with 
high activity levels in females was found.
Serbin and O'Leary (1975) observed teacher- 
child interaction in 15 preschool classrooms. 
They reported that a disruption by a male 
was more likely to elicit a reprimand than 
a similar disruption by a female; males' rep-
rimands were also more severe. For example,
teachers responded over three times as often 
to males who hit or broke things, and the 
boys usually got a loud public reprimand.

Hence, it may be that females (mothers
and teachers) are more likely to view the
same disturbance as more pathological in 
the male than in the female. And since they 
are the primary sources of referral and evalu- 
ation in epidemiological studies, there is the 
resultant excess of males with adjustment 
problems. This may be partly a function of 
the fact that adults feel less comfortable 
and competent with children of the opposite
sex (Sullivan, 1953). Hence it is not sur-
prising that in a national survey, Hoffman
(1977) indicated that some of the most com- 
mon reasons women gave for preferring a 
female were “that girls are easier to raise
and more obedient . . . are cuter, sweeter, or
not as mean” (p. 648).

This explanation, though, should be tem-
pered by the caveat that one is assuming the 
disturbances are indeed the same. Although
this assumption seems reasonable in the 
studies of Chess and Thomas, Battle and 
Lacey, and Serbin and O'Leary, since the 
authors themselves clearly designated the 
behaviors as similar, it would be rash to 
extrapolate these findings of similarity to the 
multitude of adjustment problems that the 
sexes present. For example, both Garai and 
Scheinfeld (1968) and Feshbach (1970) in- 
dicated that aggression in girls is more likely 
to assume a prosocial mode, for example, rule 
enforcement, whereas male aggression is more 
likely to be destructive. Hence the “same” 
aggression is less tolerated in males because 
it is actually dissimilar in mode. Though 
Maccoby and Jacklin (1974) challenged the 
validity of this distinction, it still serves the 
function of illustrating the slipperiness of 
designating similarity.

A second reason for the greater prevalence
of male adjustment reactions of childhood 
relates to a difference in endowment, namely, 
that males may be constitutionally more vul- 
nerable not only to biological but to psycho- 
logical stress as well. As the reviews by Garai 
and Scheinfeld, Maccoby and Jacklin, and 
Birns (1976) indicate, there is little doubt 
that despite the fact that males are larger 
and stronger than females at almost every 
age, they are more vulnerable to almost any 
kind of physical hazard; and this vulner- 
ability is magnified by the effect of poverty 
(Birns, 1976). Though the ratio of male to 
female conceptions is 130:100, the ratio is 
reduced to 105:100 at birth in the United 
States. Males suffer more abortions, still-
births, miscarriages, prematurity, anoxia, and 
other birth complications. They are also more 
likely than females to suffer serious defects 
as a result of prematurity (Braine, Heimer, 
Wortis, & Freedman, 1966) or anoxia (Gott-
fried, 1973). During infancy 37% more
males die, and throughout life males are 
more afflicted by the major diseases (Garai 
& Scheinfeld, 1968). They are also more 
likely to suffer ill effects from malnutrition 
(Tanner, 1970) and radiation (Rutter, 
1972b). 

The reasons commonly cited for this 
greater male vulnerability are greater male 
immaturity (Rutter, 1972b), greater male 
susceptibility to sex-linked diseases (Garai & 
Scheinfeld, 1968), and possible adverse ma- 
ternal immunological reaction to male fetal 
tissue because of the presence of the male Y 
chromosome (Mussen, Conger, & Kagan, 
1974). Analogously, there appears to be a 
greater male vulnerability to psychological 
stress. 

Rutter (1970) conducted a study of 200 
families and matching controls in which one 
of the parents was a psychiatric patient. The 
sample contained an equal number of male 
and female psychiatric patients who had one 
or more children under the age of 15. In- 
formation about the families was obtained by 
psychiatric interview of the parents, and in- 
formation about the children’s psychiatric 
state was obtained by teacher questionnaire. 
He found that discord and disruption in the
home were consistently and strongly associated
with antisocial disorder in boys but not in
girls. No consistent associations were found 
between family characteristics and neurosis 
in either boys or girls. Nor did the sex of 
the ill parent bear a relation to the likeli- 
hood of the children’s developing a psychi- 
atric disorder. He then in some detail con- 
sidered and dismissed the following possible 
methodological biases as alternate explana- 
tions to this provocative sex difference in 
preadolescent children: The sex difference is 
peculiar to the children’s behavior at school; 
teachers cannot perceive deviance in girls as 
well as in boys; and the sex difference is due 
to diagnostic differences. By reviewing other
relevant studies he concluded that
though the evidence is surprisingly meagre
and to some extent contradictory when con-
sidered in relation to his clear-cut findings,
it indeed appears that males are more vulner-
able to adverse effects of family discord and
disruption. 

Similarly, Wolkind and Rutter (1973) in 
their Isle of Wight survey reported that 
there was a strong tendency for children who 
were in short-term (6 months or less) care 
because of maternal confinement or physical 
illness to have a behavioral disturbance, as 
defined by teacher and parent interviews. 
The tendency was much more marked for 
boys than for girls. With long-term care, 
however, girls seem to be as susceptible to ill
effects as are boys. Rutter (1972a), in his 
review of the literature on maternal depriva- 
tion, pointed out that the findings on sex 
differences are somewhat contradictory and 
that no differences have been found in many 
studies. However, where there has been a 
difference, the male has usually been found 
to be more vulnerable to the adverse effects 
of separation. Bowlby (1973) cited a finding 
parallel to Rutter's in his review of the litera- 
ture on separation anxiety. He reviewed five 
studies in which three found no sex differ- 
ences and two found that males evidence 
greater separation anxiety than females. In 
the most comprehensive review to date, Mac- 
coby and Jacklin (1974) reached a similar 
conclusion. In several studies of children aged 
10 months to 3 years, males exhibited greater 
resistance to separation, as indicated by 
greater distress at separation or greater like- 
lihood of quickly following the departed 
figure. More recently Waters (1978) reported 
that the normative data for the Ainsworth 
"strange situation" indicated that crying and 
its correlates were greater in males than in 
females in the second separation and reunion 
sequence at the ages of 12 months and 18 
months. 

A group of studies that are cognate to 
the studies on separation and deprivation are 
the studies of adoptive children. This kin- 
ship resides in the fact that adoptive children 
experience a break in the continuity of care 
prior to placement and frequently come from 
deprived backgrounds (Hersov, 1977a). In 
an extensive review of this literature, Hersov 
concluded that these and other stresses result 
in adoptive children's being at greater risk 
for the development of a psychiatric disorder.
Furthermore, in those few studies that
employed a control group, males have been
found to be more at risk than females.

Further indication of greater male vulner-
ability to environmental stress is found in 
the cognitive realm. Bayley (1970) conducted 
a review on the association of parental vari-
ables such as warmth, hostility, restrictiveness,
and so on with IQ. She concluded that there 
were clear indications that the mental abili- 
ties of males are more strongly related than 
those of females, both positively and nega- 
tively, to the emotional aspects of their en- 
vironment. She also noted that there were 
indications that early experiences are more 
likely to have long-lasting effects on boys
than on girls. Similarly, in two separate re- 
views and a recent study, maternal employ- 
ment has been found to have a negative effect 
on male academic achievement (Etaugh, 
1974; Gold & Andres, 1978; Hoffman, 1974).
And finally, in a literature review on the 
effects of father absence on children’s cogni- 
tive development, Shinn (1978), although 
concluding that the proportion of studies 
that found negative effects was the same for 
males and females, noted that three studies 
found stronger negative effects for males.

Recently, however, Kamin (1978) re-
examined Bayley’s data and concluded that
no significant sex difference in susceptibility 
of IQ to environmental influence has been 
demonstrated. Furthermore, he contended
that the differences in male and female sam- 
ples with regard to maternal education and 
children’s IQ variance confound whatever 
sex difference may be detected. Thus, for the 
present, it appears that the significance of 
Bayley’s findings has been attenuated.

Perhaps the most interesting aspect of the
above finding is that it dovetails so nicely 
with the undisputed fact of greater male 
vulnerability to biological stressors. Hence 
one is tempted to posit a correspondingly 
greater constitutional vulnerability to psy-
chological stressors. The data, though far
less decisive than they are for biological 
stress, do offer some intriguing support for 
this hypothesis. For example, it may be that 
the same psychological stress is more severe 
for males than for females simply because it
interacts with a less mature organism. It may
also be that just as the female organism is
more stable for physical and mental growth 
(Mussen et al., 1974), so it is more stable 
in maintaining psychological stability.

However, there exist equally plausible ex-
planations for the seemingly greater male 
vulnerability to psychological stress. For ex- 
ample, in the case of familial stress, what is 
apparently the same situation may in prac- 
tice be different depending on the sex of the 
child. Thus it may be that the male child 
for some reason has more contact with the 
disturbed parent, has more responsibility for 
the home when the parent is ill and so on. 
Though this hypothesis was tested and found 
it has not yet 
been adequately explored. It may also be 
that boys and girls respond to different fam- 
ily stresses. For example, Hoffman (1974) 
noted that in most child developmental stud- 
ies girls show ill effects from too much super- 
vision or control, whereas boys typically 
suffer from too little. Further, if one examines 
the studies indicating greater male vulner- 
ability to psychological stress, it seems that 
lack of control rather than too much control 
more correctly typifies the family situation. 
Hence, it may be that in future studies that 
examine a greater range of stress variables, 
females may prove to be more vulnerable 
than males to some of them. 
Parenthetically, it should be noted that 
the reasons for differential effect of under- 
and overcontrol are not clear. Though it 
would be convenient to invoke the hypothesis
of differential socialization of the sexes, Mac- 
coby and Jacklin (1974) concluded that the 
data reveal a remarkable uniformity in the 
socialization of the sexes. This has been chal- 
lenged by Block (1978) and Hoffman 
(1977), largely on the basis of more recent 
research, none of which addressed itself 
directly to the point at hand. One can only 
note that this finding has an interesting 
parallel in the clear sex difference in adult 
psychopathology, in which there exists greater 
male prevalence of disorders of undercontrol, 
that is, personality disorders, and a greater
female prevalence of disorders of overcontrol,
that is, neurotic disorders (Dohrenwend
& Dohrenwend, 1974). 


Learning Difficulty 


There is a male preponderance in all dis- 
orders that involve a specific delay in de- 
velopment (i.e., speech or language delay, 
nocturnal enuresis, and clumsy child syn- 
drome; Rutter, 1977a). This preponderance 
continues into the school years; Kessler 
(1966) noted that academic difficulty is the 
reason for referral of at least three fourths 
of the children between the ages of 7 and 14 
and that it is indisputably a male problem. 
Whether one looks at mental retardation 
(Lehrke, 1972, 1978), reading difficulty 
(Rutter & Yule, 1977), hyperactivity (Cant- 
well, 1977), or simply lower grades (Mac- 
coby & Jacklin, 1974), males predominate. 
Given the seriousness of this problem in its 
portent for future adult maladjustment 
(Kohlberg et al., 1972), an examination of 
its possible causes becomes of major signifi- 
cance. 

The causes cited by Gove and Herb 
(1974) were the slower intellectual and phys- 
ical development of the male and the incon- 
gruity of establishing a male identity in the 
feminine world of the school. Again it seems 
that the endowments and experiences of the 
sexes are more divergent than the causes 
noted by Gove and Herb. 

The feminine culture theory of male learn- 
ing difficulty, although confidently put forth 
by Gove and Herb as well as others, re- 
quires closer scrutiny as to exactly how it 
operates. For example, sex of teacher per se 
does not seem to be the prime factor, since 
a review of eight studies found no notable 
favorable effects of male teachers on male 
students (Good & Brophy, 1977). That male 
teachers have little, if any, differential influ- 
ence on male achievement is explained by the 
fact that male teachers interact with boys 
and girls in the same general way that female 
teachers do (Good & Brophy, 1977). 

More relevant feminizing factors than the 
sex of the teacher seem to be either what is 
studied or the cultural stereotype toward 
learning in general. For example, Good and 
Brophy reported two studies in which the sex
difference in reading was eliminated when
males read material of high interest to them. 
Also, Johnson (1976) surmised that one rea- 
son why the sex difference in reading is not 
evident in cultures such as those of Nigeria, 
Germany, and England is that reading in 
these cultures is deemed a masculine activity. 
Furthermore, to the extent that these and 
other factors are operative, they seem to 
affect the grades teachers award to males 
more than they affect actual achievement. 
Thus Garai and Scheinfeld (1968), in their 
review of sex differences in intellectual per- 
formance, indicated that females in elemen- 
tary and high school are generally awarded 
higher grades than males despite the fact 
that males achieve as well as or, in some 
cases, better than females. 

Though these environmental factors un- 
doubtedly account for part of the observed 
sex difference in some learning difficulties, 
there are some equally compelling biological 
factors stemming from the dimorphism of 
the sexes that play a significant role. The 
first and most obvious biological difference, 
which Gove and Herb (1974) mentioned, is 
that the male child at school age lags approx- 
imately 1 year behind the female in physical 
maturation (Garai & Scheinfeld, 1968). Tan- 
ner (1970) hypothesized that the male matu- 
rational lag is probably due, indirectly, to 
the action of the genes on the Y chromo- 
some. Thus, children with the abnormal chro- 
mosome constitution XXY  (Klinefelter's 
Syndrome) have a skeletal maturity indis- 
tinguishable from that of the normal male, 
and children with the chromosome constitu- 
tion XO (Turner's syndrome) have a skele- 
tal maturity approximating that of the nor- 
mal female XX constitution. 

This greater female maturation appears to 
be paralleled by a greater intellectual matu- 
ration. Bayley (1956) indicated that physi- 
cal growth, as measured by height and skele- 
tal maturity, is positively correlated with IQ 
scores. Note, however, that in individual 
cases physical growth, as measured by per- 
centage of mature height achieved, is not cor- 
related with IQ measured in terms of per- 
centage of 21-year-old intelligence scores
achieved. Parenthetically it should be added
here that Maccoby and Jacklin (1974) cite
Bayley to caution against the acceptance of
a correlation between maturation and intellectual
growth. However, as Bayley clearly
indicated, this lack of correlation pertains 
only to individuals and not to groups differ- 
ing in maturation. Thus, Sherman (1978) 
aptly pointed out that though physical 
growth spurt might not correlate with mental 
growth spurt within individuals, groups could 
differ, with both the physical and the mental 
growth spurts coming earlier in females.
Tanner (1970, 1978), in his review of the
relationship between physical maturation 
and mental ability, stated that the more 
physically mature scored higher on mental 
tests in North American and European pop- 
ulations at all ages tested, going back as far 
as 6½ years. He thus concluded that in
age-linked examinations more physically mature
children have a significantly better chance 
than less mature children. 

At present there seems to be general agreement
with Sherman's cautious statement that
there is no easy resolution to the question of 
whether the sexes differ in cognitive matura- 
tion. Though, as she added, it would be un- 
wise to ignore this possibility. This caveat 
seems to be well-founded and should be most 
heeded when one considers young children. 
For example, Maccoby and Jacklin (1974) 
noted that when sex differences are found in 
general intellectual abilities between the ages 
2-7 years, these differences usually favor 
girls. Furthermore, Block (1976) pointed out 
that their caution in interpreting this finding 
as being due largely to the disadvantaged 
origin of the samples is unjustified. 

Hence, it comes as no great revelation to 
find Anthony (1970) stating that in the 
first grade boys are referred for help 11 times 
as often as girls for social and emotional im- 
maturity, a syndrome characterized by a high 
rate of absenteeism, fatigue, inability to attend
and concentrate, shyness, poor motiva-
tion for work, inability to follow directions, 
slow learning, infantile speech patterns, and 
problems in the visual-motor and visual per- 
ception areas. 

The second obvious biological difference, 
which has been discussed before, is the
greater male vulnerability to a host of pre-,
peri-, and postnatal stresses resulting in brain
dysfunction (Birns, 1976; Garai & Schein- 
feld, 1968; Maccoby & Jacklin, 1974). Fur- 
thermore, the role of brain dysfunction in 
learning disturbance seems well established 
(Heincke, 1972; Rourke, 1975), though the 
precise nature of the dysfunction is vigor- 
ously disputed (Vellutino, 1977). Hence, one 
of the sequelae of the greater male vulnerability
to biological stress is the greater
prevalence of males who are predisposed to 
learning disorders. 

The third more controversial biological 
difference is that of greater male variability. 
Shields (1975) in tracing the evolution of
the concept noted that one of its first serious
discussants was Darwin, who used it to explain
how in many species males developed
greatly modified sexual characteristics, where- 
as females did not. This principle was next 
brought to the attention of psychologists by 
Ellis (cited in Shields, 1973), who extended 
it to mental abilities as well as physical 
traits. After noting that there were more 
men than women in homes for the mentally 
deficient, which indicated a higher incidence 
of retardation among males, and that there 
were more men than women on the rolls of 
the eminent, which indicated a higher preva- 
lence of genius among males, he concluded 
that greater male variability probably held
for all qualities of character and ability. A
current application of this principle to mental
ability comes from Lehrke’s (1972, 1978)
X-linkage theory of intellectual traits.

Lehrke (1972, 1978), along with his predecessor
Ellis, maintained that males are more
frequently represented at the extremes of
the range of general intelligence. He further
proposed that this greater male variability
is best explained by assuming that there are 
major genes for intelligence on the X chro- 
mosome. Concerning the first hypothesis of 
greater male variability at the extremes of 
intelligence, Lehrke's own review of the lit- 
erature, along with an independent review 
of 27 community epidemiological surveys 
by Abramowicz and Richardson (1973), indicated
a greater prevalence of male retardation
in both institutions and communities,
spread out in time over 80 years and
in space from Australia to Scandinavia. His 
response to the criticism of sex bias in diag- 
nosis as an explanation was as follows: First, 
he contended that if there is a bias, it is in 
the opposite direction, since retarded females 
are institutionalized more frequently as a 
means of controlling their fertility. Second, 
he maintained that it is hardly plausible that 
all the studies showed the same type of sex 
bias in determining which individuals were 
counted as retarded; and even if some such 
bias were present, he contended that it is 
hardly conceivable that it would result in 
such extreme levels as to account for the 
difference seen in most studies (e.g., as much 
as a 76% male excess in some studies). 

However, he noted that those studies show- 
ing the greater numbers of retarded males 
may mean merely that there is a greater 
amount of pathology affecting males’ intellec- 
tual functioning, unless it can also be shown 
that there are more males at the high end of 
the distribution of IQs and that the greater 
prevalence of males at the low end follows 
a sex-linked pattern of inheritance. Since the 
contention of a greater male prevalence is 
both seriously questioned (Maccoby & Jack- 
lin, 1974; Sherman, 1978) and tangential to 
the present discussion, it is the evidence for 
the sex-linked pattern that is examined. 

The most persuasive support for this link- 
age comes from the findings that retarded 
women married to normal men are twice as 
likely to have retarded offspring as retarded 
men married to normal women and that there 
is a marked excess of families with male-only 
retardates. Anastasi (1972) criticized these 
supporting data by suggesting that since 
women have more of a role in child rearing, 
retardation should have a more debilitating 
effect on their offspring than should the re- 
tardation of their husbands. The force of this 
criticism became blunted, however, when 
Lehrke noted that this social deprivation hy- 
pothesis applies mainly to mild retardation 
and can hardly explain a fivefold increase 
in severely retarded children born to retarded 
mothers as compared with retarded fathers. 
Furthermore, this criticism, along with the
criticism that the major factor may be maternal
prenatal rather than maternal gene influences
on the X chromosome, fails to account
for the strong tendency of male retardation
to run in families, for which the
most likely explanation is an X linkage. And
finally, in a sample of 5,049 pairs of individ- 
uals, Freire-Maia, Freire-Maia, and Morton 
(1974) examined three alternate explana- 
tions of what they referred to as an estab- 
lished sex difference in retardation. They con- 
cluded that a sex-modified threshold for men- 
tal retardation that includes sex-linked genes 
is more consistent with the evidence than 
hypotheses that stress either prenatal en- 
vironmental or postnatal maternal socializa- 
tion. 


Psychosexual Disorders 


Davison and Neale (1978) indicated that 
in Diagnostic and Statistical Manual III sexual
deviations will probably become psychosexual
disorders and be divided into three
categories in which only the gender identity
or role disorder category (e.g., transvestism
and transsexualism) will be pertinent to a dis- 
cussion of childhood disorders. The two other 
categories of paraphilias (e.g., fetishism and 
rape) and psychosexual dysfunctions (e.g., 
impotence and dyspareunia) are the province 
of adult psychopathology. 

In an adult population males exceed fe- 
males in disorders of gender identity (Green, 
1974; Pauly, 1974); and though studies of
this disorder in childhood are just beginning, 
what research does exist points to a similar 
greater male prevalence among children 
(Green, 1974). In childhood this disorder 
takes the form of a cross-gender identifica- 
tion on such dimensions as self-concept, play- 
things, clothing preference, playmate prefer- 
ence, and so on (Green, 1974). Although re- 
search on the significance of this behavior for 
females is virtually nonexistent, three follow- 
up studies on males have documented the con- 
tinuity of this behavior into adulthood. Out 
of a total sample of 27, 15 became adult 
transsexuals, transvestites, or homosexuals 
(Green, 1977). 

Two theories have been offered to explain
this greater male prevalence. A psychoanalytically
oriented theory of identification pro-
poses that gender differentiation is more 
difficult for the male. For although both sexes 
initially identify with the mother, it is only 
the male who has to switch this identification
(Green, 1974). Consequently, this additional
hurdle entails the concomitant greater
probability that male gender differentiation 
will not be successful. 

A biologically based theory proposes that 
there is a prenatal hormonal hurdle that only 
the male has to surmount (Green, 1974): 
that is, as Money (1974) has indicated, it 
appears that nature’s rule is that to mascu- 
linize something must be added. Specifically, 
at about the 6th week of prenatal develop- 
ment, masculinizing substances must be re- 
leased by the fetal testis for morphologic 
differentiation of males, whereas no sex hor- 
mone whatsoever is necessary for the differ- 
entiation of morphologic females. Thus, if 
both embryonic gonads are removed prior 
to the critical period when the sexual anat- 
omy is formed, then the embryo will pro- 
ceed to differentiate as a morphologic female, 
regardless of genetic sex. Of these two mas- 
culinizing substances, one, known only by 
its function, is the müllerian inhibiting sub- 
stance. Without it the male is born with the 
müllerian ducts differentiated into a uterus 
and fallopian tubes, as in the female. The 
other substance is androgen, the male sex 
hormone, essential for differentiation of the 
internal masculine reproductive tract and for 
the differentiation of the external sexual an- 
lagen into male structures instead of their 
female homologues—that is, the clitoris, cli- 
toral hood plus labia minora, and labia ma- 
jora, instead of, respectively, the penis, penile 
skin covering, and scrotum. This release of 
fetal hormones results not only in the afore- 
mentioned genital dimorphism but also in 
a brain dimorphism, the significance of which 
is discussed later. Parenthetically, it should 
be mentioned that even in the last stages of
differentiation in puberty, the changes of 
males are more striking and extensive than 
those of females (Tanner, 1978). 

Thus, in a dual system in which the female 
path automatically evolves and the alternate 
male path requires specific influences at specific
intervals, more errors probably occur
along the latter path. This greater proba- 
bility of error is theorized to contribute to 
the greater prevalence of male gender iden- 
tity disorders. 


Antisocial Behavior 


In their review Kohlberg et al. (1972) 
stated that antisocial behavior, particularly 
when some estimate of severity is taken into 
account, is the single most powerful pre- 
dictor of adult maladjustment. Hence the 
decisive male preponderance in aggressive 
behavior, which is probably the most un- 
equivocal sex difference in the literature 
(Feshbach, 1970; Hoffman, 1977; Maccoby 
& Jacklin, 1974; Terman & Tyler, 1954), 
takes on a special significance. Furthermore, 
this difference, as Maccoby and Jacklin con- 
cluded, is real and cannot be explained away 
by simply positing different modes of ex-
pressing aggression (Feshbach, 1970). However,
this should not be construed to mean
that women are always less aggressive than
men (Frodi, Macaulay, & Thome, 1977). 
This preponderance in aggressive behavior 
manifests itself in males’ exceeding females 
in both delinquent and nondelinquent dis- 
turbances of conduct (Wolff, 1977) and in 
externalizing symptoms in general (Anthony,
1970). This ratio persists into adulthood,
where males predominate in personality dis- 
orders (Dohrenwend & Dohrenwend, 1969, 
1974). 

The most common explanation given for 
this disparity is differential socialization. 
Mead (1949) provided some of the most dra- 
matic data for this explanation with her 
findings of sex role reversal among the
Tchambuli, in which the females were the
aggressive, dominating personalities. Paren- 
thetically, it should be noted that the re- 
versal was not complete, since it is the men 
who fight when the Tchambuli go to war 
(Brown, 1965). Since then authors, follow- 
ing the lead of Sears, Maccoby, and Levin's 
(1957) finding that parents make the great- 
est distinction between child rearing of boys 
and girls in the area of aggression, have attempted
to investigate the nuances of this
socialization process. Reviews on sex role acquisition
by Mischel (1966), Mussen (1969),
Feshbach (1970), Bardwick (1971), and 
Maccoby and Jacklin (1974), just to name 
a few, typically mention learning mechanisms 
such as operant conditioning, modeling, and 
frustration aggression as mediators of the 
socialization process. On the basis of these 
and numerous other reviews, further slaying 
of the Freudian myth of innate female pas- 
sivity would be tedious overkill; and one can 
safely assume that differential socialization 
is a major factor in explaining the sex dif- 
ferences in antisocial behavior. 

As with many myths, however, there re- 
sides an element of truth; and it is likely 
that differential socialization is not the com- 
plete explanation. Most authors have recog- 
nized this. For example, Feshbach (1970), 
in his exhaustive review of childhood ag- 
gression, wondered “whether from a biosocial 
view, it is also reasonable to ask whether it 
is easier to facilitate aggressive behaviors in 
boys than in girls and what the implications 
of this training might be for other behaviors 
of the child” (p. 189). In view of current 
research, it might be added that it would be 
unreasonable to assume otherwise. In the 
most extensive review to date, Maccoby and 
Jacklin (1974) concluded that the higher 
level of male aggression probably cannot be 
completely explained by a learned fear of
aggression among girls, by any tendency for 
girls to reinforce the aggression of boys, or 
by the tendency of adults to reinforce aggres- 
sion more in males. This conclusion regarding 
the socialization of aggression is part and 
parcel of Maccoby and Jacklin’s overall con- 
clusion that parental socialization data show 
little differences between the sexes. And al- 
though this surprising and controversial in- 
terpretation has been seriously questioned 
(Block, 1978), the conclusion that differential 
socialization is not an exhaustive explanation 
of the sex difference in aggression seems se- 
cure, as the following discussion soon indi- 
cates. Maccoby and Jacklin contended that 
the male’s greater aggression has a biological 
component. This biological disposition to 
greater aggression stems not only from the 
obviously greater mesomorphy of the male,
which is manifested by age 5 (Willerman,
1979), but also from the effects of the male 
hormone on the organism. 

The correlation between mesomorphy and 
variables such as aggression and delinquency 
is clearly established whether one interprets 
this correlation as a manifestation of the 
same underlying biological structure or as the 
influence that different constitutions have on 
the successful reinforcement of instrumental 
responses (Feshbach, 1970; Hall & Lindzey, 
1973; Shah & Roth, 1974). This predisposition
to aggression because of greater physical
strength is further reflected in the fact that 
there is no society on record in which the 
female does the actual fighting in warfare 
(Brown, 1965). 

The second factor predisposing the male to 
greater aggressivity is the male hormone, an- 
drogen, whose importance has been dramati- 
cally demonstrated in animal research. Re- 
views by Money and Ehrhardt (1972), Hart 
(1974), Quadagno, Briscoe, and Quadagno 
(1977), Reinisch and Karow (1977), and 
Vandenberg (1978) indicate that whereas 
the male of the vertebrate species is the more 
aggressive in both laboratory and natural 
situations, the administration of the male 
hormone either pre- or postnatally results 
in the female's being equally aggressive. 

Something analogous to this also occurs in 
humans. Block (1976) disagreed with Mac- 
coby and Jacklin's (1974) verdict of no sex 
differences in activity level. She pointed out 
that not only have they erred in their in- 
terpretation of some of the studies but they 
have also omitted nine relevant studies, all 
of which reported a higher male activity 
level. In addition, a recent epidemiological 
investigation of 705 3-year-olds randomly 
sampled from the community reported that 
although the overall prevalence of behavior 
problems did not differ for the sexes, males 
did exceed females in being described as too 
active (Richman et al., 1975). Furthermore, 
Willerman (1979) indicated that mortality 
figures among children aged 1-4 years show 
that boys are much more likely than girls to 
die from accidents and that many of these 
differences are clearly present around age 1. 
Although acknowledging the possibility of
differential child-rearing practices, he concluded,
and is supported in this conclusion
by Maccoby (1976), that there are only
weak indications that parents may be less 
watchful of boys than girls in the early 
years. Hence, even though Fagot (1978) has 
recently reported that in children aged 20-24 
months, males are more likely to be left alone 
by their parents, it seems that Willerman is
still correct in concluding that boys are more 
likely to have a higher activity level than are 
girls. 

Hence, a higher male activity level (which 
serves as a precursor to a higher aggression 
level [Patterson, Littman, & Bricker, 1967]) 
seems to be an established sex difference.
Furthermore, Quadagno et al.’s recent review
of the research on fetally androgenized human
females supports Maccoby and Jacklin’s
contention that the consistency of findings 
with animal experimental work of a higher 
activity level among females is especially 
compelling in establishing a biological base 
for this difference. 

This parallel with the animal research is 
all the more convincing in light of the ex- 
cessive caution that Maccoby and Jacklin 
attached to attributing too much significance
to this analogue. They, along with Quadagno
et al., suggested that the cortisone therapy
that the females received may account for 
their higher activity level. However, this cau- 
tion is rendered less persuasive when it is 
noted that the females whose syndrome was 
progestin induced did not receive such treat- 
ment and yet manifested the higher activity 
levels. Furthermore, similar investigations of 
boys with either the adrenogenital or progestin-induced
syndromes reveal comparable
increases in activity level (Brecher, 1971;
Ehrhardt & Baker, 1975).

They also warned that since the results 
were based on parental report, the report may 
have been inaccurate or the parental percep- 
tion of the females as more tomboyish may 
itself have induced the reported difference; 
but this also seems overly cautious. Reinisch 
and Karow (1977), in their review of the 
effects of prenatal exposure to synthetic estrogens
and progestins on human development,
concluded that it seems unlikely that
the fact a mother knew her child was treated
had a significant effect on her treatment of
that offspring. They indicated that different 
hormone treatments have been shown to have 
different effects, which is difficult to explain 
in terms of maternal caretaking. 

There is additional evidence that maternal 
knowledge is not a significant factor in the 
common finding that mothers have no recall 
of their having taken the hormones. Even in 
the case in which the effects are a change 
in genital morphology, as in the oft quoted 
Ehrhardt and Money (1967) study, parental
attitude toward the child and her behavior
is difficult to predict. Ehrhardt (1973) noted
that parents did not have a consistent atti- 
tude toward their masculinized daughter, 
when compared with control parents, that 
could explain the tomboy syndrome. 


Neurosis


In contrast with children with antisocial 
symptoms, children with neurotic problems 
are almost as likely to be mentally healthy 
adults as a random sample of the population 
(Hersov, 1977b; Kohlberg et al., 1972; Rut- 
ter, 1972b). Also in marked contrast with 
the female preponderance in adult neurosis 
(Dohrenwend & Dohrenwend, 1969, 1974), 
males either equal or exceed females in child- 
hood neurosis (Gilbert, 1957; Gove & Herb, 
1974; Hersov, 1977b). It is not until adoles- 
cence that the adult ratio begins to manifest 
itself (Gove & Herb, 1974; Hersov, 1977b; 
Rutter, 1974; Terman & Tyler, 1954). Many 
of the reasons for this greater male preva- 
lence seem to be similar to those already 
adduced for the greater male prevalence in 
childhood adjustment reactions and hence 
are not in need of further elaboration. Fur- 
thermore, with the additional explanation of 
adverse reactions secondary to the greater 
male prevalence of learning problems (Cor- 
bett, 1977; Rutter & Yule, 1977), it seems 
that the major interpretations have been ex- 
plored. Hence, what remains is to examine 
the reasons for the difference between child 
and adult sex ratios. 

Gove and Herb (1974) concisely summarized
the explanations that rely on sex
role theory. These explanations are evaluated
in the light of more current research and
complemented by explanations that have a 
more biological focus. 

Gove and Herb sounded their familiar 
theme of differential stress and hypothesized 
that adolescence brings an increase of stress 
for females and a decrease for males. The
feminine sex role is thought to become more 
stressful first because there is a sudden nar- 
rowing of the sex role, in that the female is 
restricted from engaging in activities that are 
then deemed too masculine. Such a precipi- 
tous constriction of sex role is thought to 
induce conflict and anxiety since these ac- 
tivities have most likely been integrated into 
her personality. The prime example given of 
this type of stress is that females who were 
once rewarded for academic success find, in 
adolescence, that they should not surpass 
men. Consequently, they come to fear suc- 
cess. In support of this hypothesis, Hoffman 
(1977) indicated that females, though no less 
achievement oriented than males, appear 
more attuned to the negative consequences 
of academic and occupational success. This 
concern reflects the realities of the situation, 
since for women—particularly during the 
adolescent and college years when heterosex- 
ual relations are salient—the rewards of high 
academic and occupational success are un- 
certain and their costs—often in the form of 
affiliative loss—real. However, if fear of suc- 
cess is assessed on a fantasy basis, there ap- 
pears to be little overall sex difference 
(Tresemer, 1974; Zuckerman & Wheeler, 
1975). 

Second, females are hypothesized to per- 
ceive their roles as depending on the actions 
of others, and because of this they experi- 
ence more uncertainty about the future. This 
hypothesis receives strong support from Dou- 
van and Adelson’s (1966) massive study of 
3,500 adolescents in the 1950s. Both then 
and now, for a majority of females identifica- 
tion with a future adult role involves pri- 
marily that of wife and mother (Conger, 
1977). As a result, Douvan and Adelson 
concluded that females’ adolescent adaptation 
is more difficult because they face a more 
ambiguous task in adapting their present life 
and self-concept to the future. The ambiguity
stems from the fact that marriage, unlike
occupation, is less a matter of simple
individual choice for females than for males 
and lies not in the immediate future but be- 
yond in some relatively unspecified time. 
Marriage lends itself neither to rational plan- 
ning nor to specific preparation, since it in- 
volves the decision and initiative of another 
person. 

Furthermore, it seems that with the tradi- 
tional feminine role in a state of transition, 
the assumption of a future feminine identity 
is becoming even more ambiguous. The al- 
ternate satisfactions of career achievement 
are often seen as incompatible with those of 
marriage and motherhood (Hoffman, 1977). 
As Conger suggested, this may mean that the 
female is exposed to conflicting social re- 
wards and punishments no matter which role 
she assumes. 

Third, because of their characteristic de- 
pendency, females are hypothesized to find 
the transition to an independent status, 
which starts in adolescence, more difficult. 
Though Maccoby and Jacklin (1974) have 
challenged this conventional notion of greater 
female dependency, other reviewers have re- 
affirmed its validity (Block, 1976; Hoffman, 
1977). What appears to be disconfirmed, 
however, is the hypothesis that females find 
the transition to an independent status more 
stressful. Conger's review indicates that fe- 
males appear to experience fewer and less 
stressful conflicts over the development of 
independence than do males, particularly in 
early adolescence. He indicated that males 
are more likely to be actively engaged in es- 
tablishing independence from parental con- 
trol, to be concerned with issues of self- 
esteem and achievement of responsibility for 
their own actions, and to be more preoccu- 
pied with issues of self-control (e.g., control 
of temper and impulsiveness). 

It may be possible to reconcile these seem- 
ingly contradictory positions in two ways. 
First, it may be that Conger's (1977) find- 
ings are applicable to late adolescence. This 
possibility, however, while tempting, remains
speculative. A second possibility with more
research support is that the two positions
look at different manifestations of the same
phenomenon. Rutter, Graham, Chadwick, and
Yule (1976) examined the concept of ado-
lescent turmoil in the context of findings
from a total-population epidemiological study
of Isle of Wight 14-15-year-olds. Though
their findings do not directly relate to the
independence issue, an extrapolation to this
issue seems appropriate. The study con-
cluded, along with American studies as sum-
marized by Conger, that parent-child aliena-
tion is not a common feature unless ado-
lescents have already shown psychiatric
problems. In addition, the study supported
Conger in finding that when alienation ex-
isted, it was more common among males.
However, the study also concluded that inner
turmoil as represented by feelings of misery
and self-depreciation was quite frequent and
more common among females. Thus it may
be this manifestation of stress that Gove and
Herb have alluded to.

The fourth and final hypothesis offered by 
Gove and Herb is that as adolescent females 
prepare for and start moving into adult roles 
they become aware that males are favored in 
our society and start experiencing the stress 
associated with their adult sex role. Though 
this hypothesis is offered more in the form 
of a confident assertion than as a result of 
well-based knowledge, a study by Peskin
(1972) confirms its basic thrust. Using a 
sample of 31 male and 33 female subjects 
from the longitudinal Berkeley guidance 
study (cited in Peskin, 1972), he reported 
that for both sexes a change from a rela- 
tively tension-free or conflict-free preadoles- 
cence to a stressful adolescence is predictive 
of positive adult mental health. He further
found that a greater number of these 
changes, for example, from high to low self- 
confidence, are predictive of positive adult 
mental health for females than for males. 
Peskin concluded that adolescence clearly ap- 
pears more disquieting for the female, pos- 
sibly because of her unique task of acquiring 
what psychoanalytically oriented theorists 
term a self-assured passivity.

In addition to these hypotheses, there are 
others that focus on two of the basic bio- 
logical differences between the sexes, the in- 
ternal and external sexual structures. These
differences tend to be construed by Freud-
ian theorists in terms of biological impera-
tives with pervasive influences on psycho- 
sexual functioning. Other theorists propose 
that the reverse is likely to be true and that 
the sexual area may be precisely that realm 
in which the superordinate position of the 
sociocultural over the biological level is most 
convincing (Gagnon & Simon, 1973). It is 
the implications of this latter position that 
are sketched in detailing additional biologi- 
cally based reasons for a more stressful 
female adolescence. 

With the onset of female adolescence, there 
occurs the sexually dimorphic trait of a cyclic
release of gonadotrophin, which results in,
among other things, menstruation. Further-
more, it appears that for many women there 
are also attendant feelings of tension, irrita- 
bility, anxiety, and depression, composing the 
syndrome of premenstrual tension (Bardwick, 
1971; Conger, 1977). This syndrome has 
been found to be correlated with violence, 
death by accident and suicide, and admission 
to a hospital because of acute psychiatric 
disorder (Graham & Rutter, 1977; Moyer, 
1973). Although it is not clear to what ex- 
tent this syndrome can be attributed to ac- 
tual physiological causes or to the script of 
a culturally induced expectation (Parlee, 
1973; Ruble, 1977), it is commonly accepted 
that adverse emotional consequences are a 
corollary for at least some women (Graham 
& Rutter, 1977). And to the extent that this 
is so, it can be considered contributory to a
more stressful female adolescence. 

A second hypothesis focuses on the re- 
actions of adolescents to the physical changes 
engendered by puberty. Conger (1977) re- 
ported that such changes cause more concern 
for females, with a substantial minority re-
porting feeling shame and anxiety over a
change such as menstruation. This greater 
female concern is due in part to the fact that 
only the female has to face the possibility 
of an unwanted pregnancy. This possibility is 
the number one worry of parents of females 
(Hoffman, 1977) and is undoubtedly com- 
municated to their daughters. Consequently, 
whereas the male has only to fear sexual 
failure, the female has to fear failure and
success simultaneously (Gagnon & Simon,
1973). Hence it is not surprising that the 
most common reaction of females to the first 
experience of intercourse is fear, whereas for 
males it is excitement (Sorenson, 1973). 

In addition, the physical changes with 
their dramatic effects on physical attractive- 
ness are a further cause of the greater concern 
of females. Berscheid and Walster (1975) 
have amply documented the overriding im- 
portance of physical attractiveness in the 
adolescent dating situation, and this probably 
explains why physical attractiveness out- 
weighs all other concerns in early adolescence 
(Eme & Goodale, in press). Furthermore, 
since female popularity is more closely re- 
lated to physical attractiveness (Berscheid 
& Walster, 1975), it comes as no great reve- 
lation that physical attractiveness causes 
them more concern than it does males (Eme 
& Goodale, in press; Musa & Roach, 1973). 

In sum, these hypotheses suggest that, be- 
ginning in adolescence, females tend to ex- 
perience more stress than males. Moreover, 
they seem more disposed to respond to this 
stress according to the sex-stereotypic pat- 
tern of internalization rather than external- 
ization (Achenbach, 1966; Anthony, 1970; 
Garai, 1970; Locksley & Douvan, Note 1), 
with the resultant greater prevalence of neu- 
rotic symptomatology. 

These preceding hypotheses that suggest 
a more stressful female adolescence are com- 
plemented by others that suggest a lessening 
of stress in some areas for males. Gove and 
Herb (1974) indicated that males start per- 
forming better academically. This is at- 
tributed to a maturational “catch-up,” the 
increasing importance of “masculine” sub- 
jects such as mathematics and science and 
the increased relevance of schooling to their 
long-range vocational goals. These hypothe- 
ses, like the previous ones, are evaluated in 
the light of more current research and are 
complemented by explanations that have a 
more biological focus. 

The hypothesis of maturational catch-up 
receives support from the most extensive re- 
view to date on sex-related cognitive differ- 
ences, that by Sherman (1978). She endorsed 
the generalization that though females are
generally verbally precocious compared with
males, males eventually catch up, although 
females retain a slight edge in verbal func- 
tioning. She attributed this catch-up both to 
maturation and to the exposure of males to 
heavy educational intervention in verbal 
training. 

In the area of mathematical skills, Sher- 
man, along with Maccoby and Jacklin (1974), 
concluded that beginning in adolescence 
males tend to outperform females. She dif- 
fered from Maccoby and Jacklin by placing 
the onset later in adolescence, by assessing 
the difference as nonexistent or minimal once 
previous mathematical background was con- 
trolled for, and by rejecting a biologically 
based male superiority in spatial ability as 
contributory to whatever male superiority 
did exist. However, reviews by Waber (1977)
and Goleman (1978) have both affirmed the
strong possibility of a biological basis for
greater male spatial ability.

Despite the preceding disagreements, it 
does seem clear that whether one views male 
adolescent mathematical ability as greatly or 
minimally superior to that of females, it 
undoubtedly contributes to improving male 
academic performance. Also, there is suf- 
ficient support in the data for a biological 
basis for this ability, Sherman notwithstand- 
ing, and at the very least, a prudent person 
would not totally ignore this possibility, as 
did Gove and Herb (1974). 

The increased relevance of schooling to 
the long-range goals of males needs to be 
closely scrutinized in light of the significant 
changes in the vocational orientation of fe- 
males (Conger, 1977). Although this greater 
relevance was undoubtedly true some 20 
years ago when the studies cited by Gove 
and Herb, such as that of Douvan and Adel- 
son (1966), were actually conducted, there is 
a question as to its continuing validity. For 
example, in the most thorough study to date, 
Buxton (1973) surveyed 6,500 students in 
grades 7-12 in four separate school systems 
and reported that both sexes were highly 
concerned with the relation of schooling to 
the future. Moreover, females were found to 
deny disliking school more strongly than 
males and to like teachers more and to ex-
press more feelings of guilt and anxiety about
school. Hence it may be that schooling does 
take on increased relevance for males, but 
this does not mean that it becomes less rele- 
vant for females. What may be true, how-
ever, is that although females are at
least as concerned about school as
males, this concern does not have the same
consequences for them. Thus, Locksley and 
Douvan (Note 1) reported that objective 
academic achievement was associated with 
less stress for adolescent males but not for 
females. Indeed, they found that females with
high grade point averages were more de-
pressed and reported more psychosomatic
symptoms than males with high grade point
averages, and they were not significantly less
aggressive than females with low grade point
averages. Locksley and Douvan suggested
that the reason why actual academic achieve-
ment did not reduce stress in females is that
grades constitute a basis for social compari-
son with peers, precipitating conflicts over
standards of femininity and sexual desira-
bility, and that the anticipated ramifications
of academic achievement for future work and
family plans are conflictful for females. To
the extent that this is so, it reflects Garai's
(1970) conclusion that for males happiness
appears to be highly correlated with some
strong vocational interest that manifests it-
self frequently during puberty or even earlier,
whereas females are more interested in inter- 
personal relations. Consequently, though 
commitment to vocational goals, as signified 
by successful academic achievement, may 
serve to mitigate the stress of adolescence for 
males, for females such success may para- 
doxically precipitate even more stress.

Although the preceding discussion offers
support for a decrement in stress in adoles- 
cence for males relative to females, there is 
other evidence that suggests that this decre- 
ment may be somewhat illusory. Because of 
the cultural proscription against male emo- 
tionality (Hoffman, 1977), males may learn 
to mask the neurotic expression of their prob- 
lems in much the same way that Kagan and 
Moss (1962) have suggested they mask the 
expression of dependency. Toolan (1962) 
and Glaser (1967) have both indicated that
many of the acting-out symptoms of adoles-
cents may camouflage underlying depression. 
This may explain why Achenbach and Edel- 
brock (1978), in one of the most sophisti- 
cated factor analytic studies to date of syn- 
dromes derived from parental report of clinic- 
attending children (aged 6-16 years), indi- 
cated that all samples except the 12- to 16-
year-old male sample yielded a factor labeled
depression. It is not surprising, then, that 
Graham and Rutter (1977) indicated that 
whereas boys with disturbances of conduct 
are likely to be referred to a psychiatrist, 
when they get older these same personality 
disturbances or criminal behaviors are often 
likely to be dealt with in other ways. Thus 
the altered sex ratio in neurotic disorders 
that begins in adolescence and with it the 
implication of a less stressful male adoles- 
cence may be partly an artifact of a different 
manifestation of stress and the consequent 
referral process. 

Furthermore, the increment in stress in 
adolescence for females, as indexed by greater
“inner turmoil” (Rutter et al., 1976), may
also be partly illusory. Garai (1970) sug- 
gested that the presence of anxiety or fear 
may not be a reliable indicator of mental 
disturbance for females. He noted that since 
they seem to live at a generally higher level 
of anxiety than males do, this may be a 
sign of alertness to danger and stress and 
may enable them to cope better with emer- 
gencies and endure anxiety and fear for 
longer periods of time than men, who tend 
to repress anxiety. Hence to a certain extent
greater female inner turmoil may be adaptive 
rather than maladaptive. 

In summary, whereas in childhood it seems 
that cultural pressures are more dissonant 
for male predispositions, in adolescence they 
are more dissonant for females. This dis-
sonance can be expected to be coped with
in sex-stereotyped patterns of internaliza- 
tion and externalization, with the correlative 
change in the sex ratio for neurotic disorders. 


Psychosis 


For a long time most writers tended to
bind all the psychoses of childhood to-
gether, usually under the term schizophrenia
of childhood. However, in recent years the 
situation has changed radically, and there 
is a general (but not quite universal) recog- 
nition that the generic term childhood schizo- 
phrenia covers a number of quite different 
syndromes, which should be differentiated 
(Rutter, 1977c). The distinction that is most 
commonly made is between autism and child- 
hood schizophrenia. This distinction has re- 
ceived impressive support (Rutter, 1978) and 
is embraced in this presentation for reasons 
that are detailed later. However, it should 
be noted that many, such as Bender (1971) 
and Miller (1974), reject this position; and 
in fact, the National Society for Autistic 
Children has adopted the official posture that 
to conceptualize autism as the earliest form 
of schizophrenia, which becomes manifested 
in later childhood or early adulthood, is 
equally valid. Though agreement on this dis- 
tinction is lacking, there is enough speci- 
ficity to at least distinguish nonpsychotic 
from psychotic children (Keith, Gunderson, 
Reifman, Buschbaum, & Mosher, 1976). In 
addition, there is agreement that whatever 
the syndromes of child psychosis may be, 
it is the most debilitating of all forms of 
psychopathology in the severity of its symp- 
toms and the bleakness of its prognosis (Hint- 
gen & Bryson, 1972; Rutter, 1977c). 

With regard to autism, there seems to be 
universal agreement that males outnumber 
females in ratios ranging from 2:1 to 4:1 
(Goldfarb, 1970; Hintgen & Bryson, 1972; 
Kanner, 1973; Rutter, 1978; Schopler, 
1978). This ratio differs from that of adult 
schizophrenia, in which the sex ratio is the 
same (Dohrenwend & Dohrenwend, 1969, 
1974; Rosenthal, 1970) and is one of the 
facts adduced to advance the contention that 
autism and adult schizophrenia are different 
nosological entities. 

Although there is little agreement on the 
specific causes of autism, there is general 
agreement among those who have reviewed 
the research that when the etiology is even- 
tually established, organic factors will be 
found to play a major role (Ornitz & Ritvo, 
1976; Rutter, 1977c; Schopler, 1978). The 
one reviewer (Ward, 1970) who opted for an
etiology of primarily psychological factors
seems to have been seriously deficient in his 
coverage of the literature (L'Abate, 1972; 
Rimland, 1972). As the most current reviews 
indicate, the majority of autistic children 
demonstrate severe deficits in intellectual, 
perceptual, and linguistic development (Fried- 
man, 1974; Gunderson, Autry, & Mosher, 
1974; Hintgen & Bryson, 1972; Rutter, 
1977c; Rutter, 1978). Accordingly, it is cur- 
rently thought that deficits in social behavior 
are less a cause than a reflection of dys- 
function in other areas of development. 

If one adopts this position, then it seems 
that the sex ratio can be explained largely 
in the same terms as were used to explain 
the similar sex ratio in learning difficulties. 
For example, one of the more tenable inter- 
pretations of childhood autism is that it is 
explicable primarily in terms of a language 
deficit, which in turn is rooted in defective 
cognitive functioning (Rutter, 1977c). This 
hypothesis is remarkably similar to Vellu- 
tino's (1977) conceptualization of dyslexia. 
In a comprehensive review, he rejected ex- 
planations based on visual perception, inter- 
sensory integration and temporal-order pro- 
cessing and concluded that the evidence most 
favors a verbal-deficit hypothesis. 

One must add the caveat, however, that 
this analogy does limp quite noticeably, as 
Rutter (1968, 1974) pointed out. First, two 
of the most conspicuous contributors to 
learning difficulties, retardation and central 
nervous system dysfunction, are absent in 
fully one fourth to one third of autistic 
children. Second, the vast majority of chil- 
dren with learning difficulties do not mani- 
fest the symptoms common to autism. Fur- 
thermore, even though the learning difficulties 
concomitant with retardation characterize the 
majority of autistic children, it is clear that 
autism constitutes a syndrome that is dif- 
ferent from that of retardation (Rutter, 
1978; Schopler, 1978). Third, even where 
the analogy is most apt, as in the case of 
children with developmental, receptive-lan- 
guage disorders, autistic children differ in 
several important ways (Rutter, 1977c). 
Hence, until a more exotic kind of learning
difficulty is discovered, the parallel, although
currently considered the most plausible, re-
mains imperfect.

As noted before, the greater male preva-
lence in autism stands in contrast with the
parity of the sex ratio in adult psychosis
and hence requires explanation. The most
compelling explanation involves the medical
model and posits different disease entities.
The affective psychoses, which are virtually
unheard of in childhood (Lefkowitz & Bur-
ton, 1978; Rutter, 1977c), are thought to
have a strong biological base (Becker, 1977;
Depue & Monroe, 1978; Gershon, Bunney,
Leckman, Van Eerdewegh, & DeBauche,
1976) and this base may simply program a
vulnerable organism for an adult rather than
a childhood manifestation and for the re-
sultant sex ratio.

Likewise, Rutter (1977c, 1978) made a
strong case for autism’s being a disease en-
tity different from schizophrenia. He noted
that autism differs from schizophrenia in
terms of time of onset, social class, family
history of schizophrenia, evidence of cere-
bral dysfunction, symptom patterns, level
of intelligence, and course of the disorder.
This distinction is further buttressed by the
impressive evidence for genetic influences in
schizophrenia (Cancro, 1976; DeFries & Plo-
min, 1978; Folstein & Rutter, 1977; Gottes-
man & Shields, 1976) in contrast with the
minimal evidence for autism (Folstein &
Rutter, 1977; Hanson & Gottesman, 1976).
Thus, as with the affective psychoses, the sex
differences in autism and schizophrenia can
be most plausibly explained by positing dif-
ferent biological factors that result in differ-
ent disease entities and hence in different
sex ratios.

Although the preceding discussion has con-
ceptualized autism and adult psychosis as
distinct disease entities and has employed
this distinction to explain the different sex
ratios, mention was also made of the fact
that a psychotic process similar to adult
schizophrenia can manifest itself in child-
hood. This manifestation also involves a pre-
ponderance of males (Rutter, 1977c); but
since it is conceptualized as being similar to
adult schizophrenia, an explanation other
than one based on a biological difference is
required.

The most plausible explanation stems from 
research on sex differences in the child and 
adolescent preschizophrenic personality. Both 
follow-back studies (Watt & Lubensky,
1976; Watt, Stolorow, Lubensky, & Mc-
Clelland, 1970; Woerner, Pollack, Rogalski,
Pollack, & Klein, 1972) and preliminary find-
ings of ongoing high-risk studies of schizo-
phrenia (Mosher, Gunderson, & Buschbaum,
1972) indicate that whereas female pre-
schizophrenics present a pattern of overin-
hibition, sensitivity, conformity, and intro-
version, males, in contrast, present patterns
of unsocialized aggression. Hence it may be
that because the unsocialized aggressive
symptoms of males are more salient, their
schizophrenia is more easily recognized, and
therefore they are more likely to be diag- 
nosed than female schizophrenics. 


Summary and Conclusions 


Recognizing the absence of a generally 
accepted taxonomy for use in diagnosing psy- 
chopathology in children, the present review 
chose to examine sex differences by focusing 
only on those categories relevant to the most 
important issue of continuity between child 
and adult psychiatric disorders. The review 
revealed a greater male prevalence in all of
the following categories: adjustment reac-
tions, antisocial disorders, gender identity
disorders, learning disorders, neurotic dis-
orders, and psychotic disorders. This finding
stands in marked contrast with adult sex
differences, which, beginning in adolescence,
eventuate in a greater female prevalence in
neurotic disorders and affective psychotic
disorders, a greater male prevalence in per-
sonality and gender identity disorders, and
no sex difference in schizophrenic disorders.
The reasons for the childhood sex ratios and
their lack of continuity into adulthood were
examined in light of what is currently known
about the differential endowment and ex-
periences of the sexes.

This contrast between child and adult sex 
differences in psychopathology is perhaps the 
most salient finding to emerge from the review 
and suggests the two following conclusions.
First, the validity of the contrast relies to a
significant extent on the use of referrals 
rather than community surveys to arrive at 
the childhood sex differences. To the extent 
that this obviously limited source is vali- 
dated in other ways, it seems apparent that 
the male child is more at risk for maladjust- 
ment than the female. Hence it would be 
wise for those engaged in primary prevention 
programs to take cognizance of this and 
address their efforts accordingly. 

Second, though differential stress is clearly 
a major factor in explaining the contrast, it 
is equally clear that sex role per se is not 
the exclusive mediator of this stress. Bio- 
logical factors play an important role and 
need to be given far more attention than they 
have been given in the recent past. 


Reference Note 


1. Locksley, A., & Douvan, E. Problem behavior in
adolescents. Unpublished manuscript, University

Jencks's method of analysis of the heredity-environment data is presented, but
with the important modification that cognizance is taken of the principle of 
genetic variation with age, that is, that the genotypic value (G) varies with 
age. As a result of this modification and a more critical examination of the 
original data on IQ, a solution is obtained that agrees with Jencks's figure for 
covariance, but supports the Burt-Jensen emphasis on heredity in that it as- 
signs 75% of the remaining variance to heredity and only 25% to environment. 
The present study can be regarded as integrative in that (a) it eliminates most 
of the discrepancies in the field and (b) it uses Jencks's approach, albeit modi- 
fied, to produce what is essentially Burt's result. 


Of the many attempts at solving the problem 
of IQ heritability, probably the best known is 
that of the late Sir Cyril Burt. His claim that 
75%-80% of IQ variance could be attributed
to hereditary differences and only 20%-25%
to environmental differences became general 
knowledge in the United Kingdom, with the 
publication of the article, “The Multifactorial 
Theory of Inheritance and its Application to 
Intelligence” (Burt & Howard, 1956). In the
United States, however, Burt's work did not 
receive any marked degree of recognition until 
Jensen (1969) wrote in support of Burt's find- 
ings. Kamin (1974) aroused further interest in 
Burt 5 years later, but in this case it was the 
voice of criticism, directed mainly at Burt's 
handling of his data. 

The solution that Jencks et al. (1972) offered 
in Appendix A of their book Inequality might 
be regarded as one of the strongest alternatives 
to that of Burt and Jensen. According to this 
solution, hereditary differences account for 
about 45%-50% of the total IQ variance and 
environmental differences for 30%-35%. The
remaining variance (about 18%-19% of the
total) is attributed to heredity-environment 
covariance, the concept that parents of good
heredity not only pass on their superior genes
to their children but also tend to provide them 
with superior intellectual environments. 
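
The three-way partition described here (heredity, environment, and heredity-environment covariance) can be illustrated with a short arithmetic sketch. It is only an illustration under stated assumptions, not the analysis carried out in this study: the function name and the round covariance share of 20% are hypothetical, and the 75%/25% split of the remaining variance is the one stated in the abstract.

```python
def decompose_iq_variance(covariance_share, heredity_fraction_of_remainder):
    """Split total IQ variance (scaled to 1.0) into three shares.

    covariance_share: proportion of total variance assigned to
        heredity-environment covariance (an illustrative assumption here).
    heredity_fraction_of_remainder: fraction of the *remaining* variance
        assigned to heredity; the rest of the remainder goes to environment.
    """
    if not 0.0 <= covariance_share <= 1.0:
        raise ValueError("covariance_share must lie between 0 and 1")
    if not 0.0 <= heredity_fraction_of_remainder <= 1.0:
        raise ValueError("heredity_fraction_of_remainder must lie between 0 and 1")

    remainder = 1.0 - covariance_share            # variance left after covariance
    heredity = remainder * heredity_fraction_of_remainder
    environment = remainder - heredity
    return {
        "heredity": heredity,
        "environment": environment,
        "covariance": covariance_share,
    }


# Hypothetical round figure for covariance; 0.75 is the split given in the abstract.
shares = decompose_iq_variance(covariance_share=0.20,
                               heredity_fraction_of_remainder=0.75)
for name, value in shares.items():
    print(f"{name}: {value:.0%}")
```

With these illustrative inputs, heredity takes three quarters of the 80% not attributed to covariance, that is, 60% of the total variance, and environment takes the remaining 20%.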

In their 1956 article, Burt and Howard gave 
a figure of about 10% for covariance. But this 
was obtained for IQ group test data, and since 
they were primarily looking for data amenable 
to the application of Fisher's genetic theory 
(Fisher, 1918), they favored adjusted IQ 
assessments in which the attempt was made 
to eliminate systematic (i.e., between families) 
environmental effects. The result was the 
reduction of the covariance term virtually to 
zero. (It was this adjustment of IQs that 
became a main object of Kamin's criticism.) 

Besides the efforts of Burt and Howard and 
Jencks et al., mention can be made of other 
approaches, such as multiple abstract variance 
analysis (MAVA; Cattell, 1960). Of particular 
importance is the work of two schools of bio- 
metrical genetics: the Birmingham school 
(Fulker, 1974; Jinks & Eaves, 1974; Jinks & 
Fulker, 1970) and the Hawaiian school (Rao, 
Morton, & Yee, 1974, 1976). Both these schools 
represent developments of the classical genetic 
theory laid down by Fisher (1918). The 
Birmingham school was influenced also by 
Mather (1949) and the Hawaiian school by 
Wright (1931). An important contribution of 
Wright was his development of path analysis, 
a technique that is used both by the Hawaiian 
school and by Jencks.

The essential difference between the two
schools is their treatment of dominance, the
concept that originated with Mendel, namely,
that genes do not necessarily combine in a
simple additive fashion at any one locus but 
can interact, thus adding to the genetic vari- 
ance. (There is also the possibility of inter- 
action between genes at different loci—what is 
known as epistasis.) Dominance is an important 
feature of the Birmingham school's approach 
to IQ heritability analysis. The Hawaiian 
school, on the other hand, chooses to exclude 
dominance from their genetic model, the argu- 
ment being that it is more important to assess 
environmental effects (Rao et al., 1976). 

Another major difference between the two 
schools lies in their estimates for covariance. 
The Hawaiian school, using Jencks et al.’s 
(1972) American data and a path model, have 
obtained an estimate on the order of 10%. The 
Birmingham school, using Burt's (1966) data, 
have found no evidence for covariance. This is
to be expected, since Burt’s data derive from 
adjusted assessments. But even in the case of 
Jencks et al.’s data, they have obtained the 
same zero result (Jinks & Eaves, 1974). This 
failure to detect covariance in Jencks et al.'s 
data throws considerable doubt on the power 
they claim for their method of weighted least 
squares (see below). 

The Hawaiian school (Rao et al., 1976) have
obtained the remarkable result that IQ
heritability (i.e., the proportion of the total
IQ variance to be attributed to hereditary 
factors) is very much greater for children than 
for adults: 67% compared with 21%. As I have 
shown in a recent publication (Gourlay, 1978), 
this result has practical (or empirical) implica- 
tions that are very difficult to accept. For 
example, it follows that whereas the correlation 
between siblings (as children) is .53, the 
correlation between siblings (as adults) is .74. 
The difficulty is almost certainly due to the 
omission of dominance from the genetic model. 

A feature of both schools is their use of
high-powered statistical techniques—maximum 
likelihood and weighted least squares—in which 
all the data under consideration are combined 
in the one analysis with due weight being given 
to each item. Both these methods provide not
only standard errors for the parameter esti-
mates obtained but also tests of goodness of
fit of the models employed. 

On the face of it, the statistical methods used 
by the schools compare very favorably with the 
piecemeal type of analysis of Jencks, Burt, and 
others. On the other hand, the piecemeal 
approach permits the application of corrections 
for various error factors that affect the raw 
data (e.g., the selective placement factor in the 
case of foster-child studies). By not allowing 
for such factors, the parameter estimates of 
the schools are given a precision that can be 
quite spurious. Thus, applying weighted least 
squares to Jencks et al.’s data, Jinks and Eaves 
(1974) obtained an estimate of .29 ± .02 for
the environmental variance between families. 
The figure is almost certainly inflated by 
selective placement, for which Jinks and Eaves 
made no allowance. It is also significant that 
the application of their method to Jencks et al.'s 
(1972) data gave no evidence of covariance; 
apparently the method is just not sensitive 
enough to reveal the covariance that is un- 
doubtedly there. 

The present study does not use the methods 
of maximum likelihood or weighted least 
squares, but follows the general approach of 
Jencks. As justification, it can be argued (a) 
that by correcting or adjusting data for all 
relevant factors before the final analysis, one 
is more likely to obtain a solution that is 
consistent with all the known facts and (b) 
that this is preferable to the use of high- 
powered methods, which yield standard errors 
and tests of goodness of fit but produce 
parameter estimates that are obviously un- 
acceptable empirically. 

Although following the general approach of 
Jencks, the present study contains a number of 
important modifications, in one or two cases 
amounting to actual corrections: 

1. A more critical examination is made of
the data that derive from the four classical 
foster-child studies: Burks (1928), Freeman, 
Holzinger, & Mitchell (1928), Leahy (1935), 
and Skodak and Skeels (1949). This examina- 
tion has the effect of largely removing the 
discrepancy that Jencks et al. found between 
the results obtained from the analysis of 
foster-child-study data and kinship correlations.

2. A new principle is introduced that has
not so far appeared in heritability analysis, 
namely, that the genotypic value (G) is a 
function of age as well as ability. Its introduc- 
tion removes two major difficulties that Jencks 
et al. encountered: (a) their failure to reconcile 
an item of the Skodak and Skeels data with 
their basic analysis (in the end, they dismissed 
the Skodak and Skeels study as “deviant” 
[Jencks et al., 1972, pp. 281–283]) and (b) the
problem of explaining the large difference that 
exists between the correlations for siblings and 
fraternal twins (about 11%-12% of the total 
IQ variance). The general effect of introducing 
the new principle into heritability analysis is 
to increase the estimate of the hereditary 
component and to decrease the estimate of the 
environmental component. 

3. In applying their basic path model 
(Jencks et al., 1972, pp. 279-281) Jencks et al. 
appeared to assume that the path coefficients 
from the child's G and E (environmental value) 
to his IQ were the same for the foster child as 
for the child living with his parents. This is 
an oversight and is corrected in the present 
analysis. 

4. Jencks et al. have been criticized for not 
making their application of Fisher's (1918) 
genetic theory more explicit. Thus Jinks and 
Eaves (1974) felt that a weakness of their 
approach was their failure to deal systemati- 
cally with dominance. Also, in one of their own 
notes (Jencks et al., 1972, p. 317) there is a 
cryptic reference to a comment by Crow that
charges them with ignoring the effects of
dominance. The present study attempts to set 
out more clearly Jencks et al.’s use of genetic 
theory. In particular, it corrects their formula 
for the genotypic correlation between fraternal 
twins, in which they confuse the genotypic 
correlation between spouses with the correla- 
tion between their additive deviations. 

The outcome of the new analysis is a solution 
that appears to be consistent with all the data 
relevant to IQ heritability analysis. It supports 
Jencks in his estimate for covariance, but is 
closer to Burt and Jensen in its estimate of the 
relative importance of hereditary and environ- 
mental differences. 


Theoretical Basis of Analysis 


The Basic Model 
IQ heritability analysis invariably starts
with a simple additive model:


measured intelligence (IQ) = G + E, (1)


where in genetic terminology, IQ is the
phenotypic, G the genotypic, and E the
environmental value. The corresponding variance
equation is

var(IQ) = var(G) + var(E)
+ 2 cov(GE), (2)

where cov(GE) is the covariance between G
and E.

If one standardizes all three variables (i.e.,
with M = 0 and SD = 1), then, applying
Jencks et al.'s notation, one can write

Q = hG + eE, (3)

where Q represents IQ (standardized). The
corresponding variance equation is now


1 = h^2 + e^2 + 2hes, (4)


where h^2 and e^2 represent the fractions of the
total IQ variance due to hereditary and
environmental differences, respectively, and
2hes represents the fraction due to covariance,
s being the correlation between G and E.

Genetic—environmental interaction. Some 
critics (e.g., Layzer, 1974) would insist that 
Equations 1 and 3 should also contain a term 
that represents genetic-environmental interaction
(in the statistical sense). This would also
mean additional terms in Equations 2 and 4. 
However, the most likely form of the inter- 
action is a product term in G and E, and it 
can be shown that for such a term the variance 
is negligible. It follows that for the purpose of 
heritability analysis, it is fairly safe to omit 
genetic-environmental interaction from the
basic model. 

The age factor. Heritability analysis in the 
case of plants and animals (excluding humans) 
usually involves variables that characterize 
the mature organism, for example, size and 
quality of fruit or crop, milk and beef yields, 
and so on. The factor of age does not normally 
enter into the analysis. Consequently, it is 
generally assumed that (a) for a given population,
h and e are fixed (i.e., heritability is a
constant for the population) and (b) for any
member of the population, G is fixed.

The same assumptions are to be found in the
field of IQ heritability analysis (there is one
exception). Yet, it is obvious that IQ heritability
analysis is different from plant and
animal analysis in that since much of the data 
concerns children with ages ranging from 4 to 
17 years, one is no longer dealing with a 
homogeneous population of mature individuals. 
Consequently, there is a distinct possibility 
that G varies with age and that even when 
heritability analysis is confined to children of 
the same age (x), the values of h and e also 
vary with x. It appears, therefore, that to 
allow for the age factor, Equation 3 should be 
written in the form 


Q_x = h_x G_x + e_x E_x, (5)

where the subscript x indicates that h, e, and G
as well as Q and E can vary with age.

Following Wright (1931), the Hawaiian 
school of biometrical genetics have gone some 
way toward recognizing the importance of an 
age factor. They allow for the possibility that
h and e can vary with age, but still make the
standard assumption that G for the individual
is fixed. As a result of their analysis, they give
two values for h^2 and e^2: the values for a child
population (a sort of average over all the ages
involved) and the values for an adult population
(Rao et al., 1976). The values are h^2 = .670
and e^2 = [.094 (common environment) + .135
(residual)] for the child population and
h^2 = .211 and e^2 = [.506 (common environment)
+ .151 (residual)] for the adult, where
common environment means common family
environment. The estimates for the covariance
(2hes) are .101 and .132, respectively.
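
As a rough arithmetic check, the following Python
sketch (added for illustration and not part of
the Hawaiian analysis) sums the quoted components
and, assuming Equation 4 applies with e^2 taken
as the total environmental fraction, backs out
the implied correlation s between G and E. The
names `child`, `adult`, `total_variance`, and
`implied_s` are illustrative only.

```python
# Variance fractions quoted in the text for the Hawaiian school
# (Rao et al., 1976). Keys are illustrative labels, not source notation.
child = {"h2": 0.670, "e2_common": 0.094, "e2_residual": 0.135, "cov": 0.101}
adult = {"h2": 0.211, "e2_common": 0.506, "e2_residual": 0.151, "cov": 0.132}


def total_variance(parts):
    """Sum h^2 + e^2 + 2hes, which Equation 4 requires to equal 1."""
    return sum(parts.values())


def implied_s(parts):
    """Solve 2hes = cov for s, treating e^2 as common + residual.

    This pooling of the two environmental components is an assumption
    made here for illustration.
    """
    h = parts["h2"] ** 0.5
    e = (parts["e2_common"] + parts["e2_residual"]) ** 0.5
    return parts["cov"] / (2 * h * e)


for label, parts in (("child", child), ("adult", adult)):
    print(label, round(total_variance(parts), 3), round(implied_s(parts), 3))
```

Both sets of fractions sum to unity, so the
quoted figures are at least internally consistent
as a partition of the total IQ variance.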

As indicated earlier, these results have some 
rather strange empirical implications. In 
particular, it follows from the Hawaiian esti- 
mates that the correlation between the IQs of 
the adopted child and the natural mother is .26; 
but when the adopted child becomes an adult, 
the correlation falls to .15. This deduction is 
quite at variance with the Skodak and Skeels 
(1949) data, in which the correlation between 
adopted child and natural mother increases 
from about .28 to about .40 as the (average) 
age for adopted child moves from 4 years 
3 months to 13 years 6 months. 

The most likely explanation of the oddness 
of the Hawaiian results is the omission of 
dominance from their genetic model. But there 
is another important aspect to consider. Ignoring
selective placement, it follows from
Equation 5 that the IQ correlation between
adopted child (AC) and natural mother (NM)
is given approximately by


r(Q_ACx, Q_NM) = h_x r(G_ACx, Q_NM).


Consequently, if G_AC does not vary with x,
r(G_AC, Q_NM) will be a constant, and the
correlation between Q_ACx and Q_NM will vary
directly with h_x. Thus if h_x decreases with x (the
Hawaiian case), the correlation r(Q_ACx, Q_NM)
will also decrease with x. On the other hand,
if the correlation between the two IQs increases
with x, as is suggested by the Skodak and
Skeels data and as one might expect on a priori
grounds, then either h_x must increase with x
or r(G_ACx, Q_NM) must increase with x (or both).
The first alternative is not likely. So it appears
that the second alternative must hold, that is,
the correlation between G_ACx and Q_NM will
increase with x. It follows that G_AC (or simply
G) varies with age x. This principle, that G
varies with age, is an important feature of the
analysis provided in this study. It is discussed
more fully in the next section.
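
The scaling argument can be illustrated with a
short Python sketch. It assumes, as in the fixed-G
case discussed above, that the correlation
between the child's G and the natural mother's IQ
stays constant, so that the IQ correlation moves
in proportion to h. The function name
`adoptee_mother_r` and the back-calculated
`r_g_q` are illustrative only.

```python
import math


def adoptee_mother_r(h_x, r_g_q):
    """Approximate r(Q_ACx, Q_NM) = h_x * r(G_ACx, Q_NM),
    ignoring selective placement."""
    return h_x * r_g_q


# Hawaiian estimates of h^2 quoted in the text
h_child = math.sqrt(0.670)
h_adult = math.sqrt(0.211)

# Genetic correlation implied by the child-level figure of .26,
# held fixed under the assumption that G does not vary with age
r_g_q = 0.26 / h_child

r_adult = adoptee_mother_r(h_adult, r_g_q)
print(round(r_adult, 2))
```

Under these assumptions the adult-level correlation
comes out near the .15 quoted above, which is the
decline that conflicts with the rising pattern in
the Skodak and Skeels data.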

There remains the question of whether h and
e also vary with age. Most likely there is some
variation of h and e with age, and it is very
probable that it takes the form of a decreasing
h and an increasing e. However, for the purposes
of the present study, it is assumed that
no serious error will be incurred if, for ages
5 years and onwards, one takes h and e as
constant and therefore also the covariance
(2hes). The basic model for the study is
therefore


Q_x = hG_x + eE_x, (6)


where Q_x, G_x, and E_x vary with age x.


The Genotypic Value (G) as a Function of Age 
and Ability 


Probably the main reason why past investi- 
gators have universally accepted a fixed G is 
that the genotype is fixed at conception. How- 
ever, it must also be remembered that genetic 
development is a program in time that unfolds 
as the organism interacts with its environment. 
Different genetic factors operate at different 
points in time; moreover, the time schedule 
varies from individual to individual. It follows 
therefore that G must vary with age. 

In the case of an intelligence test, such as 
the Stanford-Binet, this should be easily 
accepted, since it is well-known that the factor
content at any one age is different from that
at another. But even in the case of a uni- 
dimensional variate such as height, the Gs for 
any two ages will not be the same (except, of 
course, at maturity). Obviously, in the case of 
different abilities, for example, the ability to 
read and the ability to do an intelligence test, 
it follows a fortiori that the Gs for one ability 
will not be the same as those for the other, even 
at the same point in time. 

How does G for an IQ test such as the 
Stanford-Binet vary with age? Actually the 
pattern of variation is very similar to that for 
IQ itself. If the correlation between IQ at age 
16 or 17 and IQ at an earlier age x is plotted 
against x, the curve obtained is a rising curve 
that levels out to the value of one at x = 16
or 17 (cf. Jensen, 1969, p. 18). In the case of G, 
the corresponding curve is similar in form and 
lies below the IQ curve, except at age 16 or 17 
where the two curves come together. For the 
moment I simply demonstrate this result; 
later I derive it from the Skodak and Skeels 
data. (See Figure 1 on p. 612.) The demon- 
stration depends on taking the results for a 
typical heritability analysis, say, those of 
Jencks et al. (1972), and assuming that h, e, 
and s are constant with age (see above). 

According to Jencks, an acceptable analysis
of the total IQ variance would be the following:
heredity, 46.5%; environment, 35% (20%
between families and 15% within a family);
and covariance, 18.5%. In equation form, Jencks
et al.'s solution can be expressed


Q = .682G + .447EF + .387ER, (7)


where EF and ER are the between-families and 
within-family environmental measures and all 
four variates Q, G, EF, and ER have been 
standardized (M = 0; SD = 1). 
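
As a quick consistency check (a sketch added for
illustration), the squared coefficients of
Equation 7 should reproduce the variance
fractions attributed to Jencks, and together with
the covariance they should sum to approximately
one. The dictionary names below are illustrative.

```python
# Path coefficients from Equation 7 and the covariance fraction
# quoted in the text (18.5%).
coefficients = {"G": 0.682, "EF": 0.447, "ER": 0.387}
covariance = 0.185

# Squaring each standardized coefficient gives its variance fraction.
fractions = {name: c ** 2 for name, c in coefficients.items()}
total = sum(fractions.values()) + covariance

for name, value in fractions.items():
    print(name, round(value, 3))
print("total", round(total, 3))
```

The small departure of the total from unity
reflects rounding of the coefficients to three
decimal places.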

Consider now a particular value of x, say,
x = 5. According to Jensen (1969, p. 16), the
correlation between IQs at ages 5 and 16 (or
17) is about .70, when correction is made for
attenuation. It follows from Equation 7 that


.70 = .465 r(G5, G16) + .20
+ .15 r(ER5, ER16) + .185,

where G5 represents the value of G at age 5,
and so on. The correlation r(ER5, ER16) cannot
be zero. One can easily imagine that at
least a quarter of the within-family environmental
variance is accounted for by age 5.
Let me therefore take .50 as the value for this
correlation, recognizing that it could well be
an underestimate. Substituting in the preceding
equation, one finds that
r(G5, G16) = .24/.465 = .516.
Obviously, the value of G at age 5 is considerably
different from the value of G at age 16.
Furthermore, it follows that the curve for
r(Gx, G16) lies below the curve r(Qx, Q16).
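
The arithmetic of this demonstration can be
sketched in Python. The function below simply
rearranges the equation that follows from
Equation 7; like that equation, it treats the
between-families and covariance terms as carrying
over unchanged from age 5 to age 16. The name
`r_g_early_late` and its parameter names are
illustrative.

```python
def r_g_early_late(r_iq, h2, e2_between, e2_within, cov, r_er):
    """Solve r_iq = h2*r_G + e2_between + e2_within*r_er + cov for r_G."""
    return (r_iq - e2_between - cov - e2_within * r_er) / h2


# Values used in the text: r(Q5, Q16) = .70, Jencks's variance fractions,
# and an assumed r(ER5, ER16) of .50.
r_g = r_g_early_late(
    r_iq=0.70, h2=0.465, e2_between=0.20, e2_within=0.15, cov=0.185, r_er=0.50
)
print(round(r_g, 3))

# Sensitivity to the assumed within-family environmental correlation
for r_er in (0.25, 0.50, 0.75, 1.00):
    print(r_er, round(r_g_early_late(0.70, 0.465, 0.20, 0.15, 0.185, r_er), 3))
```

Raising the assumed value of r(ER5, ER16) lowers
the implied r(G5, G16) further, so the conclusion
that G at age 5 differs from G at age 16 does not
depend on the particular choice of .50.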

The above has been worked out on the basis 
of Jencks et al.’s solution that did not apply 
the principle that G varies with age. Naturally, 
it would be just as easy to carry out the 
demonstration using a solution that incor- 
porated the principle. 

Implications of the variability of G with age. 
The recognition of the principle of the variability
of G with age has two immediate consequences.
First, any definition of G must specify
not only the population and the IQ test for
which G is being defined but also the age group.
Following Falconer (1967, p. 113), one might
define the genotypic value for a given IQ test
at age x as follows:

If one would replicate the genotype in a
number of individuals and measure them for
IQ at age x, after they lived in environmental
conditions normal for the population, their
mean environmental deviation would be zero,
and their mean phenotypic value (i.e., IQ)
would consequently be equal to the IQ
genotypic value of that particular genotype
at age x.

Second, the basic model must show that G
as well as E and Q varies with age. This, of 
course, has already been done (see Equation 6). 
There are several other important conse- 
quences: 

1. The variation in the correlation between 
IQ at 16 years of age and IQ at earlier ages has 
often been quoted as evidence of the incon- 
stancy of IQ, that is, that IQ varies consider- 
ably as a result of environmental change. The 
introduction of the principle that the genotypic 
value varies with age provides a different 
explanation for the inconstancy, namely, that
most of the variation is of a genetic character. 
Even on the basis of the simple demonstration 
presented earlier, it appears that environmental
change accounts for no more than a
quarter of the change in the IQ correlation
with age.

There is an obvious parallel with Piaget's
theory of stages in cognitive development.

Piaget’s theory, although interactionist, is 
essentially genetic in character—certainly im- 
plicitly so—and it is possible to argue that the 
principle of the variability of G with age is a 
translation of Piaget into statistical terms. 

2. The correlation between the genotypic
values of siblings is not independent of the
ages of the siblings, as is generally assumed,
but varies with the age interval between them.
The greater the age interval, the smaller the 
correlation. In fact, if one considers children 
of age 16 and their siblings at earlier ages x, 
the correlation between the Gs of the sibling 
pairs (one at age 16 and one at age x) varies 
with x in the same manner as r(Gx, G16),
except that the maximum value (at x = 16)
is on the order of .50. 

It follows that the correlation between the 
IQs of ordinary siblings is depressed vis-à-vis
the correlation for fraternal twins—not just 
because of greater environmental differences 
but also because of the greater differences in 
their Gs. This, of course, clears up the problem, 
which obviously worried Jencks, of how to 
explain the large difference between the corre- 
lations for siblings (different ages) and those 
for fraternal twins (same age): .59 versus .70 
(Jencks et al., 1972, pp. 289; 290).

3. In the same way, the correlations between 
the Gs for child and parent vary with the age 
of the child. Skodak and Skeels's (1949) study
is unique in that it provides confirmatory 
empirical evidence. The result also explains 
Burt’s (1966) finding that the correlation be- 
tween parent (adult) and child is .50, whereas 
the correlation for parent (as child) and child
is .56 (cf. Jensen, 1969, p. 49; Kamin, 1974,
pp. 94; 95).

4. It should now be apparent that the
principle of G variability with age has important
implications for heritability analysis.
Previous analyses have been wrong in that 
they have ignored differences in age between 
parent and child and between siblings. Ob- 
viously, analyses must be carried out on data 
for which the Gs are at the same age level, and 
where the data are not in this form to begin 
with, they must be adjusted so that the Gs are 
comparable. Oddly enough, Burt, who has 
been subjected to so much criticism of late,
largely avoided the problem of G variability
by concentrating on twin data in which age 
differences did not operate. 


The Genetic Model 


It follows from the definition of G, as also 
from the basic model (Equation 6), that G 
includes all the underlying genetic factors. In 
the literature, there is some confusion on this 
point in that the term genotypic value is sometimes
applied only to additive genetic effects
(genetic interaction effects such as dominance 
are treated as phenotypic). However, for the 
purposes of this study, it must be made clear 
that G embraces not only additive effects but 
any other genetic effects one chooses to 
consider. 

An examination of IQ heritability studies 
shows that only three main genetic effects are 
considered: (a) additive effects if mating were 
random, (b) additive effects due to assortative 
mating (the concept that mating is not random 
and that the additive effects of spouses are 
correlated), and (c) dominance (the concept 
that the genes at any one locus can interact and 
produce a nonadditive effect). Other genetic 
factors are possible, for example, epistasis (the 
concept of interaction between genes at 
different loci), but it is generally assumed that 
these effects are not important for IQ herita- 
bility analysis and can be ignored. 

In restricting genetic factors to these three 
main effects, IQ heritability analysis follows 
the classical genetic theory of Fisher as laid 
down in his 1918 article. The present study, 
like other studies, does not attempt to advance 
on this theoretical basis. 

Expressing the genetic model in equation
form, one can write G = A + D, where A
= additive effects (including assortative mating)
and D = dominance effects. The corresponding
variance equation is


V_G = V_A + V_D. (8)

There is no covariance term, A and D being
uncorrelated.

The additive variance V_A is often termed
the genic variance and can be split into two
subcomponents: (a) the variance due to additive
effects if mating were random and (b) the
additional additive variance due to assortative
mating. Burt (1975, p. 127) called these two
components the additive variance (V_A) and
the variance due to assortative mating (V_AM).
But to avoid confusion, I call them both additive
variance and denote them by V_A(R) and
V_A(AM); that is, Equation 8 can be written


V_G = V_A(R) + V_A(AM) + V_D.


I now standardize G, A, and D (with M = 0
and SD = 1) and write


G = aA + dD, (9)

where

1 = a^2 + d^2, (10)

a^2 = V_A/(V_A + V_D),

and

d^2 = V_D/(V_A + V_D);


that is, a^2 and d^2 are the proportions of the
genotypic variance to be attributed to additive
effects and dominance, respectively. Also, it
should be noted that the term a^2 is equivalent
to c_2 in Fisher's (1918) analysis.

I now consider the relationship between the 
genetic measures for parents and offspring. In 
the discussion that follows, it is understood 
that the Gs, As, and Ds have been defined for 
the same age (or level) of development; that 
is, either children’s values have to be defined 
for the adult level or parents’ values have to 
be defined for the same age as the children. I 
assume the former. The Gs, As, and Ds for 
father, mother, and offspring are distinguished 
by subscripts F, M, and C, respectively. 

Since the child gets half of his genes from
each parent, it follows that


A_C = .50(A_F + A_M)
+ random term (within family). (11)


A similar equation can be written that expresses
G_C in terms of G_F and G_M, namely,


G_C = g(G_F + G_M)
+ random term (within family), (12)


where the same symbol (g) is used as in 
Jencks et al. (1972). 

The coefficient g will not be .50, since G in- 
cludes dominance as well as additive effects. 
It is important that one obtain an expression 
for its magnitude.

As a first step, it might be noted that the
correlation between G_C and the G of either
parent (G_P) is


r(G_C, G_P) = g[1 + r(G_F, G_M)],

which can be written

r(G_C, G_P) = g(1 + ρ), (13)


where r(G_F, G_M) = ρ (Jencks et al.'s notation)
= μ (Fisher's notation). (It might be noted
that Fisher used the same symbol μ to denote
both the correlation between the genotypic
values and the correlation between the phenotypic
values.)

The correlation r(G_F, G_M) implies assortative
mating. It is assumed as in Jencks et al. that
G_F and G_M are correlated through the IQs of
the parents. It follows that


ρ = r(G_F, G_M)
= r(G_F, Q_F) r(Q_F, Q_M) r(Q_M, G_M).


But by Equation 3,

r(G_F, Q_F) = h + es = r(Q_M, G_M).


Hence

ρ = (h + es)^2 r(Q_F, Q_M), (14)

which is Jencks et al.'s (1972, p. 273) result.
Similarly, following Fisher's genetic model,
as did Jencks et al., A_F and A_M are correlated
through G_F and G_M, where


r(A_F, A_M) = a^2 r(G_F, G_M) (15)


(see Equation 9).

Denoting r(A_F, A_M) by A as in Fisher
(Jencks et al. made the mistake of omitting A
from their analysis), one has A = a^2 ρ = c_2 μ
(Fisher's notation). (This equation is not to
be confused with the other well-known equation
of Fisher, A = c_1 c_2 μ. As noted earlier,
Fisher used μ in two senses. When μ denotes
the correlation between the genotypic values
of parents, A = c_2 μ. But when μ denotes the
correlation between their phenotypic values,
A = c_1 c_2 μ. It would have been better if Fisher
had used two symbols, say, μ_G and μ_P.)

Table 1

Summary of Relationships Between Genetic Components in Fisher's (1918) and Jencks et al.'s
(1972) Terminology

Component                Fisher                   Jencks et al.
r(G_F, G_M)              μ                        ρ = (h + es)^2 r(Q_F, Q_M)
a^2 = V_A/(V_A + V_D)    c_2                      2g
r(A_F, A_M)              A = c_2 μ                2gρ
r(G_C, G_P)              .5c_2(1 + μ)             g(1 + ρ)
r(G_1, G_2)              .25(1 + c_2 + 2c_2 A)    .25 + .5g + 2g^2 ρ
V_A(R)/(V_A + V_D)       c_2(1 − A)               2g(1 − 2gρ)
V_A(AM)/(V_A + V_D)      c_2 A                    4g^2 ρ
d^2 = V_D/(V_A + V_D)    1 − c_2                  1 − 2g


Corresponding to Equation 15, one also
has r(D_F, D_M) = d^2 ρ and r(A_F, D_M) = adρ
= r(A_M, D_F). By means of these equations,
one can derive another expression for r(G_C, G_P)
as follows:


r(G_C, G_P) = E[(aA_C + dD_C)(aA_P + dD_P)]
= E{(aA_P + dD_P)[.50a(A_F + A_M)
+ dD_C + random term]}

(by Equation 11; where E = expected value).
Assuming that the correlation between D_C and
D_P is negligible, this becomes


r(G_C, G_P) = .50a^2(1 + A) + .50ad(adρ),


which reduces to

r(G_C, G_P) = .50a^2(1 + ρ). (16)


Comparing Equations 13 and 16, it is seen
that g = .50a^2 = .50c_2 (Fisher's notation);
that is, c_2 (Fisher) = 2g (Jencks) = a^2.

Since it is required later, I also derive the
correlation between the genotypic values for
fraternal twins (or siblings at the same age).
Thus, one can write


r(G_1, G_2) = E[(aA_1 + dD_1)(aA_2 + dD_2)]
= E{[.50a(A_F + A_M) + dD_1
+ random term]
× [.50a(A_F + A_M) + dD_2
+ random term]}.

It must now be noted that whereas the Ds
for offspring are uncorrelated with the As and
Ds for parents, the Ds for siblings are correlated.
I assume the standard figure for the
correlation, namely, .25 (cf. Falconer, 1967,
p. 157). It follows that


r(G_1, G_2) = .25a^2(2 + 2A) + .25d^2,


which, using Equation 10, reduces to


r(G_1, G_2) = .25 + .25a^2 + .50a^2 A
= .25(1 + c_2 + 2c_2 A) (Fisher)

or

r(G_1, G_2) = .25 + .50g + 2g^2 ρ (Jencks). (17)


This equation corrects Jencks et al.'s mistake
(1972, p. 303): They seem to have assumed
that ρ, the correlation between parental Gs, is
the same as the correlation between their
additive deviations.

Table 1 summarizes a number of the above 
results and includes expressions for the com- 
ponents of the genetic variance. In each case, 
both Fisher's and Jencks's terminology are 
presented. 
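
The algebra behind Equations 16 and 17 can be
checked numerically. The Python sketch below is
an illustration rather than part of the analysis:
it evaluates the unreduced and reduced forms for
arbitrary trial values of a^2 and ρ and confirms
that they agree. The function name
`genetic_correlations` and the trial values
(a^2 = .8, ρ = .4) are hypothetical and are not
estimates from the study.

```python
import math


def genetic_correlations(a2, rho):
    """Evaluate the genetic correlations for given a^2 and rho.

    a2  : share of genotypic variance that is additive (Fisher's c_2)
    rho : correlation between the parents' genotypic values
    """
    d2 = 1.0 - a2      # Equation 10: a^2 + d^2 = 1
    g = 0.5 * a2       # g = .50 a^2 (comparison of Equations 13 and 16)
    A = a2 * rho       # r(A_F, A_M) = a^2 * rho (Equation 15)
    return {
        # Child-parent correlation before and after reduction
        "child_parent_long": 0.5 * a2 * (1 + A) + 0.5 * a2 * d2 * rho,
        "child_parent": g * (1 + rho),
        # Fraternal-twin correlation: unreduced, Fisher, and Jencks forms
        "twins_long": 0.25 * a2 * (2 + 2 * A) + 0.25 * d2,
        "twins_fisher": 0.25 * (1 + a2 + 2 * a2 * A),
        "twins_jencks": 0.25 + 0.5 * g + 2 * g ** 2 * rho,
    }


out = genetic_correlations(a2=0.8, rho=0.4)  # hypothetical trial values
for name, value in out.items():
    print(name, round(value, 4))
```

With no dominance and random mating (a^2 = 1,
ρ = 0) the same function returns .50 for both
correlations, the familiar value for first-degree
relatives.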


The Analysis 


The general approach is that of Jencks et al. 
(1972). In other words, a solution is sought 
mainly through data on the correlations be- 
tween parents and children, both natural and 
foster. The only other item of data of equal 
importance is the correlation for fraternal 
twins (of the same sex). The twins correlation 
has the advantage over the correlation for 
ordinary siblings in that because twins are of 
the same age, their Gs are comparable. This 
permits the immediate application of Fisher's 
(1918) genetic theory (see, in particular, 
Equation 17). 

As in Jencks, all correlations are corrected 
for attenuation. Error terms do not therefore 
appear in the variance analysis. 




Notation 


The symbols NC and AC are used to denote 
natural (own) and adopted child; NP, NF, 
and NM and AP, AF, and AM are used to 
denote natural and adopting parent, father, 
and mother, respectively. The symbol Q is used 
for IQ (standardized) ; but in the case of corre- 
lations involving IQs, it is normally omitted 
from the notation, except where there could 
be confusion. Thus, r(F, M) denotes the 
correlation between the IQs of parents (cor- 
rected for attenuation) and r(ACx, NM) de- 
notes the correlation between the IQs of 
adopted child at age x and natural mother. 
On the other hand, a correlation such as
r(G_NCx, Q_NM) retains the Q in the notation.

Three expressions that occur frequently in
the analysis are represented by special symbols.
These are α = (h^2 + e^2)^(1/2), Y_x = r(NCx, NP),
and Z = r(AC, AM; NSP), where NSP indicates
the correlation between AC and AM
with no selective placement.


Basic Data 


A large part of the data is taken from 
Jencks et al. (1972), who provide an excellent 
summary of American data related to IQ 
heritability analysis. The first two important 
items are (a) the correlation between the IQs 
of father and mother (the empirical measure 
of assortative mating), r(F, M) = .57, and 
(b) the correlation between the IQs of parent 
and own child living with parent. 

Jencks et al.’s study gives an estimate of .55 
for the latter correlation, but the statistic has 
to be qualified. One must remember that in 
accordance with the principle of the variability 
of G with age, r(NC, NP) will vary with the 
age level of the children from whose IQs the 
correlation was derived: the younger the
children, the lower the correlation. It is
assumed that the figure of .55 applies to 
children whose ages are about average for 

foster-child studies. I take this average as 
9.3 years, the figure for the Leahy (1935) 
study. I therefore write 


Y_9.3 = r(NC9.3, NP) = .55. (18)


Also important is the correlation between
the IQs of fraternal twins. It is assumed that
the correlation is independent of the age level
of the twins. Also, like Jencks, I take the
figure provided by the Newman, Freeman,
and Holzinger (1937) study; that is, I write


r(1, 2) = .692 = r_IQDZT
(Jencks et al.'s notation). (19)


This statistic is considerably larger than the .59 
(corrected for attenuation) reported by Burt 
(1966). However, as Kamin (1974) pointed 
out, there are several large-scale studies that 
support the Newman et al. figure. Also, as 
has already been argued, the principle of the 
variability of G with age implies a figure rather 
higher than the .57 generally accepted for 
siblings. (One must remember also that the 
factor of greater environmental differences for 
siblings vis-à-vis twins further increases the 
difference between the two correlations.) 
Finally, Jencks et al. provide data from the
four foster-child studies: Burks (1928), Freeman
et al. (1928), Leahy (1935), and Skodak and
Skeels (1949).

Together the studies provide an estimate of
Z = r(AC, AM; NSP). The Skodak and Skeels
study is of particular importance in that it
furnishes data that are used to derive the
variation of r(G_NCx, Q_NM) with age x and also
of r(Gx, G16) with x.


Basic Equations 

The equations required for the analysis are 
set out below. Their derivation, together with 
underlying assumptions, is briefly indicated 
in the Appendix. 

1. For children living with their natural
parents,

Q_NCx = hG_NCx + eE_NCx. (20)

This, of course, is the same as Equation 6.

2. For foster children with no selective
placement,


Q_ACx = (1/α)(hG_ACx + eE_ACx), (21)


where α = (h^2 + e^2)^(1/2). Obviously, it is assumed
that the genetic and environmental variances
are the same for foster children as for own
children or, at least, that the data can be adjusted
in accordance with this assumption. The
covariance for foster children is of course zero
if there is no selective placement.

In the case of a moderate degree of selective
placement (as for the Burks, Leahy, and
Skodak and Skeels studies; Freeman is the
exception), it can be shown that little error is
involved in assuming that Equation 21 still
holds (see Appendix).

3. For foster child and foster parent (no
selective placement),

Z = r(AC, AP; NSP)
= (e/α) r(E_AC, Q_AP). (22)

4. For foster child and foster mother (SP
refers to selective placement),

r(ACx, AM; SP)
= Z + (h/α) r(G_NCx, Q_NM) r(NM, AM); (23)


that is, the correction for obtaining Z from
r(ACx, AM; SP)


= −(h/α) r(G_NCx, Q_NM) r(NM, AM). (24)
5. For parent and own child,

Y_x = r(NCx, NP)
= h r(G_NCx, Q_NP) + αZ. (25)


Assuming that for x = 16, the Gs for children
and parents are comparable, one can write


h r(G_NC16, Q_NP)
= gh(h + es)[1 + r(F, M)], (26)


where g is defined as earlier (see Equation 12).
6. For covariance,

2hes = {… / [1 + r(F, M)]}(Y_16 − αZ). (27)

7. For foster child and natural mother,

r(ACx, NM; NSP)
= r(ACx, NM; SP) − Z r(NM, AM), (28)


where the second term on the right-hand side
is the correction for selective placement. Also,


r(ACx, NM; NSP)
= (h/α) r(G_NCx, Q_NM). (29)

8. For fraternal twins (of the same sex),

r(1, 2) = h^2 r(G_1, G_2)
+ 2hes + e^2 r(E_1, E_2), (30)

where it is assumed that


s = r(G_1, E_1) = r(G_2, E_2)
= r(G_1, E_2) = r(G_2, E_1).

For identical twins living together, it is
generally accepted that the environmental 
variance within a pair of twins is .03 (3%). 
According to Jencks et al. (1972, p. 308), the 
environmental variance within a fraternal twin 
pair is not likely to be much more than .03. 
But some critics would regard this as a rather 
tenuous assumption. Thus, Kamin (1974, 
p. 99), referring to data from same-sex and 
opposite-sex fraternal twins, argued that the 
environmental variance for fraternals must be 
considerably higher than that for identicals. 
However, there is evidence that the IQ vari- 
ance for boys is greater than that for girls (cf. 
Publications of the Scottish Council, 1949), 
and it can be argued that the data to which 
Kamin refers are more easily explained on the 
basis of greater genetic variance for boys than 
for girls. This would mean, of course, that 
there should be a difference in IQ heritability 
between boys and girls and that the result of 
all “mixed” analyses, like that of the present 
study, is only an average for the sexes. 

Nevertheless, for the moment, I treat the 
environmental variance within a fraternal twin 
pair as an unknown quantity and denote it 
by w; that is, $e^2 r(E_1, E_2) = e^2 - w$. Equation
(30) then becomes


$$r(1, 2) = h^2 r(G_1, G_2) + 2hes + e^2 - w. \tag{31}$$


The genetic correlation $r(G_1, G_2)$ is given by
Equations 14 and 17. Substituting in Equation
31, one obtains


$$r(1, 2) = h^2[.25 + .50g + 2g^2(h + es)^2 r(F, M)] + 2hes + e^2 - w. \tag{32}$$


Outline of Analysis


The first step is to obtain an estimate of 
Z = r(AC, AM; NSP). The figure adopted 
was the average of four estimates, one from 
each of the four foster-child studies. Since this 
involved the application of corrections for 
selective placement, dependent for their value 
on the final solution, a reiterative method had 
to be employed for the analysis as a whole; 
that is, initially approximations to these 
corrections were used. Then when a solution 
was obtained that had been derived from these 
values, the corrections were revised. The
procedure was repeated until no further
revision was required. 

For simplicity of presentation, the reitera-
tions are omitted from the account that
follows, and only the final values for the
corrections appear. Since $a = (h^2 + e^2)^{1/2}$ is
involved in these corrections, the final value
for a is assumed in making this simplified
presentation. It follows that in completing the
analysis, it is necessary to show that the values
finally obtained for $h^2$ and $e^2$ are consistent
with the initially assumed value for a. The
remaining steps of the analysis are as follows:

1. Values are found for $h\, r(G_{NC16}, Q_{NP})$ in
Equation 26 and for $Y_{16}$ in Equation 27. De-
tails of this are presented later. A value for the
covariance (2hes) follows immediately from 
Equation 27; also, Equation 26 can be ex- 
pressed entirely in terms of the four unknowns, 
g, h, e, and s. 

2. Equations 26, 27, 32, and 4 can then be 
solved so that values for g, h, e, and s are ob-
tained for a range of values of w (the environ-
mental variance within a fraternal twin pair).

3. Finally, a suitable value for w is deter-
mined from a consideration of the likely values
for $e^2$ (the total environmental variance) and
its between-families and within-family com-
ponents. Data used in making this decision
are (a) the difference in the correlations for
fraternal twins and ordinary siblings and (b)
the results of studies of separated identical
twins. With the choice of a value for w, the
other unknowns (g, h, e, and s) are fixed. One
can then write down the contributions of
heredity, environment, and covariance to the
total IQ variance, and, using the formulae in
Table 1, one can derive the $V_A$, $V_{A(AM)}$,
and $V_D$ components of the genetic variance.


The Correlation Z = r(AC, AM; NSP)


The Burks (1928) study. Burks's study 
yielded the following statistics: r(AC, AM) 
= .23 and r(AC, AF) = .09, both correlations 
corrected for attenuation. 

The correlation r(AC, AF) is abnormally 
low and can be shown to be the cause of the 
spuriously high value for the between-families 
environmental variance (17.6%) that Burks 
obtained from her multiple regression analysis. 
I therefore deal only with r(AC, AM). In any
case, selective placement appears to operate
mainly through the mothers (natural and
foster); also, statistics on selective placement
are usually with respect to mothers. The
correction for selective placement is given by
Equation 24. Assuming a value of .16 for
r(NM, AM) (cf. Jencks et al., 1972, p. 277)
and applying the reiterative method, one ob-
tains a value of −.062 for the correction (the
final value of a = .902); that is, it appears
that Z(Burks) = .23 — .062 = .168. 
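
As a rough numerical check of the Burks correction, the sketch below evaluates Equation 24 with the final values quoted in the text. It assumes that the product h r(G_NCx, Q_NM) at the Burks mean age of 8.2 is the .352 listed in Table 5; the function and variable names are illustrative only.

```python
def selective_placement_correction(h_r_gq, r_nm_am, a):
    """Equation 24: -(h/a) * r(G_NCx, Q_NM) * r(NM, AM).

    h_r_gq is the product h * r(G_NCx, Q_NM), as tabulated in Table 5.
    """
    return -(h_r_gq / a) * r_nm_am


a = 0.902            # final value of a = (h^2 + e^2)^(1/2)
h_r_gq_8_2 = 0.352   # h * r(G_NCx, Q_NM) at age 8.2 (assumed from Table 5)
r_nm_am = 0.16       # r(NM, AM), the value assumed in the text

correction = selective_placement_correction(h_r_gq_8_2, r_nm_am, a)
z_burks = 0.23 + correction   # observed r(AC, AM) plus the (negative) correction

print(f"correction = {correction:.3f}, Z(Burks) = {z_burks:.3f}")
```

Under these inputs the correction and the resulting Z agree with the figures in the text to three decimals.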

A further correction for restriction in range 
of family environment might be necessary, 
but it is not likely to be large. One must simply 
bear in mind that the figure of .168 could be
on the low side. 

The Leahy (1935) study. The values 
obtained for r(AC, AP) were as follows: 
r(AC, AM) = .20 and r(AC, AF) = .15, or
.214 and .161 (approximate) when corrected 
for attenuation. In her study, Leahy did not 
correct for attenuation, but did apply a correc- 
tion for restriction in range. (The standard 
deviation of the IQs for the foster children was 
only 12.5.) This correction gave her values 
of .24 and .19, respectively, and it was these 
values that Jencks et al. combined with the 
Burks and Freeman (1928) values to get their 
overall mean for r(AC, AP), uncorrected for 
attenuation. However, a closer examination of 
Leahy’s study shows that the small value for 
the standard deviation of the foster children’s
IQs was due mainly to a restriction in the
genetic range of the foster children; a correc-
tion for this must reduce the correlations and
not increase them. There was, nevertheless,
some restriction in the environmental range of
the foster children, as is evidenced by the fact
that the standard deviation of the environ-
mental status score was 54.3 for foster children
and 59.6 for controls.

When all the necessary corrections are made,
it is found that the correlation between IQ_AC
and IQ_AM amounts to about .18 (because of
their lengthiness, these calculations are not
reproduced here); that is, Z(Leahy) = .18.

The Freeman (1928) study. This is the 
largest of the foster-child studies, involving 
401 foster children. Correlations yielded by
this study are considerably higher than those
obtained in the Burks and Leahy studies. Since
the average age of adoption was more than 4
years, selective placement is an obvious
explanation, but this was resisted strongly by
Freeman. The values of r(AC, AP) obtained
by Freeman (uncorrected for attenuation) 
were r(AC, AM) = .28 and r(AC, AF) = .34. 
Another important statistic in Freeman's study 
is r(Qac, HR) = .48, where HR is the home 
rating of the foster home. 

In a reanalysis of Freeman’s data, to be 
submitted for publication, the difficulty of 
selective placement was avoided by a com- 
paratively simple technique. Instead of con- 
sidering the correlation between the IQs of 
foster children and the HRs of the foster 
homes—a procedure that cannot avoid the 
factor of selective placement—I estimated the 
IQ gains of the foster groups as a result of 
being moved from natural to foster homes. 
Estimates were also derived for the mean
HRs of the natural homes (Freeman gave only
the HRs for the foster homes), and, as a result,
a measure of the mean gain in IQ per unit
increase in HR was obtained. This, of course,
can easily be converted to a correlation. In 
this way, I obtained a revised measure for 
r(Qac, HR), which, when corrected for atten- 
uation, amounted to only .295. 

The question arises, What is the correspond-
ing value of r(AC, AM)? It is obviously some-
thing less. As a result of further calculation, I
was able to show that this correlation must be
about .20; that is, Z(Freeman) = .20.

The Skodak and Skeels (1949) study. This 
study does not provide values for r(AC, AP). 
But it has the special feature of providing 
values for r(AC, NM) at different ages of the 
foster child. 

The foster children were tested on four
occasions, at mean ages of 2 years 2 months,
4 years 3 months, 7 years 0 months, and
13 years 6 months. In the case of 63 children
(40 girls and 23 boys), the IQs were also avail-
able for the natural mothers. In all cases, the
Stanford-Binet scale was used. But at the last
testing (M = 13 years 6 months) the children
were also given the 1937 Terman-Merrill scale.
This was done in view of the inaccuracy of the
Stanford-Binet standardization at the older
ages.

Skodak and Skeels provided a magnificent
appendix to their study, and, using the data
provided there, I was able to obtain the values


Table 2
Correlations Between IQs of Foster Child (AC) and Natural Mother (NM)
for Different Ages of the Foster Child

                             r(AC, NM)
M age
(years-months)      Girls     Boys      Total

2-2ᵃ                .202     −.248      .037 (.00)ᵇ
4-3                 .299      .233      .275 (.28)
7-0                 .363      .330      .348 (.35)
13-6                .520      .066      .381 (.38)
13-6ᶜ               .472      .292      .415 (.44)

Note. The correlations were derived from Skodak
and Skeels's (1949) data and are uncorrected for
attenuation; for girls, n = 40; for boys, n = 23.
ᵃ Both children's and mothers’ IQs were derived
from the Stanford-Binet scale.
ᵇ Correlations in parentheses are figures given by
Skodak and Skeels (1949).
ᶜ Children’s IQs were derived from the Terman-
Merrill scale and mothers’ IQs from the Stanford-
Binet scale.


of r(AC, NM) at the different ages, for both 
boys and girls and for the two together (see 
Table 2). Two of the values obtained for the 
total group differ slightly from the correlations 
provided by Skodak and Skeels (in particular, 
the Terman-Merrill value for 13 years 6 
months). 

Three things should be noted: 

1. The effect of selective placement, as 
Jencks et al. (1972, p. 282) pointed out, is 
relatively unimportant. The correction is 
—Zr(NM, AM) (see Equation 28), which for 
r(NM, AM) = .16 and Z = .20 works out at 
—.032 (approximate). (It is understood that 
in correcting for selective placement, the same 
correction has to be applied to all correlations 
in Table 2.) 

2. From an examination of Table 2, it is 
seen that for both girls and the total, the 
correlations increase steadily with chrono- 
logical age. When a curve is fitted to the data, 
one obtains a rising curve that flattens out at 
the later ages. In the case of the correlations 
for the boys, the same trend is discerned, but 
the figures are very erratic. Sampling error 
obviously comes to mind. 

3. With the number of girls and boys
amounting to only 40 and 23, respectively, the
degree of sampling error can be large. If one 
thinks in terms of a rising curve, the effect is
to raise or lower the position of the curve but
not to change its general shape. Later analysis
appears to indicate that sampling error is
positive for the girls and distinctly negative for
the boys. Furthermore, partly as a result of its
greater homogeneity and smaller size, the boys'
group shows greater fluctuation in its correla-
tions. Also, despite the general homogeneity
of the boys' group, the IQs for one boy and his
natural mother lie well outside the range of
the others, and this, together with the faulty
Stanford-Binet standardization, adds con-
siderably to the erratic variations in the corre-
lations for the group. These irregularities also
show to some extent when the boys are com- 
bined with the girls and the correlations for 
the total group are obtained. 

In view of these considerations, it is obvious 
that one has a certain amount of difficulty in 
deciding what one should take as the value of 
r(AC, NM) for the four mean ages of the 
Skodak and Skeels study. Despite the obvious 
differences in the boys’ and girls’ groups, I 
choose the values for the total group; in the 
case of the oldest of the four ages, 13 years 
6 months, where the value is critical, I some- 
what arbitrarily choose the figure .415. This 
estimate has at least the advantage of being 
almost the same as the .41 assumed by Jencks 
et al. (1972, p. 282). 

Thus, for the four Skodak and Skeels mean
ages, the values of r(AC, NM) are .037, .275,
.348, and .415 or, correcting for attenuation,
.040, .299, .378, and .451.

When I now apply the correction for selec-
tive placement (−.032), the values become
.008, .267, .346, and .419. In addition to these
values, one requires the values for ages 8.2
and 9.3, the respective mean ages of the Burks
and Leahy studies, and also the value for age
16. By simple interpolation and extrapolation,
these values are found to be .364, .384, and .423
(approximate); that is, at age x = 9.3,


$$r(AC_{9.3}, NM; NSP) = .384,$$

and therefore by Equation 29,

$$h\, r(G_{NC9.3}, Q_{NM}) = .384a = .346$$


(assuming that the final value for a = .902).

It now follows from Equation 25 that aZ
= .55 − .346 = .204, whence Z = .204/.902
= .226; that is, Z(Skodak and Skeels) = .226.
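
The three arithmetic steps above can be retraced in a few lines. This is only a sketch using the figures quoted in the text (a = .902, the interpolated correlation .384, and the parent-child correlation .55); the variable names are assumptions made for illustration.

```python
a = 0.902                    # final value of a = (h^2 + e^2)^(1/2)
r_ac_nm_nsp_9_3 = 0.384      # r(AC9.3, NM; NSP), interpolated in the text
y_parent_child = 0.55        # r(NC, NP), the parent-child correlation used in the text

h_r_gq = r_ac_nm_nsp_9_3 * a     # Equation 29 rearranged: h * r(G_NC9.3, Q_NM)
a_z = y_parent_child - h_r_gq    # Equation 25 rearranged: aZ = Y - h * r(G, Q)
z_skodak_skeels = a_z / a

print(f"h r(G, Q) = {h_r_gq:.3f}, aZ = {a_z:.3f}, Z = {z_skodak_skeels:.3f}")
```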




The averaged result. One now has four esti-
mates for Z = r(AC, AM; NSP), namely,
Burks, .168; Freeman (revised), .200; Leahy,
.180; and Skodak and Skeels, .226. The agree-
ment is remarkably good. The average of the
four estimates is .194, but since the figure for
the Burks study is probably on the low side,
I take .20 as the final estimate; that is,


$$Z\ (\text{final estimate}) = .20. \tag{33}$$


This estimate is somewhat less than that of
Jencks et al., which appears to be about .23
on correction for selective placement. One of
the main reasons for the difference is Jencks
et al.'s acceptance of the abnormally high
figures produced by the original Freeman
study. Another contributing factor is the
mistake made by Leahy (in correcting her
correlations for restriction in range). Also, it
is seen that there is now no need to reject the
Skodak and Skeels study as “deviant,” as did
Jencks.


Derivation of Values for h r(G_NC16, Q_NM) and
Y16 = r(NC16, NM)


As was pointed out earlier in the outline of
the analysis, values are required for these two
expressions in order to simplify two of the four
equations required for the final solution. In
deriving the values, it is necessary to correct
the Skodak and Skeels r(AC, NM; NSP)
correlations for sampling error. The procedure
is the reverse of that followed in estimating Z
for the Skodak and Skeels study.

Assuming a = .902 and using the overall
estimate of Z as given by Equation 33,


$$aZ = .902 \times .20 = .180. \tag{34}$$

From Equations 18 and 25, it follows that

$$h\, r(G_{NC9.3}, Q_{NM}) = .550 - .180 = .370.$$

Substituting in Equation 29,

$$r(AC_{9.3}, NM; NSP) = .370/.902 = .410.$$

Comparing this with the value of .384
provided by the Skodak and Skeels study, it
is now seen that sampling error, for boys and
girls together, is small and involves a correction
of only +.026; that is, the final values of
r(AC, NM; NSP) for the four ages involved
in the Skodak and Skeels study are .034, .293,
.372, and .445. For ages 8.2 and 9.3, the
interpolated values are .390 and .410. Last,
for age 16 the extrapolated value becomes .449.
By Equation 29, it follows that


$$h\, r(G_{NC16}, Q_{NM}) = .449a = .405. \tag{35}$$

Also, by Equation 26,

$$gh(h + es)[1 + r(F, M)] = .405$$

or, substituting for r(F, M),

$$1.57gh(h + es) = .405. \tag{36}$$

Last, by Equations 25, 34, and 35,

$$Y_{16} = r(NC_{16}, NM) = .405 + .180 = .585. \tag{37}$$


The Covariance 2hes 


The relevant equation is 27. By Equations
34 and 37, it follows that


$$2hes = .186. \tag{38}$$


Also, $a = (h^2 + e^2)^{1/2} = (.814)^{1/2} = .902$, the
value used throughout the above analysis.
The estimate for the covariance (.186) is
identical with that of Jencks, despite differ-
ences in the statistics. It is also much higher
than the 14%-15% obtained directly from the
Burks, Freeman, and Leahy studies. It should
however be remembered that the environ-
mental measures employed by these studies
can only be approximate; it is possible that a
more precise measure would yield a higher
estimate for the covariance. 
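
The chain of values leading from the four Z estimates to the covariance can be retraced numerically. The sketch below is illustrative only: it assumes Equation 27 has the form 2hes = 4aZ(Y16 − aZ)/[1 + r(F, M)], which is how the appendix derivation reads, and it rounds intermediate values to three decimals as the text does.

```python
# Study-level estimates of Z quoted in the text.
z_estimates = {
    "Burks": 0.168,
    "Freeman (revised)": 0.200,
    "Leahy": 0.180,
    "Skodak and Skeels": 0.226,
}
mean_z = sum(z_estimates.values()) / len(z_estimates)   # about .194

a = 0.902        # final value of a = (h^2 + e^2)^(1/2)
z_final = 0.20   # Equation 33
r_fm = 0.57      # r(F, M)

a_z = round(a * z_final, 3)          # Equation 34
h_r_gq_16 = round(0.449 * a, 3)      # Equation 35, from the extrapolated .449
y16 = h_r_gq_16 + a_z                # Equation 37

# Equation 27 in the assumed form described in the lead-in.
covariance = 4 * a_z * (y16 - a_z) / (1 + r_fm)

print(f"mean Z = {mean_z:.4f}, aZ = {a_z}, Y16 = {y16:.3f}, 2hes = {covariance:.3f}")
```

With these inputs the covariance comes out close to the .186 of Equation 38, which is some reassurance that the assumed form of Equation 27 is the intended one.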


The Final Solution 


As indicated earlier, four equations lead to
the final solution. These are $h^2 + e^2 + 2hes = 1$
(Equation 4); $1.57gh(h + es) = .405$ (Equa-
tion 36); and $2hes = .186$ (Equation 38). The
fourth equation is 32, which on substituting
.692 for r(1, 2) (Equation 19) and .57 for
r(F, M) becomes


$$.692 = h^2[.25 + .50g + 1.14g^2(h + es)^2] + 2hes + e^2 - w.$$


These four equations involve five unknowns:
g, h, e, s, and w. Since w has a minimum value
of .03 (the environmental variance within an
identical twin pair) and, as Jencks et al.


Table 3
Values of h², e², and Other Parameters
Corresponding to Range of Values of w

                               w
Parameter     .03    .04    .05    .06    .07    .08

2hes          .186   .186   .186   .186   .186   .186
g             .361   .368   .376   .383   .392   .400
h²            .621   .607   .593   .579   .566   .552
e²            .193   .207   .221   .234   .248   .262
s             .269   .263   .257   .253   .249   .245
r(G1, G2)     .553   .559   .566   .573   .580   .587

Note. w = environmental variance within a fra-
ternal twin pair.


stated, is not likely to exceed this figure by
any great amount, I solve the four equations
for g, h, e, and s, taking in turn values of
.03, .04, ..., .08 for w. The results are pre-
sented in Table 3.
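
One way to see how the four equations pin down g, h, e, and s for a given w is to reduce them to a single equation in h² and solve it numerically. The sketch below does this by bisection; it is not necessarily the method used for the published table, and the reduction (eliminating e² through Equation 4 and g through Equation 36) as well as all names are assumptions made for illustration.

```python
import math

R_FM = 0.57               # r(F, M)
COV = 0.186               # 2hes (Equation 38)
K = 0.405 / (1 + R_FM)    # g*h*(h + es), from Equation 36
R_TWINS = 0.692           # r(1, 2) for fraternal twins (Equation 19)


def residual(h2, w):
    """Right side minus left side of Equation 32, written in terms of h^2 only."""
    e2 = 1 - COV - h2                   # Equation 4
    g = K / (h2 + COV / 2)              # because h*(h + es) = h^2 + hes
    # h^2 * 2*r(F,M)*g^2*(h + es)^2 equals 2*r(F,M)*K^2
    genetic = h2 * (0.25 + 0.5 * g) + 2 * R_FM * K ** 2
    return genetic + COV + e2 - w - R_TWINS


def solve(w, lo=0.3, hi=0.8, tol=1e-10):
    """Bisection on h^2; the residual is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid, w) > 0:
            lo = mid
        else:
            hi = mid
    h2 = (lo + hi) / 2
    e2 = 1 - COV - h2
    g = K / (h2 + COV / 2)
    s = (COV / 2) / math.sqrt(h2 * e2)  # from 2hes = COV
    return {"g": g, "h2": h2, "e2": e2, "s": s}


for w in (0.03, 0.04, 0.05, 0.06, 0.07, 0.08):
    sol = solve(w)
    print(w, {name: round(value, 3) for name, value in sol.items()})
```

The values obtained this way agree with Table 3 to within about .001 or .002, the kind of discrepancy one would expect from rounding of the intermediate figures.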

There remains the question of what value to 
choose for w. There are two approaches to this 
question. The first concerns the difference be- 
tween the correlations for fraternal twins and 
ordinary siblings: about .12 (.69 vs. .57). 
Jencks et al. (1972, pp. 289—209) had difficulty 
in trying to explain this difference. In partic- 
ular, they considered the effect of the imper- 
fections of the Stanford-Binet standardization. 
But probably at most these account for only 
.03, that is, about a quarter of the difference 
in correlation. In the end, Jencks et al. accepted
.12 as largely representing the difference in the
environmental variance within a pair for 
siblings vis-à-vis twins. 

However, the principle of the variability of 
G with age is relevant here; the correlation 
between the genotypic values of ordinary 
siblings is less than that for fraternal twins— 
because of the age difference. A tentative 
analysis suggests that as much as .05 can be 
attributed to the G variability factor. Together 
with the effect of the faulty standardization, 
this suggests that possibly no more than .05 
can be attributed to the difference in environ- 
mental variance for siblings vis-à-vis fraternal 
twins. 

One would imagine that the difference in 
the environmental variance within a pair for 
identical and fraternal twins would be much 
less than that for fraternal twins and ordinary 
siblings (again because of the age factor), say,


Table 4
Estimates of Between-Families and Within-Family Components of Environmental Variance e²
for Range of Values of w

                                                  w
Variance estimates              .03    .04    .05    .06    .07    .08

Present study
  Within-family
    component (v_W)ᵃ            .08    .09    .10    .11    .12    .13
  Environmental
    variance (e²)               .193   .207   .221   .234   .248   .262
  Between-families
    component (v_B)             .113   .117   .121   .124   .128   .132

Newman, Freeman, & Holzinger (1937) and Burt (1966)ᵈ
  e²                            .165ᵉ
  v_Bᵇ                          .085   .075   .065   .055   .045   .035
  % reduction in v_B due to
    selective placementᶜ        24.5   35.9   46.3   55.6   64.9   73.5

ᵃ v_W = w + .05 (assuming that difference between fraternal twins and siblings for environmental variance
within family is .05).
ᵇ Obtained by subtracting estimate of v_W (present study) from estimate of e² (Newman, Freeman, &
Holzinger, 1937; Burt, 1966).
ᶜ Obtained by calculating 100(1 − R), where R = ratio of estimate of v_B (Newman, Freeman, & Holzinger,
1937; Burt, 1966) to estimate of v_B (present study).
ᵈ Studies used separated identical twins.
ᵉ Average of Newman, Freeman, and Holzinger (1937) and Burt (1966).


.01 or .02. It appears therefore that w can be
.04 or .05, but is unlikely to be more. It also 
follows that the environmental variance within 
a family is no more than .09 or .10 (i.e., 9% 
or 10% of the total IQ variance). 

A second approach is provided by the data 
from studies in which identical twins are 
brought up separately or apart (ITA). If the 
twins are separated at an early age and if there 
is no selective placement, then the variance 
within a pair is an estimate of the total en- 
vironmental variance (both between families 
and within family). The Newman et al. (1937) 
study gives a figure of 36.3 (corrected for error) 
for the variance within a pair, that is, 16.2% of 
the total IQ variance (15² = 225). The figure
for Burt’s (1966) group test IQ data is only 
slightly more. The weakness of ITA studies is 
the likelihood of a considerable degree of 
selective placement being involved in the 
assignment of the twins to their foster homes. 
For this reason, there is a general tendency to 
dismiss these studies as useless for the purpose 
of heritability analyses. This, however, is a 
mistake. An important point, which is over- 


looked, is that selective placement affects only 
between-families environmental differences; it 
does not affect the factors that contribute to 
within-family environmental differences. Con-
sequently, the 16.5% derived from the New-
man et al. and Burt studies can be regarded
as the sum of two quantities: (a) the full 
environmental variance within a family and 
(b) a fraction of the environmental variance 
between families, depending on the degree of 
selective placement present. Having noted 
this point, one can perform the calculations 
presented in Table 4. 

The first half of Table 4 sets out estimates
of the within-family and between-families
environmental variances obtained for the
present study by assuming values of .03 to .08
for w. By subtracting the within-family
variance (v_W in Table 4) from the e² = .165 of
the Newman et al. and Burt studies, one also
obtains the estimates of the between-families
variance for the ITA studies (v_B in Table 4).
The differences between the figures in rows 3 
and 5 can be attributed to selective placement, 
and in the last line of the table, the reduction


Table 5
r(ACx, NM; NSP) and Other Derived Functions With
Chronological Age (x)

                                          Chronological age (x)
Function                       2.17   4.25   7.0    8.2    9.3    13.5   16

r(ACx, NM; NSP)                .112   .243   .359   .390   .410   .445   .449
  Correctedᵃ                   .034   .293   .372                 .445
h r(G_NCx, Q_NM)ᵇ              .101   .220   .324   .352   .370   .402   .405
r(Gx, G16)ᶜ                    .250   .541   .800   .869   .913   .991   1.000
r(NCx, NC16; E constant)ᵈ      .544   .722   .878   .920   .948   .995   1.000

Note. AC = foster child; NM = natural mother; NSP = no selective placement.
ᵃ The figures in this row are the four Skodak and Skeels (1949) correlations for the 63 adopted children and
... and sampling error. The figures above (in the first row) ... correlations (see Figure 1).
ᵇ Obtained by multiplying the figures in the first listed function by a = (h² + e²)^(1/2) (= .902).
ᶜ Obtained by dividing the first listed function ...
ᵈ Obtained by multiplying the third listed function by h² (= .607) and adding .393 (see Equation 39).




in variance due to selective placement is given 
as a percentage. It seems very reasonable to 
assume that selective placement does not 
produce a reduction in variance greater than 
50%. It follows, therefore, that w should be 
in the range .03–.05. 
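
The Table 4 calculations follow mechanically from the rules stated in its footnotes, and a short sketch makes the dependence on w explicit. The e² values are copied from Table 3; the dictionary and function names are illustrative assumptions.

```python
E2_ITA = 0.165   # average e^2 from the separated identical twin studies

# e^2 for the present study at each w (Table 3).
E2_PRESENT = {0.03: 0.193, 0.04: 0.207, 0.05: 0.221,
              0.06: 0.234, 0.07: 0.248, 0.08: 0.262}


def table4_row(w):
    """Return (within, between_present, between_ita, pct_reduction) for a given w."""
    within = w + 0.05                        # footnote a of Table 4
    between_present = E2_PRESENT[w] - within
    between_ita = E2_ITA - within            # footnote b
    pct_reduction = 100 * (1 - between_ita / between_present)   # footnote c
    return within, between_present, between_ita, pct_reduction


for w in sorted(E2_PRESENT):
    within, b_present, b_ita, pct = table4_row(w)
    print(f"w={w:.2f}  within={within:.2f}  between={b_present:.3f}  "
          f"ITA between={b_ita:.3f}  reduction={pct:.1f}%")
```

The reduction stays below 50% only for w up to .05, which is the basis for the range given above; small differences from the printed percentages reflect rounding in the table.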

In light of the two arguments presented 
above, I choose w = .04 in order to obtain the 
final solution. It follows from Tables 3 and 4 
that the components of the IQ variance are 
60.7% for heredity, 20.7% for environment 
(9% within a family and 11.7% between 
families), and 18.6% for covariance. 

The solution is a compromise between Burt
and Jencks. It supports Jencks et al.’s figure
for covariance, but is closer to Burt (and
Jensen) as far as the relative importance of
heredity and environment is concerned (since
the ratio 60.7:20.7 is approximately 75:25).
The components of the genetic variance. For
w = .04, $r(G_1, G_2) = .559$ (see Table 3). It
follows that the genetic variance between
families is .607 × .559 = .339, that is, 33.9%
of the total IQ variance. The within-family
component is 60.7 − 33.9 = 26.8%. How-
ever, in accordance with the principle of the
only to fraternal twins or siblings tested at the 
same age. For siblings of different ages, the 
between-families component will decrease 
with increase in age difference and the within- 
family component increase. 

Turning now to Table 1, it can be shown
that for w = .04, $p = (h + es)^2 r(F, M) = .460$
and $A = 2gp = 2(.368)(.460) = .339$. Also,
$V_A = 2g(1 - A) = .487$ (48.7%), $V_{A(AM)}
= 2gA = .250$ (25%), and $V_D = 1 - 2g
= .263$ (26.3%); that is, 26.3% of the genetic
variance can be attributed to dominance.
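
The partition of the genetic variance can be recomputed from the w = .04 column of Table 3. The sketch below uses the expressions as they appear in this passage (p = (h + es)² r(F, M), A = 2gp, and the three components 2g(1 − A), 2gA, and 1 − 2g); the variable names, and the reading of the first component as the additive variance under random mating, are assumptions.

```python
import math

# Final solution for w = .04 (Table 3).
g, h2, e2, s = 0.368, 0.607, 0.207, 0.263
r_fm = 0.57                      # r(F, M)

h, e = math.sqrt(h2), math.sqrt(e2)
p = (h + e * s) ** 2 * r_fm      # about .460
A = 2 * g * p                    # about .339

v_additive = 2 * g * (1 - A)     # additive component, about .487
v_assortative = 2 * g * A        # component due to assortative mating, about .25
v_dominance = 1 - 2 * g          # dominance component, about .26

print(f"p = {p:.3f}, A = {A:.3f}")
print(f"additive = {v_additive:.3f}, assortative = {v_assortative:.3f}, "
      f"dominance = {v_dominance:.3f}")
```

By construction the three components sum to 1; the last digit of each can differ from the printed figures because g is used here only to three decimals.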


A Check on the Final Solution 


The method that has been described might 
be regarded as deriving from two hypotheses : 
(a) that the genotypic value G varies with age 
and (b) that this genetic variation with age 
can be measured through the variation of the 
correlation between the IQ of the adopted 
child and the IQ of the natural mother. Since 
the environmental component of the IQ 
variance is relatively small, a corollary to 
these hypotheses is that the variation of the 
correlation between IQ at a chronological age 
of 16 years and IQ at earlier ages can be 
explained mainly in terms of the variation of G 
with age, although changes in the individual's 
environment over the years must make a 
contribution. 

Thus the solution, and therefore the rationale 
on which it is based, can be checked by using 
it to derive the correlations between the IQs of 
children at 16 years and their IQs at earlier 
ages (x) for the theoretical case in which there 
is no environmental change between the two 
ages. On the basis of the two hypotheses and 
the corollary just stated, I would expect the 
theoretical correlations to be slightly greater 
than the observed correlations, the difference


Figure 1. The variation of correlations with age (x),
plotted as correlation against chronological age (x);
r(NCx, NC16; E constant) = correlation between IQ
at age x and IQ at age 16 with no change in environ-
ment E; r(NCx, NC16) = correlation between IQ at
age x and IQ at age 16 as obtained from empirical data;
r(Gx, G16) = correlation between genotypic value at
age x and genotypic value at age 16; r(ACx, NM;
NSP) = correlation between IQ of adopted child at
age x and IQ of natural mother with no selective
placement.


(or drop) increasing with the size of the interval 
between the chronological ages of x and 16. 
Furthermore, I would expect the drop in 
correlation to remain well within the bounds 
of the total variance attributed to environ-
mental differences (20.7%) if a reasonable
amount of that variance is attributed to fairly 
constant environmental differences between 
families and also something to prenatal 
environmental differences. (Throughout the 
ensuing argument, it is understood that all 
correlations have been corrected for atten- 
uation.) 

For the theoretical case of no environmental 
change, one has for the child living with his 
own parents, 


$$r(NC_x, NC_{16}; E\ \text{constant}) = E[(hG_{NCx} + eE_{NCx})(hG_{NC16} + eE_{NC16})] = h^2 r(G_{NCx}, G_{NC16}) + 2hes + e^2,$$

making the usual assumption that covariance
is constant for x ≥ 5. Substituting the values
for 2hes and e² provided by the final solution,
one obtains


$$r(NC_x, NC_{16}; E\ \text{constant}) = h^2 r(G_{NCx}, G_{NC16}) + .393. \tag{39}$$


It is necessary now to derive an expression
for $r(G_{NCx}, G_{NC16})$ that will enable one to
determine its value for any given x. I assume
that


$$r(G_{NCx}, Q_{NM}) = r(G_{NCx}, G_{NC16})\, r(G_{NC16}, Q_{NM}),$$


whence

$$r(G_{NCx}, G_{NC16}) = \frac{r(G_{NCx}, Q_{NM})}{r(G_{NC16}, Q_{NM})} = \frac{r(AC_x, NM; NSP)}{r(AC_{16}, NM; NSP)} \tag{40}$$


(by Equation 29); that is, $r(G_{NCx}, G_{NC16})$
= r(ACx, NM; NSP)/.449, substituting the
value for r(AC16, NM; NSP) obtained earlier.

By means of Equations 39 and 40, one can
carry out the calculations summarized in
Table 5. (The second function has been added
to Table 5 to provide the values necessary for
the calculation of selective placement correc-
tions involved in the Burks and Leahy studies.)
Figure 1 gives a diagrammatic representation
of the results. Also included in the diagram is
the curve representing the variation of
r(NCx, NC16) with x, as shown by empirical
data (cf. Jensen, 1969, p. 18).

It is seen that the theoretical curve
r(NCx, NC16; E constant) lies above the em-
pirical curve r(NCx, NC16), the difference
between the two correlations increasing as x
decreases. However, the differences between
the two correlations are quite small for x > 5.
Thus at age x = 5, it amounts only to
about .07.

It follows therefore that the correlation
between IQ at earlier ages and IQ at age 16
can be explained mainly in terms of the varia-
tion of G with age and owes comparatively
little to environmental changes with age. In
other words, the analysis presented in this
section provides a convincing check on the
final solution and lends further support to the
hypotheses and methods on which it was based.
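
The last two rows of Table 5 can be regenerated from its first row with Equations 39 and 40. The sketch below does so; the decimal ages 2.17 and 4.25 are an assumed reading of 2 years 2 months and 4 years 3 months, and the list names are illustrative.

```python
ages = [2.17, 4.25, 7.0, 8.2, 9.3, 13.5, 16.0]                # assumed decimal ages
r_ac_nm = [0.112, 0.243, 0.359, 0.390, 0.410, 0.445, 0.449]   # r(ACx, NM; NSP), Table 5

H2 = 0.607       # h^2 from the final solution
OFFSET = 0.393   # 2hes + e^2 = .186 + .207

# Equation 40: r(Gx, G16) = r(ACx, NM; NSP) / r(AC16, NM; NSP)
r_g = [r / r_ac_nm[-1] for r in r_ac_nm]

# Equation 39: r(NCx, NC16; E constant) = h^2 * r(Gx, G16) + .393
r_iq = [H2 * rg + OFFSET for rg in r_g]

for age, rg, riq in zip(ages, r_g, r_iq):
    print(f"x = {age:5.2f}   r(Gx, G16) = {rg:.3f}   r(NCx, NC16; E const) = {riq:.3f}")
```

The output reproduces the tabulated rows to within about .001, the residual differences being attributable to rounding of the inputs.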


Appendix


Derivation of Basic Equations Used in the Analysis 


1. For own children,

$$Q_{NCx} = hG_{NCx} + eE_{NCx},$$

where $h^2 + e^2 + 2hes = 1$ and $s = r_{GE}$, like h
and e, is assumed to be constant for x > 5
years.

2. For foster children (no selective place-
ment),

$$Q_{ACx} = (1/a)(hG_{ACx} + eE_{ACx}) \quad \text{(Equation 21)},$$

where $a = (h^2 + e^2)^{1/2}$. With no selective place-
ment, there is no covariance. It is also assumed
that the genetic and the environmental vari-
ances for foster children are the same as for
own children.

In the case of selective placement, let

$$Q_{ACx} = h'G_{ACx} + e'E_{ACx},$$

where $1 = h'^2 + e'^2 + 2h'e'r_{GE(SP)}$.
Applying the same assumption as for no
selective placement, it follows that

$$\frac{h'}{h} = \frac{e'}{e} = \frac{(1 - \mathrm{cov}_{AC})^{1/2}}{(h^2 + e^2)^{1/2}} = \frac{(1 - \mathrm{cov}_{AC})^{1/2}}{a},$$

where $\mathrm{cov}_{AC} = 2h'e'r_{GE(SP)}$.

In the case of the Burks, Leahy, and
Skodak and Skeels studies, in which only a
moderate degree of selective placement is in-
volved, it can be shown that $\mathrm{cov}_{AC}$ lies in the
range 0-.03. It follows that

$$h' = h/a \quad \text{and} \quad e' = e/a \quad \text{(Equation 20)}.$$


3. For foster child and foster parent with no
selective placement (Equation 22),

$$Z = r(AC, AP; NSP) = E[(h'G_{ACx} + e'E_{ACx})Q_{AP}] = E(e'E_{ACx}Q_{AP}),$$

since with no selective placement, there can be
no correlation between $Q_{AP}$ and $G_{AC}$; that is,

$$Z = (e/a)\, r(E_{ACx}, Q_{AP}) = (e/a)\, r(E_{NCx}, Q_{NP}),$$

assuming that parents treat foster children like
their own children.

With the further assumption that the corre-
lation of $E_{NCx}$ with $Q_{NP}$ is constant or ap-
proximately constant with age, it follows that

$$Z = (e/a)\, r(E_{NC}, Q_{NP}).$$


4. For foster child and foster mother with
selective placement (Equation 23),

$$r(AC_x, AM; SP) = E[(h'G_{ACx} + e'E_{ACx})Q_{AM}] = Z + (h/a)\, r(G_{ACx}, Q_{AM}).$$

Assuming that selective placement takes
place mainly through matching of mothers
(natural and foster),

$$r(G_{ACx}, Q_{AM}) = r(G_{ACx}, Q_{NM})\, r(Q_{NM}, Q_{AM}) = r(G_{NCx}, Q_{NM})\, r(NM, AM);$$

that is,

$$r(AC_x, AM; SP) = Z + (h/a)\, r(G_{NCx}, Q_{NM})\, r(NM, AM).$$

5. For parent and own child (Equations 25
and 26),

\[
\begin{aligned}
Y_x = r(NC_x, NP) &= E[(h G_{NCx} + e E_{NCx}) Q_{NP}] \\
&= h\, r(G_{NCx}, Q_{NP}) + aZ
\end{aligned}
\]

(by Equation 22).

Assuming that at age x = 16, the G's of
parents and children are comparable:

\[
G_{NC16} = g(G_{NF} + G_{NM}) + \text{random term};
\]

that is,

\[
r(G_{NC16}, Q_{NP}) = E[g(G_{NF} + G_{NM}) Q_{NP}].
\]

Making the same assumption as Jencks et al.
(1972) and Fisher (1918), that assortative
mating takes place through the phenotypic
values (i.e., IQs),

\[
\begin{aligned}
E(G_{NF} Q_{NM}) &= r(G_{NF}, Q_{NM}) \\
&= r(G_{NF}, Q_{NF})\, r(Q_{NF}, Q_{NM}) \\
&= (h + es)\, r(F, M) = E(G_{NM} Q_{NF}),
\end{aligned}
\]

whence

\[
h\, r(G_{NC16}, Q_{NP}) = gh(h + es)[1 + r(F, M)].
\]

6. The derivation for covariance 2hes (Equation
27) is as follows: Implicit in Jencks et al.'s
(1972, p. 268) path model is the assumption
that the G of the child is correlated with the
E entirely through the IQs of the parents. The
same assumption is made here, except that in
accordance with the principle of the variability
of G with age, it is necessary that the G's of
parent and child be comparable, that is, belong
to the same age level. It is assumed that this
is so when the age of child (x) is 16. I therefore
derive the covariance at age 16 in accordance
with Jencks et al.'s assumption and then make
the standard assumption that the value of the
covariance at x = 16 is the same for all other
ages at which x ≥ 5:

\[
G_{NC16} = g(G_{NF} + G_{NM}) + \text{random term}.
\]

Also, applying simple regression analysis,

\[
E_{NC16} = k(Q_{NF} + Q_{NM}) + \text{random term},
\]

where

\[
k = \frac{r(E_{NC}, Q_{NP})}{1 + r(F, M)} = \frac{aZ}{e[1 + r(F, M)]}
\]

(by Equation 22); that is,

\[
\begin{aligned}
2hes &= 2he\, E(G_{NC16} E_{NC16}) \\
&= 2he\, E[g(G_{NF} + G_{NM})\, k(Q_{NF} + Q_{NM})] \\
&= 2hegk \cdot 2(h + es)[1 + r(F, M)] \\
&= \frac{4aZ}{1 + r(F, M)}\, (Y_{16} - aZ)
\end{aligned}
\]

by Explanation 5.

7. For foster child and natural mother
(Equations 28 and 29),

\[
\begin{aligned}
r(ACx, NM; SP) &= E[(h' G_{ACx} + e' E_{ACx}) Q_{NM}] \\
&= (h/a)\, r(G_{ACx}, Q_{NM}) + (e/a)\, r(E_{ACx}, Q_{NM}).
\end{aligned}
\]

The first term on the right-hand side is
r(ACx, NM; NSP). From this, Equation 29
follows. Also,

\[
\begin{aligned}
r(E_{ACx}, Q_{NM}) &= r(E_{ACx}, Q_{AM})\, r(Q_{AM}, Q_{NM}) \\
&= (aZ/e)\, r(AM, NM)
\end{aligned}
\]

(Equation 22), making the usual assumption
that selective placement takes place through
matching of mothers. Hence,

\[
(e/a)\, r(E_{ACx}, Q_{NM}) = Z\, r(AM, NM).
\]
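
The simplification in Explanation 6 can be checked numerically. The sketch below assumes the reading 2hes = 4aZ(Y16 - aZ)/[1 + r(F, M)], with k = aZ/(e[1 + r(F, M)]) and Y16 - aZ = gh(h + es)[1 + r(F, M)] taken from Explanations 5 and 6. The function name and every numeric value are hypothetical and chosen only to exercise the algebra; they are not estimates from the article.

```python
import math


def covariance_two_ways(h, e, s, g, a, Z, r_fm):
    """Evaluate 2hes by the long product and by the closed form.

    All argument values are hypothetical; only the algebra is checked.
    """
    # Regression weight from Explanation 6: k = aZ / (e[1 + r(F, M)])
    k = a * Z / (e * (1.0 + r_fm))
    # Explanation 5: Y16 - aZ = g h (h + es)[1 + r(F, M)]
    y16 = g * h * (h + e * s) * (1.0 + r_fm) + a * Z
    # Long form: 2 h e g k * 2 (h + es)[1 + r(F, M)]
    long_form = 2 * h * e * g * k * 2 * (h + e * s) * (1.0 + r_fm)
    # Closed form: 4 a Z (Y16 - aZ) / [1 + r(F, M)]
    closed_form = 4 * a * Z * (y16 - a * Z) / (1.0 + r_fm)
    return long_form, closed_form


# Arbitrary illustrative path values (assumptions, not data).
long_form, closed_form = covariance_two_ways(
    h=0.7, e=0.5, s=0.3, g=0.5, a=0.9, Z=0.2, r_fm=0.4
)
print(math.isclose(long_form, closed_form))
```

Because the factor [1 + r(F, M)] cancels, both expressions reduce to 4aZ gh(h + es), so the two values agree for any admissible inputs.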
Male speech and female speech have been observed to differ in their form,
topic, content, and use. Early writers were largely introspective in their analyses; 
more recent work has begun to provide empirical evidence. Men may be more 
loquacious and directive; they use more nonstandard forms, talk more about 
sports, money, and business, and more frequently refer to time, space, quantity, 
destructive action, perceptual attributes, physical movements, and objects. 
Women are often more supportive, polite, and expressive, talk more about 
home and family, and use more words implying feeling, evaluation, interpre- 
tation, and psychological state. A comprehensive theory of “genderlect” must 
include information about linguistic features under a multiplicity of conditions. 


Both casual and serious observers of the 
human condition have long recognized that 
communication between the sexes is often 
frustrating. A possible cause of the difficulty 
is that men and women may in fact not 
really be speaking the same language (Jong, 
1977; Reik, 1954). 

Aspects of form, topic, content, and use¹
of spoken language have been identified as
sex associated. Either men or women are
more likely to produce specific utterances.
Informal observations, speculations, and
stereotypes in each category are discussed first.
This presentation is followed by a report of
empirical findings² from a variety of
communication situations. Although reports of
stereotypes and evidence for male and female
spoken language differences do not always
coincide, they both contribute to one's
understanding of sex roles and communication.


This review is based on a dissertation submitted 
to Teachers College, Columbia University in par- 
tial fulfillment of the requirements for the PhD 
degree. Deep appreciation is expressed to Edward 
Mysak, Lois Bloom, and Mary Parlee for their 
useful suggestions, criticism, and encouragement. 

Requests for reprints should be sent to Adelaide 
Haas, Department of Speech Communication, State 
University of New York, New Paltz, New York 
12562. 


Form 


The form of utterances can be described in terms
of their acoustic, phonetic shape . . . in terms of 
the units of sound, or phonology, the units of 
meaning that are words or inflections, or morph- 
ology, and the ways in which units of meaning are 
combined with one another, or syntax. (Bloom & 
Lahey, 1978, p. 15) 


Perhaps the most widespread belief about
men’s speech as compared with women’s is
that it is coarser and more direct. An early
observer of style in language, Jesperson
(1922/1949), observed women’s speech to be
generally more conservative than men’s in
the following ways: Men are readier to coin
and use new terms, pun, utter slang
expressions, and employ profanity and obscenity.
Women, on the other hand,


are shy of mentioning certain parts of the human
body and certain natural functions by the direct and
often rude denominations which men and
especially young men prefer when among themselves.
Women will therefore invent innocent and
euphemistic words and paraphrases which sometimes may
in the long run come to be looked upon as the
plain or blunt names and therefore in their turn
have to be avoided and replaced by more decent
words. (p. 245)


1 The categories form, topic, content, and use
were suggested by Lois Bloom of Teachers
College, Columbia University and are described in
Bloom and Lahey (1978).

2 Mary Parlee of Barnard College, Columbia
University suggested an evaluative review, separating
stereotypes from empirical findings.


Copyright 1979 by the American Psychological Association, Inc. 0033-2909/79/8603-0616$00.75


SEX DIFFERENCES IN SPOKEN LANGUAGE


Reik (1954) affirmed that “we all know
that there is a ‘man talk’ and a ‘woman
talk’” (p. 14). He observed that “men . . .
will not hesitate to say ‘Hell’ or ‘Damned.’
. . . Women will rarely say ‘It stinks’
preferring to state that it has a bad smell” (p.
14).

More recently, Kramer (1974b) quoted
the following: “The New Seventeen on people
who use ‘those four letter words’: Boys find
it especially repugnant when girls use those
words. One boy described girls who use
profanity as having nothing better to say” (p.
22).

Lakoff (1973) observed that men use
stronger expletives such as shit and damn,
whereas women use weaker or softer
profanity such as oh dear, goodness, or fudge. Farb
(1974) suggested that dear me and gracious
are part of the female lexicon, and Ritti
(1973) stated that most teachers of the sixth
grade are well aware that young girls use far
more “expressives” such as oh and wow than
do the boys in their classes.

Farb wrote, “Nowadays young women use
words that were formerly taboo for them
with as much freedom as young men use
them" (p. 50), but young men are not
permitted the more euphemistic expressions.
However, research on people's perceptions
of language as either male or female suggests
that the earlier stereotypes of coarse, free
male language contrasted with euphemistic
female forms still hold. Garcia-Zamor (Note
1) asked four boys and four girls in an
upper-middle-class nursery school to indicate
whether certain utterances were produced by
a male or female doll; shit was seen by both
boys and girls as male, and drat was seen by
both as female. In a study of adults’
stereotypes, Kramer (1974a) asked college
students to determine whether various captions
taken from New Yorker cartoons were
uttered by males or females. Men in the
cartoons were found to swear more than women
and for more trivial reasons.

A careful review of the literature revealed
no empirical studies of the comparative use 
of expletives. Profanity and obscenity do not 
readily submit to laboratory study. Docu- 
mentation of this stereotype would require 
recording speech of female-only, male-only, 
and mixed-sex groups in various settings. 
The speakers should certainly not know they 
are being observed. 

Reports by individual investigators writ- 
ing about their own experiences (Key, 1975; 
Lakoff, 1975) strongly suggest that the form 
of expressives is sex associated. A possible 
explanation is that expressives "serve dif- 
ferent functions for men and women. Males 
use them when they are angry or exasperated. 
... But women's exclamations are likely to 
convey enthusiasm" (Kramer, 1974a, p. 83). 

The form of women's language is reputed 
to be more polite than the form of men's. 
Lakoff (1975) noted that “women are sup- 
posed to be particularly careful to say 'please' 
and ‘thank you’ . . . a woman who fails at
these tasks is apt to be in more trouble 
than a man who does so” (p. 55). She specu- 
lated that “the more one compounds a re- 
quest, the more characteristic it is of women's 
speech" (p. 19). An example of a doubly 
compound request is "Won't you please close
the door?" (p. 18). 

Only one very limited empirical study of 
politeness forms was found: 16 women born 
in Maine around 1900 used more politeness 
forms than 12 male counterparts when inter- 
viewed by college students (Hartman, 1976). 

According to Austin (1965), high, oral 
sounds and giggling sounds are appropriate 
for females in courtship, whereas males pro- 
duce low and nasal sounds. Coser (1960) re- 
corded verbal interactions involving humor 
at 20 staff meetings of a mental hospital. 
She found that senior staff members
(psychiatrists) made more jokes than junior staff
members (paramedics) and that men made 
more witticisms than women (99 out of 103), 
but women often laughed harder. Coser sug- 
gested that this concurs with the sex roles of 
male authority and female receptivity. Haas 
(1978) similarly found that girls laughed 
more than boys in mixed-sex dyads.

Women are permitted to cry, as reflected
in Key's (1975) observation that “if a female
talks or cries into a pillow it's ‘muffled sob- 
bing’; if a male does the same, it’s ‘blub- 
bering,’ with negative connotations” (p. 109). 
Crying has been observed more frequently in 
girls than in boys. In an analysis of 200 
quarrels of preschool children, Dawe (1934) 
found that 35.8% of the girls cried compared 
with only 20.2% of the boys. 

Several writers (Labov, 1966; Levine & 
Crockett, 1966; Trudgill, 1972) have specu- 
lated that men use more slang expressions 
than women or even that slang is man’s
domain. Conklin (Note 2), however, observed
that women’s vernacular has not been studied
and suggested a need to especially examine
the dialect of all-female groups. Empirical
phonological studies of -in versus -ing
endings (Fischer, 1958), of -uh versus -er endings

(Levine & Crockett, 1966; Wolfram, 1969), 
and of f, t, and th usage (Wolfram, 1969), 
show black females more likely to use stan- 
dard forms than black men. Similar results 
were found in studies of pronominal apposi- 
tion, as in “my brother he went to the park,” 
and multiple negation (Shuy, Wolfram, &
Riley, 1968). Garvey and Dickstein (1972) 
noted more nonstandard forms in the speech 
of six dyads of boys from four population 
groups (black, white, and low and middle 
socioeconomic status) than in matched girls. 

Joffe (1948) noted sex differences in the 
vernacular of menstruation, including the 
greater use of color references by men and 
of personification by women. For example,
men might say “she’s waving the red flag,” 
whereas women might refer to “having my 
friend.” This finding was part of a larger 
study in New York City on attitudes and 
beliefs about menstruation. 

Jesperson (1922/1949) believed women
leave sentences unfinished or dangling more 
often than men. In an informal survey of 
television panel discussions, Bernard (1972) 
noted that women are more frequently in- 
terrupted than men. This may help explain 
the unfinished sentences. No empirical
evidence for sex differences in sentence
completeness has been noted. Zimmerman and
West (1975), however, reported in a study
of 11 male-female dyads that “virtually all
the interruptions and overlaps are by the
male speakers (98% and 100%
respectively)" (p. 115). They further noted that
not one of the women who were interrupted 
protested. Similar results were reported by 
Eakins and Eakins (1976). 

Women have long had the reputation for
being more loquacious than men: “Où femme
il y a, silence il n’y a” (Where there’s woman,
there's no silence). “The tongue is the sword
of a woman, and she never lets it become
rusty” (China). “The North Sea will sooner
be found wanting in water than a woman at
a loss for a word” (Jutland; cited in
Jesperson, 1922/1949, p. 253). Jesperson
believed that


the superior readiness of speech of women is a 
concomitant of the fact that their vocabulary is 
smaller than that of men. But this again is
connected with another indubitable fact, that women
do not reach the same extreme points as men, but 
are nearer the average in most respects. (p. 253) 


He gave many examples of how women are
supposed to talk ahead of thinking, to talk
more than men.

Lakoff (1975) informally observed longer
sentence forms in women than in men,
possibly resulting in the impression of more
speech. For example, women are more likely
to compound a request: “Will you help me
with these groceries, please?" is more
characteristic of women than “Help me" or even
"Please help me with these groceries."
Empirical evidence, however, suggests that at
least under certain conditions women's
sentences are shorter than men's. For example,
at professional conferences, the mean time 
used by women asking a question was re- 
ported to be less than half that used by men 
(Swacker, 1975). 

Studies of sex differences in length of 
utterance in children indicate that girls are 
significantly superior to boys at various
matched age levels in mean length of
utterance (Winitz, 1959). Maccoby (1966, p.
335) reported similar results in her summary 
of 19 studies. Garvey and Ben Debba (1974), 
however, found no sex differences in words 
per utterance among same-sex or mixed-sex 
dyads ranging in age from 3j to 53 years 
and participating in free-play testing
situations. In considering mean length of
utterance of children, language maturation must
be considered a factor, since utterances nor- 
mally become longer as skill in language in- 
creases and most studies show that girls de- 
velop language facility earlier than boys. 
Limited evidence, then, suggests that al- 
though in early childhood female sentences 
are longer than those of males, by adulthood 
the reverse may be true. 

Mixed results have been reported in studies
of verbosity (Maccoby, 1966, p. 335). In 
a task involving adults’ responses to pic- 
ture stimuli, Wood (1966) concluded that 
men tend to use more words than women in 
responding to a given stimulus. Like results 
in similar situations were found by Argyle, 
Lalljee, and Cook (1968) and Swacker 
(1975). Cherry’s (Note 3) review of 11 
studies dealing with children’s quantity of
speech reported that girls tended to exceed 
boys in this dimension in 6 of the studies. 
No differences were noted in 4 studies. 

The participants in a communication
influence quantity of verbalization. In mixed-sex
groups, men tend to talk more than women 
(Argyle et al., 1968; Bernard, 1972). 

Among children the composition of the
communication group also seems to affect
verbosity. Mueller (1972), in a study of “the
maintenance of verbal exchanges between

young children” (ages 34-5 years) found 

that “boys talked significantly more than
girls" (p. 933) in a free-play situation to
same-sex peers. Brownell and Smith (1973),
however, reported more verbal productivity
among 4-year-old girls in comparison with
same-age boys in mixed-sex dyads, triads,
and small groups. In preschool children, then,
boys have been found to talk more to boys
and girls to talk more in mixed-sex groups,
the reverse of the adult pattern.

Entwisle and Garvey (1972) reported sex
differences in verbal productivity among
Baltimore children, with girls more
productive than boys; note that this finding is
most marked among those of lower social
class. Possibly no real difference exists in the 
quantity of talk that is produced by men and
women, but “girls are not supposed to talk
as much as a man” (Kramer, 1974b, p. 17). 

In sum, the stereotype clearly shows 
women to be more verbose than men. Em- 
pirical evidence is mixed. Girls seem to talk 
somewhat more than boys, but adult women, 
especially in the company of men, have been 
found to talk less than their male com- 
panions. 


Topic 


Topic refers to the subject matter of the 
spoken utterance, to what the conversation 
is about. 

Kramer (1974b) captured much of the 
folklore related to topics of male and female 
conversations through her study based on 
New Yorker cartoons: 


Men hold forth with authority on business, poli- 
tics, legal matters, taxes, age, household expenses, 
electronic bugging, church collections, kissing, base- 
ball, human relations, health and—women’s speech. 
Women discuss social life, books, food and drink, 
pornography, life's troubles, caring for a husband,
social work, age, and life-style. Several of the 
students who rated the cartoon captions said they 
considered all statements about economics, business 
or jobs to be male. (p. 83) 


The interviews by Komarovsky (1967) 
suggest similar stereotypes in blue-collar 
families. One 28-year-old wife commented 
that “[men] think we [women] are silly and 
talk too much. They think that women gossip 
a lot and they are against it” (p. 150). A 
36-year-old husband noted that women want 
“to talk about kidstuff and trivia like Mrs. X 
had her tooth pulled out” (p. 150). Women 
reported that they enjoyed talking about the 
family and social problems. Both sexes ac- 
knowledged that men prefer to talk about 
cars, sports, work, motorcycles, and local 
politics. 

Klein's (1971) observations of the working 
class in England are similar: 


Just as men in the clubs talk mainly about their 
work and secondly about sport and never about 
their homes and families, so do their wives talk 
first of all about their work, i.e.: their homes and 
families, and secondly within the range of things 
with which they are all immediately familiar. (p. 73)

In mixed-sex conversations the impression
is that women initiate topics that are rarely 
followed through by men (Bernard, 1972; 
Chesler, 1972). 

Three studies in the 1920s of conversa- 
tional topics using tape-recorded fragments 
of conversations on city streets are of inter- 
est. Moore (1922) recorded 174 conversa- 
tions in New York City and reported that 
man-to-man topics included money and
business (48%), amusements or


time, sports 

of the time, and other 

of the time. Women talked to
women about men (22%), clothing or
decoration (19%), and other women (15%).
Women talked about people in 37% of the
conversations. Man-to-woman topics included
amusement and sports (25%), money and
business (19%), and themselves (23%).
Women talked to men about amusements or
sports (24%), clothing or decoration (17%),
and themselves (17%).

In 1927, C. Landis analyzed 200 London 
conversations. The all-male topics were simi- 
lar to those in New York City and Columbus, 
but the women talked about a wider variety 
of topics among themselves. Landis suggested 
that in mixed-sex conversations, “the English- 
man when talking to a feminine companion 
adapts his conversation to her interests while 
American women adapt their conversations 
to the interests of their masculine compan- 
ions” (p. 357). 

In a study of “the women of the telephone 
company,” Langer (1970a, 1970b) reported 
that men discussed politics among them- 
selves, whereas women avoided religion and 

politics in their conversations.

Mulcahy (1973), using a self-disclosure
questionnaire with 97 adolescents, reported
that female same-sex disclosure was greater
than male same-sex disclosure. Major topics
for girls were "tastes, interests, and
personality” (p. 343); for boys high disclosure
clustered about "tastes, and interests, work
(studies), and attitudes and opinions" (p.
343). "The lowest disclosure area for males
was Body, whereas it was Money for
females" (p. 354).

Sause (1976) reported that kindergarten
girls made more reference to the female role
than did kindergarten boys, and this was the
only category that girls referred to more than
boys in this study of 144 subjects. Boys
talked more about family and home
environment, recreation, other people, and animals,
but the differences were not significant.
Utterances were all to a male examiner who
encouraged the children to talk about two
stimulus objects: an irregularly shaped block
and a toy fire engine.

Knowledge of conversational topics is
limited. Although the evidence supports the
stereotype that women talk more about
people and men more about money, business,
and politics, the studies date back to the
1920s. Times have been changing!


Content


Content refers to the "categorization of
the topics that are encoded in messages,"
such as "object in general,” “actions in
general" and the "possession relation in
general" (Bloom & Lahey, 1978, p. 11).
Content differs from topic, since topic refers to
particular objects, events, and ideas, whereas
content refers to the more general concept
of how the topic is referenced.

Women's language is more emotional and
evaluative than men's according to the
stereotype (Jesperson, 1922/1949; Kramer, 1974a;
Lakoff, 1975; Pei, 1969; Reik, 1954).
Jesperson wrote of women's fondness for
hyperbole and their greater use of adverbs of
intensity such as awful, pretty, terribly nice,
quite, and so. These all suggest value
judgments. Reik believed terms such as darling,
divine, sweet, adorable, I could just scream,
I nearly fainted, and I died laughing are
female associated. Pei observed “extravagant
adjectives” such as wonderful, heavenly,
divine, and dreamy in women’s speech. Again
the focus is on emotional value judgment.

Lakoff’s (1975) list of female adjectives
includes adorable, charming, lovely, and
divine. Male adjectives are great, terrific and
neat. Kramer (1974a) suggested that “words
of approval” (p. 22) such as nice, pretty,
darling, charming, sweet, lovely, cute, and
precious are used more frequently by women.

Hartman (1976) tested and supported
Lakoff's hypothesis that women use
evaluative adjectives more than men. In her study
of 70-year-old native Maine men and women,
she found that women compared with men
used many more words such as lovely,
delightful, wonderful, nice, pretty, pathetic,
pretty little, smartly uniformed, cute,
dearest, gentle, gaily, beautifully, lovelies, very
very, devoted, meek, perfectly wonderful,
and stylish. Most women used awful and
pretty to mean very and so.

Wood (1966) analyzed the speech of 36
college students (18 men and 18 women) as
they described photographs of a man's face.
She found that males referred more directly
to what was actually in the picture. Females
were more interpretative and tended to be
more subjective in their descriptions.
Barron's (1971) study of speech by teachers
and pupils during regular classroom activities
showed patterns similar to those reported by
Wood. Through an analysis of the
grammatical cases of speaker's utterances, Barron
found that women used more participative
and purposive cases and men used more
instrumental and objective cases. Specifically,
women talked more about how people felt
and why they behaved in certain ways. Men's
speech focused more on objects and actions
related to these objects.

Gleser, Gottschalk, and Watkins (1959)
studied the speech of 90 white adult men and
women who were asked by a male examiner
to tell about “any interesting or dramatic
life experiences you have had” (p. 183). As
did the other studies, this investigation
revealed that women used significantly more
words implying feeling, emotion, or
motivation (whether positive, negative, or neutral);
they made more self-references and used more 
auxiliary words and negations. Male subjects 
referred more to time, space, quantity, and 
destructive action. This can be viewed as 
supporting Eble's (Note 4) suggestion that 
terms of hostility are more associated with 
men. 

Physical movement was more frequently 
referenced by kindergarten boys than by 
girls (Sause, 1976). Boys also used signifi- 
cantly more words classified as self, space, 
quantity, good, bad, and negative words. 
Garcia and Frosch (1976) also found that 
males talked more about spatial relations 
than females. Their subjects were 40 black, 
Anglo, and Spanish-speaking adults, ranging 
in age from 18 to 65 years, who were asked 
to respond to two pictures (one "female 
room" and one “male outdoors scene") from 
current magazines. Females described items 
in terms of patterns and colors more than 
did males. Also of interest was the observa- 
tion that “each [sex] group went into imme- 
diate detail when describing the visual which 
was stereotyped to their sex group, but 
paused to ‘orient themselves’ to the environ- 
ment when approaching the other visual” 
(p. 68). 

Comparative use of adjectives was studied 
by Kramer (1974b), Brandis and Henderson 
(1970), and Entwisle and Garvey (1972). 
College students writing descriptions of black 
and white photographs did not differ in the 
type or number of prenominal adjectives used 
or in the number or variety of -ly adverbs 
(Kramer, 1974b). However, according to the 
studies by Brandis and Henderson and Ent- 
wisle and Garvey, girls use more adjectives 
than boys. The Brandis and Henderson study 
was on spoken language by 5-year-old work- 
ing-class British children; the Entwisle and 
Garvey study was based on the written lan- 
guage of ninth graders when asked to write 
imaginative stories after viewing four stimu- 
lus pictures. 

Garvey and Dickstein (1972) found that 
fifth-grade black and white boys of low and 
middle socioeconomic status used the
possessive construction more frequently than
females of the same age, race, and
socioeconomic status during oral communication
involving problem-solving tasks.

The stereotype of the content of spoken 
language, then, points to positive value judg- 
ments as female marked and hostile judg- 
ments as male marked. The empirical evi- 
dence suggests that the content of adult fe- 
male speech includes more words implying 
feeling, auxiliary words, negations, evaluative 
adjectives, interpretations, psychological state 
verbs, and purposive cases. Adult males use 
more terms referring to time, space, quantity,
destructive action, and perceptual attributes
and more objective cases. Boys have been
reported to use more words related to self,
space, quantity, good, bad, negation, and
possession. It is likely that girls use more
adjectives. Studies of adult use of adjectives
show mixed results.


Use 


"Language use consists of the socially and 
cognitively determined selection of behaviors 
according to the goals of the speaker and the 
context of the situation" (Bloom & Lahey, 
1978, p. 20). 

Bernard (1972) suggested that “instru- 
mental" talk is male associated. Men are 
stereotyped as the conveyors of information
and fact. Women "tend to be handicapped 
in fact-anchored talk. . . . They are... less 
likely to have a hard, factual background, 
less in contact with the world of knowledge"
(p. 153). The male instrumental style
includes lecturing, argument, and debate. This
has not been empirically documented to date. 

Assertiveness was observed as part of the 
male stereotype by Kramer (1974b) in her 
study of cartoon captions. Lakoff (1975) 
suggested that women's speech is
nonassertive. This concept has been developed by
other writers. Kuykendall (Note 5) wrote
that "clean, effective vigorous speech and
writing is just what women, qua women,
learn not to produce so as not to appear too
assertive and so to offend” (p. 4). Further- 
more, “Assertion of competence and power 
by a female is regarded as deviant behavior 
so that she becomes the recipient of social 
sanctions" (Unger, Note 6, p. 43). Wolman
and Frank (1975) observed that in a
professional peer group a woman was labeled
bitchy or manipulative when her behavior 
was assertive and directive. Nursery school 
children also believe that competitive and 
aggressive language is appropriate for males 
only, as demonstrated by a study in which 
boys and girls were asked to ascribe various 
uttered sentences to a girl or boy doll
(Garcia-Zamor, Note 1). Dawe (1934) found that
when nursery school children quarreled, boys 
were assertive by threatening and forbidding 
more often than girls. 

Tentativeness has been stereotyped as
female. Lakoff (1975) suggested that tag
questions (e.g., “It’s cold, isn't it?”) are used
far more often by women than by men. This 
form of question avoids assertion and gives 
the addressee the option of agreeing or dis- 
agreeing. Women's speech is said to be 
"hedge marked."

Empirical evidence is mixed. Hartman 
(1976) reported that tentativeness was 
clearly female associated among the 70-year-
old Maine natives whose speech she studied.
This was revealed in the women’s greater 
production of qualifiers such as perhaps,
I suppose, I just feel, probably, and as I in- 
terpret it and tag questions such as “Well, 
most people would say marriage, wouldn’t 
they?” and “It was grandmother, wasn’t it?” 
Swacker (1975) found that female college 
students indicated approximation when using 
numbers (“about six books”), whereas only 
one male used the tentative form in a task 
requiring the description of three pictures by 
Albrecht Dürer. However, in dyadic con- 
versations of college students, Hirschman 
(Note 7) found no difference between the 
sexes in the overall proportion of qualifiers
such as maybe, probably, I think, and I guess. 
In a somewhat larger study, Hirschman 
(Note 8) found that males uttered I think
twice as much as females. (I think is usually
considered a qualifier, but Hirschman sug- 
gested that it served primarily as a way for 
more assertive speakers to present their 
opinions.) Loban (Note 9) reported that
expressions of tentativeness including
supposition, hypothesis, and conditionality are
associated with effective users of language from
kindergarten through sixth grade. Hass and
Wepman (1973) similarly found that
uncertainty increased as a function of age in
children 5 to 13 years old and noted that "there
are many fine points about the uncertainty 
scores [with regard to the Age X Sex inter- 
action] that demand further investigation" 
(p. 305). Baumann (1976) analyzed 7% 
hours of tape of adults in various settings 
for confirmatory tag questions and qualify- 
ing prefatory statements. She found only 20 
examples altogether and no sex-associated 
use. 

Men and women may make requests in 
different ways. Lakoff (1975) observed that 
women state requests and men issue
commands. Hennessee and Nicholson (1972)
reported that in over 1,000 television
commercials, men gave almost 90% of the directives,
that is, the advice or commands to buy a
particular product. In a naturalistic study of
the conversations of a single married couple, 
Soskin and John (1963) reported that the 
husband gave far more directives than the 
wife. In one critical situation when they 
were rowing and the boat capsized, mainly 
the husband gave regulative statements such 
as demands, suggestions, and prohibitions. 

Hirschman (Note 8) tested the hypothesis 
that women are more supportive than men. 
No overall differences were found between 
the college men and college women studied, 
although females used "mm hmm" signifi- 
cantly more than males and most of these 
utterances occurred in female-to-female con- 
versations. In mock jury deliberations, 
Strodtbeck and Mann (1956) reported that 
women agreed, concurred, complied, accepted, 
and supported other speakers almost twice 
as much as men did. Similarly, women were 
antagonistic or offensive half as often as men. 

Conversely, men were more assertive. Sup- 
portive behavior can be inferred from the 
emotional sensitivity Alvy (1973) reported 
to be more characteristic of grade-school 
girls than of boys of lower, middle, and upper 
socioeconomic status in an experiment of 
listener-adapted communication. 

In use, then, men’s speech reputedly serves 
to lecture, argue, debate, assert, and com- 
mand. Women's speech is stereotyped as non- 
assertive, tentative, and supportive. Limited 
evidence confirms that males are more asser- 
tive and issue more directives; females are 
often more tentative and supportive. 


Conclusions and Implications 


Do male and female spoken language dif- 
ferences exist? The stereotypes abound, and 
evidence has been accumulating, especially 
since the beginning of this decade. 

Women's speech is said to contain more 
euphemisms, politeness forms, apology, laugh- 
ter, crying, and unfinished sentences. They 
are reputed to talk more about home and 
family and to be more emotional and posi- 
tively evaluative. Further, women’s speech is 
stereotyped as nonassertive, tentative, and 
supportive. Women are also said to talk more 
than men. 

Men, on the other hand, are reputed to 
use more slang, profanity, and obscenity and 
to talk more about sports, money, and busi- 
ness. They are reputed to make more hostile 
judgments and to use language to lecture, 
argue, debate, assert, and command. 

Empirical evidence is less clear, partly be- 
cause studies can only sample limited popu- 
lations in specific situations. Further, sex 
differences in American English are only sta- 
tistical differences. No feature of spoken 
American English is used exclusively by one 
sex or the other. In general, however, em- 
pirical studies of form confirm that males 
use more nonstandard forms than females 
and that females laugh and cry more. Older 
Maine women, at least, are more polite, and 
sixth-grade girls claim they use more ex- 
pressives. Contrary to the stereotype, adult 
men have been found to be more loquacious, 
but it is unclear whether boys or girls are 
more verbose. Studies from the 1920s support 
the stereotype that men talk more about 
money, business, and politics and that women 
talk more about home and family. The em- 
pirical evidence supports the stereotype of 
content differences in men’s and women’s 
speech. Various studies found that women 
use more emotional language and men focus 
more on perceptual attributes and destruc- 
tive action. The males studied were generally 
more assertive and directive than the women. 
One study found that women are more sup- 
portive than men, and the results of research 
on tentativeness are mixed. 

Are these isolated, unrelated variations in 
speech, or is there a logical clustering that 
points to “systems of co-occurring, sex-linked 
signals,” or “genderlects,” as Kramer (1974b, 
p. 14) proposed? 

If, in fact, one can say that there is a 
male speech style and a female speech style, 
then rules and restrictions can be written for 
each much in the way that grammatical 
structures are described. This task is com- 
plicated by two major observations: (a) 
Sex differences in spoken language that have 
been identified in English are sex preferen- 
tial as opposed to sex exclusive (Bodine, 
1975); that is, there is no evidence that any 
linguistic feature is used exclusively by one 
sex in our society; variations have been 
found only in frequency of production. (b) 
Sex is not the only variable to influence 
speech style. There is a complex interaction 
of personal characteristics such as sex, age, 
education, occupation, geographical region, 
ethnic background, and socioeconomic status 
and contextual factors such as communica- 
tion, situation, environment, and participants. 

Despite these complications, a start has 
been made at constructing a grammar of 
style for men's and women's language (Lak- 
off, Note 10). Lakoff focused on women's 
style and suggested that it is basically one 
of deference. She suggested that the various 
phonological and lexical forms and the syn- 
tactic-pragmatic features identified as occur- 
ring more often in women’s speech add up to 
a pattern of deference. However, deference 
alone does not make a woman's style. Other 
characteristics of the individual and the con- 
text combine to form the complete style. 
Lakoff pointed to a need to learn which styles 
can coexist and which cannot. Even more 
important is the need to know which sex- 
associated spoken language features are real 
and to document conditions under which 
they occur. 

Communication can be viewed as a micro- 
cosm of social behavior. Much of human in- 
teraction occurs at the linguistic level. As 
Gumperz and Hymes (1972) pointed out, 


If sociolinguistic research often begins as an ex- 
tension of linguistics, it must end as an intension 
of the social sciences—but in the idiom of disciplines 
that is only to say that it changes from a way of 
studying language to a way of studying man as a 
social being. (p. 466) 


The stereotypes and evidence discussed in 
this article have significant implications for 
the power structure between the sexes and 
indeed the psyche of both men and women. 
Future researchers need to be sensitive to 
situations in which they observe sex-associ- 
ated speech and to be cautious of making pre- 
mature judgments. In any event, there is 
little doubt that recent interest in gender and 
language will continue to generate worthwhile 
exploration into this topic. Clinicians and 
theoreticians alike will thereby increase their 
understanding of this important dimension 
of human communication. 


Reference Notes 


1. Garcia-Zamor, M. A. Child awareness of sex- 
role distinctions in language use. Paper presented 
at the meeting of the Linguistic Society of 
America, San Diego, Calif., December 1973. 

2. Conklin, N. F. Perspectives on the dialects of 
women. Paper presented at the meeting of the 
American Dialect Society, 1973. 

3. Cherry, L. J. Sex differences in child speech: 
McCarthy revisited. Princeton, N.J.: Educational 
Testing Service, February 1975. 

4. Eble, C. C. How the speech of some is more 
equal than others. Paper presented at the meet- 
ing of the Southeastern Conference on Lin- 
guistics, University of North Carolina at Chapel 
Hill, 1972. 

5. Kuykendall, E. Sexism in language. Unpublished 
manuscript, State University of New York Col- 
lege at New Paltz, Department of Philosophy, 
1976. 

6. Unger, R. K. Status, power and gender: An 
examination of parallelisms. Paper presented at 
the Conference on New Directions for Research 
on Women, Madison, Wis., May-June 1975. 

7. Hirschman, L. Female-male differences in con- 
versational interaction. Paper presented at the 
meeting of the Linguistic Society of America, 
San Diego, Calif., December 1973. 

8. Hirschman, L. Analysis of supportive and as- 
sertive behavior in conversations. Paper pre- 
sented at the meeting of the Linguistic Society 
of America, July 1974. 


9. Loban, W. D. The language of elementary school 
children (Report No. 1). Champaign, Ill.: Na- 
tional Council of Teachers of English, 1963. 

10. Lakoff, R. Women's styles of speaking: Their 
psychological significance. Paper presented at 
the Conference on Women’s Language, Graduate 
School and University Center of the City Uni- 
versity of New York, April 1977. 


References 


Alvy, K. T. The development of listener adapted 
communications in grade-school children from 
different social-class backgrounds. Genetic Psy- 
chology Monographs, 1973, 87, 33-104. 

Argyle, M., Lalljee, M., & Cook, M. The effects of 
visibility on interaction in a dyad. Human Rela- 
tions, 1968, 21, 3-17. 

Austin, W. M. Some social aspects of paralanguage. 
Canadian Journal of Linguistics, 1965, 11, 31-39. 

Barron, N. Sex-typed language: The production of 
grammatical cases. Acta Sociologica, 1971, 14, 24- 
42. 

Baumann, M. Two features of “women’s speech?” 
In B. L. Dubois & I. Crouch (Eds.), The sociol- 
ogy of the languages of American women. San 
Antonio, Tex.: Trinity University Press, 1976. 

Bernard, J. The sex game. Englewood Cliffs, N.J.: 
Prentice-Hall, 1972. 

Bloom, L., & Lahey, M. Language development and 
language disorders. New York: Wiley, 1978. 

Bodine, A. Sex differentiation in language. In B. 
Thorne & N. Henley (Eds.), Language and sex: 
Difference and dominance. Rowley, Mass.: New- 
bury House, 1975. 

Brandis, W., & Henderson, D. Social class, language 
and communication. London: Routledge & Kegan 
Paul, 1970. 

Brownell, W., & Smith, D. R. Communication pat- 
terns, sex and length of verbalizations in the 
speech of four-year old children. Speech Mono- 
graphs, 1973, 40, 310-316. 

Chesler, P. Women and madness. Garden City, 
N.Y.: Doubleday, 1972. 

Coser, R. L. Laughter among colleagues. Psychiatry, 
1960, 23, 81-95. 

Dawe, H. C. An analysis of 200 quarrels of pre- 
school children. Child Development, 1934, 5, 139- 

Eakins, B., & Eakins, G. Verbal turn-taking and 
exchanges in faculty dialogue. In B. L. Dubois 
& I. Crouch (Eds.), The sociology of the lan- 
guages of American women. San Antonio, Tex.: 
Trinity University Press, 1976. 

Entwisle, D. R., & Garvey, C. Verbal productivity 
and adjective usage. Language and Speech, 1972, 
15, 288-298. 

Farb, P. Word play: What happens when people 
talk. New York: Knopf, 1974. 

Fischer, J. L. Social influences on the choice of a 
linguistic variant. Word, 1958, 14, 47-56. 


Garcia, C. N., & Frosch, S. E. Sex, color and money: 
Who's perceiving what? Or men and women: 
Where did all the differences go (to?)? In B. L. 
Dubois & I. Crouch (Eds.), The sociology of the 
languages of American women. San Antonio, Tex.: 
Trinity University Press, 1976. 

Garvey, C., & Ben Debba, M. Effects of age, sex, 
and partner on children’s dyadic speech. Child 
Development, 1974, 45, 1159-1161. 

Garvey, C., & Dickstein, E. Levels of analysis and 
social class differences in language. Language and 
Speech, 1972, 15, 375-384. 

Gleser, G. C., Gottschalk, L. A., & Watkins, J. The 
relationship of sex and intelligence to choice of 
words: A normative study of verbal behavior. 
Journal of Clinical Psychology, 1959, 15, 182- 
191. 

Gumperz, J. J., & Hymes, D. H. Directions in socio- 
linguistics: The ethnography of communication. 
New York: Holt, Rinehart & Winston, 1972. 

Haas, A. Production of sex-associated features of 
spoken language by four-, eight-, and twelve- 
year old boys and girls (Doctoral dissertation, 
Columbia University, Teachers College, 1977). 
Dissertation Abstracts International, 1978, 39, 23A. 

Hartman, M. A descriptive study of the language 
of men and women born in Maine around 1900 
as it reflects the Lakoff hypotheses in "Language 
and woman's place." In B. L. Dubois & I. Crouch 
(Eds.), The sociology of the languages of Ameri- 
can women. San Antonio, Tex.: Trinity Univer- 
sity Press, 1976. 

Hass, W. A., & Wepman, J. M. Constructional vari- 
ety in the spoken language of school children. 
Journal of Genetic Psychology, 1973, 122, 297- 
308. 

Hennessee, J., & Nicholson, J. NOW says: TV com- 
mercials insult women. New York Times Maga- 
zine, May 28, 1972, pp. 12-13; 48-51. 

Jesperson, O. Language. New York: Macmillan, 
1949. (Originally published, 1922.) 

Joffe, N. F. The vernacular of menstruation. Word, 
1948, 4, 181-186. 

Jong, E. How to save your own life. New York: 
Holt, Rinehart & Winston, 1977. 

Key, M. R. Male/female language. Metuchen, N.J.: 
Scarecrow Press, 1975. 

Klein, J. The family in “traditional” working-class 
England. In M. Anderson (Ed.), Sociology of the 
family. Baltimore, Md.: Penguin Books, 1971. 

Komarovsky, M. Blue-collar marriage. New York: 
Vintage Books, 1967. 

Kramer, C. Folklinguistics. Psychology Today, June 
1974, pp. 82-85. (a) 

Kramer, C. Women’s speech: Separate but unequal? 
Quarterly Journal of Speech, 1974, 60, 14-24. (b) 

Labov, W. The social stratification of English in 
New York City. Washington, D.C.: Center for 
Applied Linguistics, 1966. 

Lakoff, R. Language and woman’s place. Language 
in Society, 1973, 2, 45-80. 


Lakoff, R. Language and woman's place. New York: 
Colophon/Harper & Row, 1975. 

Landis, C. National differences in conversations. 
Journal of Abnormal and Social Psychology, 1927, 
21, 354-357. 

Landis, M. H., & Burtt, H. E. A study of con- 
versations. Journal of Comparative Psychology, 
1924, 4, 81-89. 

Langer, E. The women of the telephone com- 
pany: Part 1. New York Review of Books, March 
12, 1970, pp. 16; 18; 20-24. (a) 

Langer, E. The women of the telephone company: 
Part 2. New York Review of Books, March 26, 
1970, pp. 14; 16-22. (b) 

Levine, L., & Crockett, H. J. Speech variation in 
a Piedmont community: Post-vocalic r. In S. 
Lieberson (Ed.), Explorations in sociolinguistics. 
The Hague, Netherlands: Mouton, 1966. 

Maccoby, E. E. (Ed.). The development of sex dif- 
ferences. Stanford, Calif.: Stanford University 
Press, 1966. 

Moore, H. T. Further data concerning sex differ- 
ences. Journal of Abnormal and Social Psychol- 
ogy, 1922, 4, 81-89. 

Mueller, E. The maintenance of verbal exchanges 
between young children. Child Development, 1972, 
43, 930-938. 

Mulcahy, G. A. Sex differences in patterns of self- 
disclosure among adolescents: A developmental 
perspective. Journal of Youth and Adolescence, 
1973, 4, 343-356. 

Pei, M. Words in sheep's clothing. New York: Haw- 
thorn Books, 1969. 

Reik, T. Men and women speak different languages. 
Psychoanalysis, 1954, 2, 3-15. 

Ritti, A. Social functions of children’s speech (Doc- 
toral dissertation, Columbia University, Teachers 
College, 1972). Dissertation Abstracts International, 
1973, 34, 2289B. 


Sause, E. F. Computer content analysis of sex dif- 
ferences in the language of children. Journal of 
Psycholinguistic Research, 1976, 5, 311-324. 

Shuy, R. W., Wolfram, W. A., & Riley, W. K. Field 
techniques in an urban language study. Washing- 
ton, D.C.: Center for Applied Linguistics, 1968. 

Soskin, W. F., & John, V. P. The study of spon- 
taneous talk. In R. Barker (Ed.), The stream of 
behavior. New York: Appleton-Century-Crofts, 
1963. 

Strodtbeck, F. L., & Mann, R. D. Sex role differ- 
entiation in jury deliberations. Sociometry, 1956, 
19, 3-11. 

Swacker, M. The sex of the speaker as a socio- 
linguistic variable. In B. Thorne & N. Henley 
(Eds.), Language and sex: Difference and domi- 
nance. Rowley, Mass.: Newbury House, 1975. 

Trudgill, P. Sex, covert prestige and linguistic change 
in the urban British English of Norwich. Lan- 
guage in Society, 1972, 1, 179-195. 

Winitz, H. Language skill of male and female 
kindergarten children. Journal of Speech and 
Hearing Research, 1959, 2, 377-391. 

Wolfram, W. A sociolinguistic description of De- 
troit Negro speech. Washington, D.C.: Center for 
Applied Linguistics, 1969. 

Wolman, C., & Frank, H. The solo woman in a 
professional peer group. American Journal of 
Orthopsychiatry, 1975, 45, 164-171. 

Wood, M. The influence of sex and knowledge of 
communication effectiveness on spontaneous 
speech. Word, 1966, 22, 112-137. 

Zimmerman, D. H., & West, C. Sex roles, inter- 
ruptions and silences in conversation. In B. Thorne 
& N. Henley (Eds.), Language and sex: Differ- 
ence and dominance. Rowley, Mass.: Newbury 
House, 1975. 


Received February 3, 1978 


Psychological Bulletin 
Vol. 86, No. 3, 627-637 


Habituation Model of Systematic Desensitization 


Fraser N. Watts 
King's College Hospital 
University of London, London, England 


The relevance of habituation as a model for response decrement in desensitiza- 
tion is considered. A discussion of the relationship between habituation and 
extinction leads to the view that there are no sound reasons for explaining 
desensitization as an extinction rather than as a habituation phenomenon. The 
maximal habituation theory of desensitization proposed by Lader and Mathews 
is discussed and relevant evidence reviewed. Finally, a revised habituation 
theory of desensitization, based on the dual-process theory of habituation, is 
elaborated, and the role in desensitization of relaxation, stimulus intensity, 
stimulus lengths, and interstimulus interval lengths are discussed in the context 
of this theory. It is suggested that relaxation and an incremental stimulus 
hierarchy may reduce sensitization rather than facilitate habituation. 


More than 10 years ago, Lader and Wing 
(1966) proposed a habituation theory of de- 
sensitization. The theory was subsequently 
elaborated by Lader and Mathews (1968) 
and has become known as the maximal habit- 
uation theory of desensitization. Though it 
has not won general acceptance, it has stimu- 
lated a significant body of research and has 
increased our knowledge about the processes 
at work in desensitization. It has thus passed 
at least one test of a useful theory. However, 
it is now clear that this version of the habit- 
uation theory of desensitization has a num- 
ber of weaknesses. It is the purpose of this 
article to review relevant research and to 
elaborate a reformulation of the habituation 
theory of desensitization in the light of the 
dual-process theory of habituation (Groves & 
Thompson, 1970) that has been developed 
since the maximal habituation theory of de- 
sensitization was proposed. 


Desensitization: Habituation or Extinction? 


Habituation can be defined as the waning 
of a response to a stimulus that occurs when 
the stimulus is repeatedly presented. It is 
usually said to be a central nervous system 
process, which distinguishes it from other 
decremental processes such as fatigue. It is 
also said to apply only to the decrement of 
unconditioned responses. This distinguishes 
it from extinction, which applies to condi- 
tioned responses. Though the habituation of 
a wide variety of responses has been de- 
scribed in the literature, the most extensively 
investigated has been the habituation of the 
orienting response to a repeated auditory 
stimulus. At first sight it may seem surpris- 
ing that a habituation rather than an ex- 
tinction model of desensitization should be 
under discussion (Evans, 1973), and the first 
task of a review of the habituation model 
must be to deal with this question. 

If habituation applies to unconditioned re- 
sponses and extinction to conditioned re- 
sponses, the question of whether desensitiza- 
tion should be termed a habituation or an 
extinction process can be settled by defini- 
tion on the basis of whether the responses 
being modified in desensitization are condi- 
tioned or unconditioned. However, the ap- 
proach taken here is an empirical one, to see 
what empirical differences exist between ha- 
bituation and extinction and whether the 
decremental processes that operate in de- 
sensitization are more closely analogous to 
those found in habituation or to those found 
in extinction. 

The most widely canvassed empirical dif- 
ference between the two is that habituation is 
a short-term or temporary change in re- 
sponsiveness, whereas extinction is a long- 
term or permanent one. Van Egeren (1971) 
has even suggested that this should be the 
defining distinction between habituation and 
extinction. However, such a proposal involves 
attaching a new meaning to the term habit- 
uation in view of the fact, acknowledged by 
Van Egeren, that the decrement of the ori- 
enting response on repetition of a stimulus 
can sometimes be relatively long-term. 
Though habituation may often be a rela- 
tively short-term process, it seems that un- 
conditioned responses do not always show 
shorter term decrements than conditioned 
ones. Kimmel (1973), in a discussion of the 
differences between habituation and condi- 
tioning, reached the parallel conclusion that 
“habituation cannot be differentiated from 
conditioning simply on grounds of its tempo- 
rariness” (p. 229). 

It has been suggested by Razran (1971, 
p. 30) that habituation, unlike extinction, is 
not affected by cognition. However, this view 
of habituation seems to be incorrect. The 
rate of habituation is certainly modified by 
instructions that affect the set with which 
subjects approach stimuli (e.g., Pendergrass 
& Kimmel, 1968). 

Another possible difference between the 
processes of habituation and those of ex- 
tinction is that extinction produces a stimulus 
with response-suppressant properties but 
habituation does not. The crucial test of re- 
sponse suppression (Rescorla, 1969) is the 
summation test. If a stimulus has acquired 
response-suppressant properties as a result 
of habituation or extinction, when it is sub- 
sequently combined with a separate condi- 
tioned stimulus, it should produce a smaller 
response than the conditioned stimulus would 
produce on its own. Reiss and Wagner 
(1972) have shown that by this test habitua- 
tion does not give response-suppressant prop- 
erties to the stimulus concerned. If it were 
well established that extinction does indeed 
give a stimulus such properties, there would 
be a good reason for regarding habituation 
and extinction as different processes. How- 
ever, the evidence on this point (Gray, 1975, 
pp. 105-106; Rescorla, 1969) is weak and 
so fails to establish a critical difference be- 
tween the processes of habituation and ex- 
tinction. 
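
The logic of the summation test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the response values are hypothetical and do not come from Rescorla (1969) or Reiss and Wagner (1972), and a real analysis would use an inferential test rather than a bare difference of means.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of response magnitudes."""
    return sum(values) / len(values)


def summation_test(cs_alone, cs_with_test_stimulus):
    """Drop in mean response when the test stimulus is added to a separate CS.

    A clearly positive value is what would be expected if the test stimulus
    had acquired response-suppressant properties; a value near zero is what
    would be expected if it had not.
    """
    return mean(cs_alone) - mean(cs_with_test_stimulus)


# Hypothetical response magnitudes (arbitrary units), for illustration only.
responses_to_cs_alone = [10.0, 9.0, 11.0]

# Compound of the CS with a stimulus that leaves the response unchanged.
no_suppression = summation_test(responses_to_cs_alone, [10.0, 10.0, 10.0])

# Compound of the CS with a stimulus that reduces the response.
suppression = summation_test(responses_to_cs_alone, [6.0, 7.0, 5.0])
```

With these made-up numbers the first comparison shows no drop and the second shows a drop of 4 units, which is the pattern the test is designed to distinguish.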

A number of other similarities and differ- 
ences between habituation and extinction 
have been investigated. It is known that stim- 
ulant drugs retard both processes (Hilgard 
& Marquis, 1940; Lynn, 1966). In addition, 
temporal massing of stimuli normally facili- 
tates both processes (Kling & Stevenson, 
1970). Among differences that have been 
reported, Kling and Stevenson found that ex- 
tinction produced an initial increase in re- 
sponsiveness, though habituation did not. 
However, until such differences are replicated 
and their theoretical significance clarified, 
they can hardly provide an adequate em- 
pirical basis for concluding that different 
processes are involved in habituation and 
extinction. It can thus be concluded that no 
reliable empirical differences have been es- 
tablished between extinction and habituation 
on which the question of which process op- 
erates in desensitization could be settled. 
Thus, at present, it seems reasonable to use 
either habituation or extinction as a model 
for desensitization. 

However, there are some general theoreti- 
cal grounds for suggesting that the habitua- 
tion of orienting responses may prove a par- 
ticularly useful analogue for the decrement 
of anxiety in desensitization. They depend 
on the argument advanced by Gray (1975, 
1976) that both novel stimuli that elicit 
orienting responses and stimuli that have 
been paired with punishing stimuli and arouse 
anxiety activate what Gray described as the 
"behavioural inhibition" system. This sys- 
tem produces an inhibition of ongoing be- 
havior and increases arousal (though it does 
not produce the kind of response-suppressant 
effect measured by the summation test). 
Sedative drugs decrease the level of opera- 
tion of the system. It is also relevant to 
note that this behavioral inhibition system 
has been found to be particularly active in 
neurotic introverts (Nicholson & Gray, 
1972), who are also the personality group of 
which phobics are largely composed (Marks, 
1969). So if it is correct that aversive stimuli 
and novel stimuli activate the same physio- 
logical-behavioral system, it seems reason- 
able to suggest that the processes of response 
decrement to both kinds of stimuli are simi- 
lar. There is thus some reason for thinking 
that the habituation of orienting responses is 
worth pursuing as an analogue of response 
decrement in systematic desensitization. 


Maximal Habituation Theory 


The clearest available statement of the 
habituation model of desensitization is the 
maximal habituation hypothesis proposed by 
Lader and his colleagues (Lader & Mathews, 
1968; Lader & Wing, 1966). The basic pos- 
tulates of the model are that the rate of 
observed reduction in the magnitude of the 
galvanic skin response (GSR) to repeated 
presentations of phobic situations is a habit- 
uation process and that this habituation pro- 
cess is maximized by aspects of the proce- 
dure (notably relaxation) that lower central 
arousal (measured in terms of spontaneous 
GSR fluctuations). The model focuses en- 
tirely on the rate of response decrement as 
the dependent variable (measured as the 
slope of the linear decremental trend cor- 
rected for initial response levels). It is as- 
sumed that increasing the rate of response 
decrement in desensitization makes it a more 
effective treatment in the long term. 
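
As an illustration only, the dependent variable described above could be computed roughly as follows. This is a minimal sketch, not the procedure actually used by Lader and Mathews: it assumes that the correction for initial response level is made by regressing each subject's slope on that subject's fitted initial response across subjects and keeping the residual. The function names and sample values are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # raises ZeroDivisionError if all xs are identical
    return slope, mean_y - slope * mean_x


def decrement_rates(responses_by_subject):
    """Return (raw_slopes, corrected_slopes), one value per subject.

    Each inner list holds one subject's response magnitudes over successive
    presentations. The raw slope is the linear decremental trend; the
    corrected slope is the residual after regressing slopes on the fitted
    initial response level (ASSUMED form of the correction).
    """
    fits = [linear_fit(list(range(len(r))), r) for r in responses_by_subject]
    slopes = [slope for slope, _ in fits]
    initial = [intercept for _, intercept in fits]  # fitted response at trial 0
    b, a = linear_fit(initial, slopes)
    corrected = [s - (a + b * i) for s, i in zip(slopes, initial)]
    return slopes, corrected


# Hypothetical GSR magnitudes for three subjects over four presentations.
data = [
    [10.0, 8.0, 6.0, 4.0],
    [5.0, 4.0, 3.0, 2.0],
    [8.0, 7.0, 6.0, 5.0],
]
raw, corrected = decrement_rates(data)
```

In this made-up data the first subject has the steepest raw slope but also the largest initial response; after the correction, the third subject shows the slowest decrement relative to what the initial level would predict.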


Individual Differences 


The analogy between desensitization and 
habituation is supported by data on individ- 
ual differences. Lader, Gelder, & Marks 
(1967) found a significant (.49) correlation 
across subjects between the rate of auditory 
habituation and clinical response to desensi- 
tization. In addition, Lader (1967) showed 
that specific phobics exhibited both faster 
auditory habituation and a better clinical 
response to desensitization than other pho- 
bics. 

These initial reports were followed by sev- 
eral similar studies. The most directly com- 
parable study is that reported by Gillan and 
Rachman (1974), which failed to find a sig- 
nificant correlation between the same two 
measures. However, the number of subjects 
who received desensitization (16) was too 
small for this result to be given much weight. 
In addition, the correlation may have been 
lowered by the use of an experimental design 
that necessitated the omission of relaxation 
from the desensitization of half these sub- 
jects. Klorman (1974) also found a non- 
significant correlation between the rate of 
GSR decrement in auditory habituation and 
fear change after exposure to films of phobic 
stimuli, though the pattern of exposure of 
these films did not follow standard desensi- 
tization practice. Lang (1970), using a simi- 
lar film exposure procedure in a complicated 
and somewhat incompletely reported experi- 
ment, has provided evidence that supports 
Lader et al.'s original finding. The rate of 
GSR habituation to tones of 50 dB and 100 
dB correlated significantly with fear change 
as a result of the standard form of desensi- 
tization that includes relaxation. (The cor- 
relations were apparently low or insignificant 
when relaxation was omitted.) A number of 
significant correlations were also found be- 
tween rate of GSR decrement to auditory 
tones and rate of decrement to phobic stimuli 
(Lang, 1970, p. 153), though there were too 
many insignificant correlations for the results 
as a whole to be convincing. In addition, the 
interpretation of this study is seriously af- 
fected by the fact that the rate of decrement 
to phobic stimuli did not correlate signifi- 
cantly with treatment outcome. This means 
that the relationship between treatment out- 
come and habituation rate to auditory stim- 
uli was not mediated by the rate of anxiety 
decrement in desensitization. Instead of ha- 
bituation rate directly influencing treatment 
outcome, it is more likely that both were a 
product of some other variable such as gen- 
eral arousal and that the specific mechanism 
relating arousal to outcome is something 
other than habituation rate. 

All the above studies were concerned with 
GSR habituation, though heart-rate decre- 
ment has also been examined. Lang, Mela- 
med, and Hart (1970) reported a very high 
correlation (.91) between heart rate decre- 
ment during desensitization and treatment 
outcome. However, Van Egeren (1970) found 
that the correlation between habituation rate 
to phobic versus neutral stimuli held only for 
skin conductance and not for heart rate and 
other variables. Such discrepancies between 
outcome indices can partly be explained in 
terms of their differential sensitivity at 
different arousal levels. 
As Lader (1975) has pointed out, the GSR 
is more sensitive at lower and heart rate at 
higher levels of arousal. 

So far the support for the habituation 
model from studies of individual differences 
has been weak and inconsistent. However, 
the results should not be taken as counting 
strongly against the model. The proposition 
under consideration is that there are similar 
response decrement processes at work in 
auditory habituation and systematic desensi- 
tization. It seems to be a general feature of 
psychology that outcome measures in two 
different response systems can be the result 
of similar processes and that these measures 
correlate only weakly across subjects. Sha- 
piro (1966) has put this forward as a general 
proposition, giving as one example that there 
can be low correlations across subjects be- 
tween the strengths of different conditioned 
responses (GSR, eyeblink, etc.), but never- 
theless in both cases the conditioned re- 
sponses are affected by the same variables, 
such as frequency of reinforcement. In the 
same way, response decrement in auditory 
habituation and systematic desensitization 
could be the result of similar processes, with- 
out the degree of decrement found in these 
two situations necessarily being highly cor- 
related across individuals. 
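
The point can be made concrete with a toy calculation. The sketch below is purely illustrative and rests on assumptions that are not in the source: response decrement is modeled as a simple linear function of the number of presentations, and the per-subject rates are invented so that they are unrelated across the two situations. All names and values are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def decrement(rate, presentations):
    """ASSUMED model: decrement grows linearly with repeated presentation."""
    return rate * presentations


# Invented per-subject decrement rates in the two situations.
rates_auditory = [0.1, 0.2, 0.3, 0.4]
rates_phobic = [0.3, 0.1, 0.4, 0.2]

# The same variable (number of presentations) drives both situations.
auditory_10 = [decrement(r, 10) for r in rates_auditory]
phobic_10 = [decrement(r, 10) for r in rates_phobic]
auditory_20 = [decrement(r, 20) for r in rates_auditory]
phobic_20 = [decrement(r, 20) for r in rates_phobic]

# Correlation across subjects between the two situations.
r_across_subjects = pearson_r(auditory_10, phobic_10)
```

Here every subject shows more decrement after 20 presentations than after 10 in both situations, so the same process operates in each, yet the correlation across subjects between the two sets of decrements is zero for these invented rates.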


Relaxation 


Reciprocal inhibition theory (Wolpe, 1958) 
sees the role of relaxation training in de- 
sensitization as producing a state of mus- 
cular relaxation incompatible with the state 
of anxiety usually elicited by phobic stimuli. 
This state of reciprocal inhibition is turned, 
by reinforcement, into a more permanent 
state of conditioned inhibition. In contrast, 
the maximal habituation model (Lader & 
Mathews, 1968) views the role of relaxation 
as lowering central arousal (measured by 
spontaneous GSR fluctuations). This in turn 
increases the rate of response decrement. 

In support of this view, Mathews and 
Gelder (1969) showed that relaxation does 
indeed reduce the frequency of spontaneous 
GSR fluctuations. In addition, Lader and 
Wing (1966) showed that sedatives had a 
similar effect and also resulted in a faster 
rate of GSR decrement to auditory stimuli. 
It thus seems reasonable to suppose that the 
arousal-lowering effect of relaxation has a 
similar effect on the rate of response decre- 
ment in systematic desensitization. Subse- 
quent evidence has generally supported the 
view that relaxation lowers arousal, though 
relaxation is apparently not a uniquely effec- 
tive procedure for achieving this result. The 
results of experiments that examine its ef- 
fects therefore depend on what control 
procedure relaxation is compared with. Re- 
laxation produces lower levels of arousal than 
attention to instructions (Mathews & Gelder, 
1969), than an “eyes-open” control condi- 
tion (Teasdale, 1971), or than looking at a 
neutral slide (Benjamin, Marks, & Huson, 
1972), but is not more effective than proce- 
dures that simply omit Jacobsonian muscular
relaxation exercises but that would in other 
respects be equally likely to induce a state 
of lowered arousal (Edelman, 1971; Gross- 
berg, Note 1). 

Support for the assumption that relaxation 
increases the rate of response decrement in 
desensitization has been weaker. There is 
some supporting evidence from Van Egeren 
(1971) and Wolpe and Flood (1970), though 
others (Benjamin et al., 1972; Waters &
McDonald, 1973; Waters, McDonald, & 
Koresko, 1972) have reported negative re- 
sults. One of the problems in interpreting 
some of these results is that relaxation seems
to result in increased responsiveness to the 
initial presentation of a phobic stimulus. For 
example, the fact that Wolpe and Flood (1970) re-
ported response decrement with relaxation,
but not without it, seems to be entirely at-
tributable to the larger initial responses in
the relaxation condition. There is also some
doubt about whether relaxation produces
faster decrement of subjective anxiety. Wa-
ters et al. (1972) reported that subjects
reached the criterion of no subjective anxiety 
in fewer trials with relaxation, but Benjamin 
et al. (1972) found no differences in sub- 
jective anxiety decrement between a relaxed 
and a nonrelaxed condition. 

It is even more doubtful whether relaxa- 
tion facilitates response decrement in audi- 
tory habituation. Lader and Mathews (1968) 
had no relevant evidence at the time they 
published their theory. However, this was 
subsequently investigated by Teasdale (1971) 
in a series of four experiments, using tones 
of 70 dB and 92 dB. He found no effect of 
relaxation on the rate of response decrement. 
In addition, Freeling (1972) reported that 
relaxation did not increase response decre- 
ment to a series of pistol shots. The assump- 
tion that relaxation increases the rate of 
response decrement has thus received only 
very weak support, and it is doubtful whether 
a version of the habituation model that has 
the effect of relaxation as a central assump-
tion can be maintained. However, an alter- 
native version of the habituation model of 
desensitization is elaborated in the remainder 
of this article, which does not assume a direct 
effect of relaxation on the habituation pro- 
cess. 


Dual-Process Habituation Theory 


So far the term habituation has been used 
to refer to an observable process of response 
decrement, but the recent development of 
dual-process habituation theory (Groves & 
Thompson, 1970; Thompson, Groves, Teyler, 
& Roemer, 1973) makes it necessary to 
clarify this usage. Dual-process theory pro- 
poses that observable response decrement is 
the summation of two inferred processes:
habituation and sensitization. Response in-
crement is equally the summation of the same 
two inferred processes. 

It is not obvious that sensitization is a 
good term for the process of incremental 
change in responsiveness to a repeated un-
conditioned stimulus that Groves and Thomp-
son referred to. The term already has a dif-
ferent technical meaning in the context of
conditioning theory (e.g., Kimble, 1961), but 
it seems that even more confusion would be 
caused by adopting a different term in the 
present article. 

Habituation and sensitization are most 
conveniently defined by contrasting their 
characteristics: 

1. Habituation is a purely decremental 
process, whereas sensitization at first grows 
and then decays. 

2. Habituation affects a particular response 
to a particular stimulus, whereas sensitization 
affects general responsiveness. 

3. Habituation is independent of stimulus 
intensity (Thompson et al., 1973), whereas 
sensitization is positively related to stimulus 
intensity. 

4. Though both habituation and sensitiza- 
tion decay spontaneously, sensitization is the 
more transient phenomenon (Davis, 1972). 

5. Repeated series of habituation training 
trials result in progressively more habitua- 
tion but in progressively less sensitization. 
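
The first and third of these contrasts can be made concrete with a toy calculation. The sketch below is an illustration only: the exponential curves, the parameter values, and the function names are assumptions introduced here and are not taken from Groves and Thompson's formal statement of the theory. It treats the observed response as a baseline minus an inferred habituation term plus an inferred sensitization term, with only the sensitization term scaled by stimulus intensity.

```python
from math import exp

def habituation(trial, rate=0.15):
    """Assumed purely decremental process: accumulates toward 1 over trials."""
    return 1.0 - exp(-rate * trial)

def sensitization(trial, intensity, rise=0.8, decay=0.25):
    """Assumed incremental process: first grows, then decays; scales with intensity."""
    return intensity * (exp(-decay * trial) - exp(-rise * trial))

def response(trial, intensity, baseline=1.0):
    """Observed response as the summation of the two inferred processes."""
    return baseline - habituation(trial) + sensitization(trial, intensity)

if __name__ == "__main__":
    for intensity in (0.1, 1.0):  # hypothetical low- and high-intensity stimuli
        curve = [round(response(t, intensity), 2) for t in range(0, 21, 4)]
        print(f"intensity {intensity}: {curve}")
```

Under these assumed parameters, the low-intensity series declines from the first trial, whereas the high-intensity series first rises above its starting level and only later falls below it, which is the pattern of response increment followed by decrement that the summation of the two processes is meant to capture.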

Groves and Thompson’s theory resembles 
the conclusions reached by Hinde (1966) 
about change in responsiveness on the basis 
of his review of the relevant ethological re- 
search. Hinde also found it necessary to 
postulate an incremental process that ac- 
companies the decremental one. There is also 
agreement that the decremental process is 
relatively stimulus specific and that the in- 
cremental process is a generalized one. 

It is clear that response increment can 
result from exposure to a phobic stimulus. 
For example, Miller and Levis (1971) gave 
snake-phobic subjects two avoidance tests 
separated by a 50-minute interval. There 
were four experimental groups who spent 
respectively 0, 15, 30, and 45 minutes of 
this time in visual observation of the snake. 
The no-exposure group showed the least 
avoidance behavior during the second test 
(and significantly less avoidance than the 
15-minute-exposure group). This indicates 
that exposure activated an incremental pro- 
cess. For the three groups that received some 
degree of exposure to the snake between 
tests, there was a tendency for the amount of 
exposure to be negatively associated with 
avoidance behavior during the second test.
Stone and Borkovec (1975) obtained similar 
results. The no-snake-exposure and the 45- 
minute-snake-exposure groups showed less 
avoidance behavior and autonomic arousal 
during the postexposure test than did the 
15-minute-exposure group. The greater re-
sponsiveness of the 15-minute than the no-
exposure group during the posttest indicates
the presence of some kind of incremental
process, whereas the absence of this increased
responsiveness in the 45-minute-exposure
group is consistent with the decay
of sensitization over this period.


Relaxation 


The maximal habituation theory (Lader & 
Mathews, 1968) proposes that the role of 
relaxation in desensitization is to lower 
arousal and that this in turn facilitates the 
rate of habituation. Dual-process theory in- 
vites a different view, namely, that relaxa- 
tion reduces the amount of sensitization that 
takes place. Relaxation may not affect habit- 
uation at all, but its effects on sensitization 
summate with the habituation process to in-
crease the rate of response decrement.

As sensitization is a relatively transient
process, the facilitatory effect of relaxation
on response decrement would be only short-
term. It is relevant in this connection to note
that Teasdale (1971) found that relaxation 
had no immediate effect on auditory habitua- 
tion, but resulted in less long-term decre- 
ment. Whether relaxation reduces the amount 
of long-term anxiety reduction in desensitiza- 
tion also is not clear. One problem in examin- 
ing this is that an avoidance test given imme- 
diately after treatment would affect per- 
formance during a subsequent avoidance test.
Subjects who received desensitization with 
relaxation might do better at follow-up sim- 
ply because they had done better at an im- 
mediate posttreatment test and this had had 
an anxiety-reducing effect. It would there- 
fore be necessary to use a design in which 
different subjects were used for testing the
immediate and the delayed effects of re-
laxation on response decrement in desensi-
tization.


Stimulus Intensity


It has been shown that the use of a graded
incremental stimulus hierarchy facilitates re- 
sponse decrement to repeated auditory stim- 
uli (Davis & Wagner, 1969; Groves & 
Thompson, 1970), and this has been at- 
tributed, in the context of dual-process the-
ory, to the minimization of sensitization
rather than to the maximization of habitua- 
tion. Dual-process theory also proposes that 
the value of a graded hierarchy in desensi-
tization, like that of relaxation, is to mini-
mize sensitization. If this is correct, it im-
plies, as Davis (1972) pointed out, that the
beneficial effects of an incremental hierarchy
would be relatively transient and might not
appear on delayed testing. That Krapfl and
Nawas (1970) found that the use of a graded
hierarchy in desensitization was no more
beneficial during testing immediately after
treatment than during the follow-up counts
against this view. However, the superiority
of a graded hierarchy in the short term is
more likely to appear with severely handi-
capped phobics. Krapfl and Nawas's experi-
ment, like many others, used student snake
phobics. In addition, as with the effects of
relaxation, the immediate and long-term
effect of a graded hierarchy should be tested
in different subjects. Two other experiments
(Klorman, 1974; Lang, 1970) that also failed
to find a short-term advantage in an incre-
mental hierarchy cannot be given much
weight, as they used only two or three dif-
ferent intensity levels. Davis and Wagner
(1969) showed for auditory habituation that
a hierarchy is only helpful if many small in-
cremental steps are used.

It is important in discussing the effects
of stimulus intensity to make the distinc-
tion between relative and absolute habitua-
tion (Davis & Wagner, 1968). Relative ha-
bituation refers to a situation in which
habituation is produced and tested with the
same stimulus. Absolute habituation refers
to a situation in which habituation to one
stimulus is tested with another stimulus. The
distinction is especially important if the test
stimulus is more intense than that used for
habituation training. The conditions that are
most favorable for absolute and relative ha-
bituation are not necessarily the same. In
particular, Davis and Wagner (1968) showed 
that high-intensity stimuli resulted in more 
absolute but not in more relative habituation 
at subsequent testing. Thus, an incremental 
hierarchy in desensitization might facilitate 
initial response decrement to the stimuli used 
in treatment without increasing the amount 
of response decrement to a separate set of 
test stimuli. The clinical objective of desensi- 
tization is to reduce responsiveness to stimuli 
of varying intensities that occur in the natu- 
ral environment and not just to reduce re-
sponsiveness to the stimuli used in desensi-
tization training. The potential advantage of 
habituation to high-intensity stimuli (i.e., 
flooding) is that it may produce more re- 
sponse decrement when assessed by these 
"absolute" standards. This assumes, of 
course, that the sensitization produced by
flooding can be successfully managed.

Dual-process habituation theory is cer-
tainly better able than maximal habituation 
theory to explain that both desensitization 
and flooding are effective anxiety-reduction 
procedures. It is a serious paradox for maxi- 
mal habituation theory that a treatment that 
uses conditions (prolonged exposure to high-
intensity stimuli) resulting in high levels of
arousal should achieve response decrement
at all. The explanation, in terms of dual-
process theory, is that whereas desensitization
minimizes sensitization, flooding elicits it but 
also provides long enough exposures for it 
to decay again. Long sessions are important 
in getting the best results from flooding (e.g., 
Stern & Marks, 1973). The repeated growth
and decay of sensitization should reduce the 
extent to which sensitization occurs in the 
future. One implication of this is that flood-
ing should produce a more generalized reduc-
tion in responsiveness than should desensi-
tization; consistent with this prediction,
Watson and Marks (1971) have
shown that flooding to relevant and irrelevant 
fears is equally effective in the treatment of 
phobics. On the other hand, it is predicted 
that systematic desensitization (which, be- 
cause it uses low-intensity stimuli, is pre- 
sumably based on habituation rather than on 
the decay of sensitization) would be mark-
edly more effective if relevant rather than
irrelevant fears were used. 


Stimulus Lengths 


Desensitization usually employs short (e.g., 
10 sec) presentations of imaginal stimuli, and 
normally presentations are terminated sooner 
if anxiety is reported. Dual-process theory 
sees the role of short stimulus presentations 
as also preventing sensitization. They would 
be especially helpful in doing this if other 
conditions (i.e., relaxation and low-intensity
stimuli) were such that they minimized sen- 
sitization. If relaxation and low-intensity 
stimuli were not used, short presentations of 
stimuli would probably be able to do little 
to prevent sensitization from developing, and 
longer presentations, which allow more time 
for inferred habituation to take place, would 
result in a greater degree of response decre- 
ment. 

There are two experiments that have found 
such an interaction effect between stimulus 
lengths and relaxation. Proctor (1969) found 
that when relaxation was used, desensitiza- 
tion with 5-sec exposures to slides produced 
more change in subjective anxiety during a 
subsequent avoidance test than did 20-sec 
exposures. The reverse obtained if relaxation 
was omitted. Sue (1975) reported a similar 
interaction in comparing 5-sec and 30-sec 
stimulus lengths. In desensitization without 
relaxation, longer stimulus exposures pro- 
duced more change on a behavioral avoidance 
test and on the Fear Survey Schedule (Geer, 
1965), but had no significant effect when
relaxation was used. Watts (1971) re-
ported a similar interaction between stimulus 
intensity and stimulus length. Following a 
suggestion made by Koepke and Pribram 
(1966) in their attempt at resolving some ap- 
parently conflicting findings in the habitua- 
tion literature, Watts predicted and found 
that relatively low-intensity desensitization 
items habituated to zero anxiety more rapidly 
with short (5 sec) than with long (30 sec) 
presentations but that the reverse obtained 
for higher intensity items. 

The advantage of longer presentations is 
best attributed to the allowance of more time
for habituation to take place rather than to 
the decay of sensitization. The time scale 
(30 sec) is probably too short for the decay 
of sensitization. But in any case, the ad- 
vantage of longer stimulus presentations is 
known to be a relatively long-term one, 
whereas sensitization is relatively transitory 
in its effects. 

Watts (1971) provided evidence for the 
long-term value of longer stimulus presenta- 
tion in desensitization. He found that even 
when reduction to zero anxiety was achieved 
more rapidly with short presentations, more 
long-term desensitization was achieved with 
longer presentations. In the short term the 
combination of short presentations of low-
intensity stimuli and relaxation can appar- 
ently prevent the development of sensitiza- 
tion and so appear to facilitate response dec- 
rement, but little long-term advantage results 
from this. Where relaxation or low-intensity 
stimuli are not used, short presentations are 
apparently not sufficient to prevent the ac-
cumulation of sensitization, and there is not
even any short-term advantage in using short 
stimulus presentations. In this case there is 
both an immediate and a long-term advan- 
tage in using longer stimulus presentations. 

Watts (1974) considered in more detail 
the mechanism by which longer stimulus pre- 
sentations result in more long-term habitua- 
tion. He suggested that the amount of long- 
term response decrement depends on the ex- 
tent to which the subject forms a clear model 
of the stimulus (cf. Sokolov, 1963). In sup-
port of this hypothesis, Watts was able to 
show that describing desensitization items 
each time they were presented resulted in
more long-term anxiety reduction, though it 
did not affect the rate of response decrement. 

It may well be that some of the confusion
in the literature about whether relaxation
facilitates desensitization could be resolved 
by taking lengths of stimulus presentation 
into account. Crowder and Thornton (1970) 
have drawn attention to the fact that studies 
such as their own that failed to show that 
relaxation facilitates desensitization tended 
to use relatively long presentations. Simi- 
larly, it is predicted that the facilitatory 
effects of a graded stimulus hierarchy would 
be easier to demonstrate if short stimulus
presentations were used. 

The interaction effects that have been
found between stimulus length and relaxa- 


ories of desensitization. There have so far 
been no suggestions as to how they could be
predicted from reciprocal inhibition theory or
maximal habituation theory. Equally, cog-
nitive theories of desensitization (Kazdin &
Wilcoxon, 1976) would have no basis for 
predicting these effects. They thus provide 
important empirical support for the dual-
process model of desensitization, with which
they are in accord. 


Interstimulus Interval Length 


Another example of the differential effect
of procedural variables on short-term and
long-term change concerns interstimulus in-
tervals. Though short intervals usually facili-
tate initial response habituation, they do not 
increase the amount of long-term habituation 
(Askew, 1970; Davis, 1970). This facilita- 
tory effect of short-term intervals can be at- 
tributed to refractory-period effects. Watts 
(1973) found that in desensitization also, 
interstimulus interval lengths have a short-
term but not a long-term effect. However,
the short-term effect is different from that
found in auditory habituation. Longer inter- 
vals resulted in faster initial response decre- 
ment to relatively high-intensity desensitiza- 
tion items, though they had no effect with 
low-intensity items. Presumably the longer 
intervals prevented the accumulation of the 
sensitization that can occur with high-in-
tensity stimuli, but this preventative effect
would not be expected to have any long-term
benefit. 


Long-Term Anxiety Reduction


One of the central weaknesses of the maxi- 
mal habituation model, as developed by 
Lader and Mathews (1968), is its failure to
take account of the long-term effects of pro-
cedural variables. In particular, as has been
argued, Lader and Mathews were probably
incorrect to assume that aspects of the de-
sensitization procedure, such as relaxation 
and an incremental stimulus hierarchy, that 
appear to increase the initial rate of anxiety 
decrement in the short term are also of 
long-term benefit. In addition, short stimulus 
lengths and short interstimulus intervals can 
facilitate anxiety reduction in the short term, 
but can be of no long-term benefit. In the 
context of the dual-process theory elaborated 
here, it has been suggested that variables 
that facilitate short-term but not long-term 
decrement operate on sensitization rather 
than on habituation. But whether or not this 
is correct, it is now clear that theories of 
desensitization that do not give explicit con- 
sideration to both immediate and long-term 
effects cannot be adequate. There is a need 
for a great deal more research on the delayed 
effects of desensitization sessions. The few in- 
vestigations published so far (e.g., Agras,
1965; Rachman, 1966) leave many questions 
unanswered. It is also necessary to become 
more precise about what is meant by long- 
term effects. Testing at delays of 1 hour, 24 
hours, and 1 week may produce quite differ- 
ent results, though it would be impossible to 
tease out such differences from the currently
available research. In the meantime, the dual-
process theory developed here has the unique
distinction of making specific predictions 
about immediate and long-term effects. 


Conclusion 


It is probably too soon to make a final 
judgment on the adequacy of the habituation 
model of desensitization, though it can at 
least be claimed that it is a viable alterna- 
tive to the reciprocal inhibition theory. More- 
over, it has served the useful function of 
generating fresh problems and hypotheses for 
investigation. In particular, it has tended to 
generate more detailed research on the pat- 
tern of anxiety reduction than have other 
theories. In this way it has resulted in some
significant advances in our understanding of 
the processes that operate in systematic de- 
sensitization. 

Some data have already accumulated (e.g.,
the interaction effects between stimulus length
and relaxation/stimulus intensity) that can
be predicted from the dual-process theory, 
but that appear to pose problems for any 
other current theory. The need for the future 
is to make the dual-process theory increas- 
ingly precise so that specific predictions from 
the theory can be tested and, if necessary, 
refuted. This article has tried to make a con- 
tribution to this theoretical task. 


Reference Note 


1. Grossberg, J. M. The physiological effectiveness 
of brief training in differential muscle relaxation 
(Tech. Rep. 9). La Jolla, Calif.: Western Be-
havioral Sciences, 1965. 


The “File Drawer Problem” and Tolerance for Null Results


Robert Rosenthal 
For any given research area, one cannot tell how many studies have been con-
ducted but never reported. The extreme view of the “file drawer problem” is 
that journals are filled with the 5% of the studies that show Type I errors, 
while the file drawers are filled with the 95% of the studies that show non- 
significant results. Quantitative procedures for computing the tolerance for filed 
and future null results are reported and illustrated, and the implications are
discussed.


Both behavioral researchers and statisti- 
cians have long suspected that the studies 
published in the behavioral sciences are a 
biased sample of the studies that are actually 
carried out (Bakan, 1967; McNemar, 1960; 
Smart, 1964; Sterling, 1959). The extreme 
view of this problem, the "file drawer prob- 
lem," is that the journals are filled with the 
5% of the studies that show Type I errors, 
while the file drawers back at the lab are 
filled with the 95% of the studies that show 
nonsignificant (e.g., p > .05) results.

In the past there was very little one could 
do to assess the net effect of studies, tucked 
away in file drawers, that did not make the 
magic .05 level (Rosenthal & Gaito, 1963, 
1964). Now, however, although no definitive 
solution to the problem is available, one can 
establish reasonable boundaries on the prob- 
lem and estimate the degree of damage to 
any research conclusion that could be done 
by the file drawer problem. 

This advance in our ability to cope with 
the file drawer is an outgrowth of the in- 
creasing interest of behavioral scientists in 
summarizing bodies of research literature sys-
tematically and quantitatively, both with re-
spect to significance levels (Rosenthal, 1969, 
1976, 1978) and with respect to effect-size 
estimation (Hall, 1978; Rosenthal, 1969, 
1976; Rosenthal & Rosnow, 1975; Smith & 
Glass, 1977; Glass, Note 1). One hopes that 
this interest in summarizing entire research
domains will lead to an improvement in book- 
keeping so that eventually all results will be 
recorded both with an estimate of effect size 
(e.g., r or d; Cohen, 1977) and with the
level of significance obtained, or more prac-
tically, with the standard normal deviate (Z)
that corresponds to the obtained p (Rosen-
thal, 1978). Future appraisals of research
domains of the type found in Psychological 
Bulletin should give estimates of overall 
effect sizes and significance levels; these esti- 
mates of overall significance can provide a 
basis for coping with the file drawer problem. 
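
Standard normal deviates of this kind can be obtained from an exact p, from an effect size r (or phi), or from an effect size d. The following is a minimal sketch of those three conversions. The function names are introduced here for illustration, the p is assumed to be one-tailed, and the d conversion is written as d/sqrt(d^2 + 4) so that the sign of d is preserved, which for positive d equals the square root of d^2/(d^2 + 4).

```python
from math import sqrt
from statistics import NormalDist

def z_from_p(p_one_tailed: float) -> float:
    """Z whose upper-tail area under the normal curve equals the one-tailed p."""
    return NormalDist().inv_cdf(1.0 - p_one_tailed)

def z_from_r(r: float, n: int) -> float:
    """Estimate Z from a correlation-type effect size r (or phi) and sample size N."""
    return r * sqrt(n)

def z_from_d(d: float, n: int) -> float:
    """Estimate Z from a standardized mean difference d and sample size N."""
    return d / sqrt(d * d + 4.0) * sqrt(n)

if __name__ == "__main__":
    print(round(z_from_p(0.05), 3))   # one-tailed p of .05
    print(z_from_r(0.5, 16))          # hypothetical r and N
    print(round(z_from_d(2.0, 8), 3)) # hypothetical d and N
```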


¹ Standard normal deviates (Z) can be found by
various methods, of which the following three are
most often useful: (a) Obtain the exact p asso-
ciated with the test statistic (e.g., t, F, or χ²) and
find the Z associated with that p in tables of the
normal distribution; (b) if the effect size r or phi
is given or can be computed, Z can be estimated by
$r\sqrt{N}$; (c) if the effect size d is given or can be
computed, Z can be estimated by
$[d^2/(d^2+4)]^{1/2}\sqrt{N}$.


Tolerance for Future Null Results


Given any systematic quantitative review
of the literature bearing on a particular hy-
pothesis, for example, that psychotherapy is
effective (Glass, Note 1), that women are 
more sensitive than men to nonverbal cues
(Hall, 1978), or that one person's expecta-
tion for another person's behavior can come
to serve as a self-fulfilling prophecy (Rosen-
thal, 1969, 1976), it is easy to calculate an
overall probability, based on all the inde-
pendent studies available to the reviewer, 
that the effect in question is “real,” that is, 
not a Type I error (Rosenthal, 1978). The 
fundamental idea in coping with the file
drawer problem is simply to calculate the
number of studies averaging null results that
must be in the file drawers before the overall
probability of a Type I error is brought to
any desired level of significance, say, p =
.05. This number of filed studies, or the
tolerance for future null results, is then
evaluated for whether such a tolerance level
is small enough to threaten the overall con-
clusion drawn by the reviewer. If the overall 
level of significance of the research review 
will be brought down to the level of just sig- 
nificant by the addition of just a few more 
null results, the finding is not resistant to 
the file drawer threat. 


Computation


Perhaps the simplest, most useful way of 
computing the overall p of a set of research
studies is the method of adding Zs (Cochran, 
1954; Mosteller & Bush, 1954; Rosenthal, 
1978). This method requires only that one 
add the standard normal deviates or Zs asso-
ciated with the ps obtained and divide by
the square root of the number of studies be-
ing combined. The result is itself a Z that
can be entered in a table to find the asso-
ciated overall p:


$$Z_c = \frac{k\bar{Z}_k}{\sqrt{k}} = \sqrt{k}\,\bar{Z}_k, \tag{1}$$


where $Z_c$ is the new combined Z, k is the
number of studies combined, and $\bar{Z}_k$ is the
mean Z obtained for the k studies.
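
The method of adding Zs can be sketched in a few lines. The function names below are illustrative assumptions, and the overall p is taken to be one-tailed; the normal distribution from the Python standard library stands in for the printed table.

```python
from math import sqrt
from statistics import NormalDist

def combined_z(zs):
    """Sum the study Zs and divide by the square root of the number of studies."""
    k = len(zs)
    return sum(zs) / sqrt(k)

def one_tailed_p(z):
    """Upper-tail probability associated with a standard normal deviate."""
    return 1.0 - NormalDist().cdf(z)

if __name__ == "__main__":
    zs = [0.5] * 15  # hypothetical set: 15 studies, each with Z = .50
    zc = combined_z(zs)
    print(round(zc, 3), round(one_tailed_p(zc), 3))
```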

To find the number (X) of new, filed, or
unretrieved studies averaging null results re-
quired to bring the new overall p to any de-
sired level, say, just significant at p = .05
(Z = 1.645), one simply writes:

$$1.645 = \frac{k\bar{Z}_k}{\sqrt{k + X}}. \tag{2}$$


Rearrangement shows, then, that

$$X = (k/2.706)\,[k(\bar{Z}_k)^2 - 2.706]. \tag{3}$$
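
Equation 3 translates directly into code. In the sketch below the function name and the `z_crit` parameter are assumptions added for illustration; with the default of 1.645, the squared criterion is the 2.706 that appears in the equation. The second function simply checks that adding X null studies returns the combined Z to the criterion.

```python
from math import sqrt

def tolerance_for_null_results(k, mean_z, z_crit=1.645):
    """X from Equation 3: null-result studies needed to lower the combined Z to z_crit."""
    crit_sq = z_crit ** 2  # 1.645 squared is about 2.706
    return (k / crit_sq) * (k * mean_z ** 2 - crit_sq)

def combined_z_after_adding(k, mean_z, x):
    """Combined Z once x further studies averaging Z = .00 are included (Equation 2)."""
    return k * mean_z / sqrt(k + x)

if __name__ == "__main__":
    k, mean_z = 20, 0.8  # hypothetical review: 20 studies with a mean Z of .80
    x = tolerance_for_null_results(k, mean_z)
    print(round(x, 1), round(combined_z_after_adding(k, mean_z, x), 3))
```

Because X is generally not a whole number, a reviewer would have to decide whether to round it up or down before reporting it.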


An alternative formula that may be more
convenient when the sum of the Zs ($\sum Z$) is
given rather than the mean Z is as follows:
$X = [(\sum Z)^2 / 2.706] - k$. One method based
on counting rather than adding Zs may
be easier to compute and can be employed
when exact p levels are not available; but it
is probably less powerful. If X is the number
of new studies required to bring the overall p
to .50 (not to .05), s is the number of sum-
marized studies significant at p < .05, and n
is the number of summarized studies not sig-
nificant at .05, then X = 19s - n. Another
conservative alternative when exact p levels
are not available is to set Z = .00 for any
nonsignificant result and to set Z = 1.645
for any result significant at p < .05.
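
These alternatives can be sketched as follows. The function names are assumptions introduced for illustration. The counting rule is consistent with requiring that significant studies make up 5% of the enlarged set, since s/(s + n + X) = .05 gives X = 19s - n.

```python
def tolerance_from_sum(sum_z, k, crit_sq=2.706):
    """X computed from the sum of the Zs rather than from their mean."""
    return sum_z ** 2 / crit_sq - k

def tolerance_by_counting(s, n):
    """Counting method: new studies needed to bring the overall p to .50."""
    return 19 * s - n

def conservative_zs(significant_flags):
    """Assign Z = 1.645 to significant results and Z = .00 to nonsignificant ones."""
    return [1.645 if sig else 0.0 for sig in significant_flags]

if __name__ == "__main__":
    flags = [True, True, False, False, False]  # hypothetical set of five studies
    zs = conservative_zs(flags)
    print(round(tolerance_from_sum(sum(zs), len(zs)), 2))
    print(tolerance_by_counting(s=2, n=3))
```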

Equations 1, 2, and 3 all assume that
each of the k studies is independent of all
other k - 1 studies, at least in the sense of
employing different sampling units. There 
are other senses of independence, however; 
for example, one can think of two or more 
studies conducted in a given laboratory as 
less independent than two or more studies 
conducted in different laboratories. Such non- 
independence can be assessed by intraclass 
correlations. Whether nonindependence of this 
type serves to increase Type I or Type II 
errors appears to depend in part on the rela- 
tive magnitude of the Zs obtained from the 
studies that are correlated or too similar. If 
the correlated Zs are, on the average, as high
as (or higher than) the grand mean Z corrected
for nonindependence, the combined Z one 
computes by treating all studies as inde- 
pendent will be too large. If the correlated 
Zs are, on the average, clearly low relative 
to the grand mean Z corrected for noninde- 
pendence, the combined Z one computes by 
treating all studies as independent will tend 
to be too small. 


Illustration


In 1969, 94 experiments examining the 
effects of interpersonal self-fulfilling prophe- 
cies were summarized (Rosenthal, 1969). 
The mean Z of these studies was 1.014, k
was 94, and $Z_c$ for the studies combined was
$9.83 = 94(1.014)/\sqrt{94}$.

How many new, filed, or unretrieved stud-
ies (X) would be required to bring this very
large Z down to a barely significant level (Z
= 1.645)? By Equation 3,


$$X = (94/2.706)\,[94(1.014)^2 - 2.706] = 3{,}263.$$


One finds that 3,263 studies averaging null 
results (Z = .00) must be crammed into file 
drawers before one would conclude that the 
overall results were due to sampling bias in 
the studies summarized by the reviewer. In 
a more recent summary of the same area of 
research (Rosenthal, 1976), the mean Z of 
311 studies was 1.180, k was 311, and X was
49,457! Thus, nearly 50,000 unreported stud- 
ies averaging a null result would have to 
exist somewhere before the overall results 
could reasonably be ascribed to sampling 
bias. 
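
The two summaries can be rechecked with a short script. This is a sketch under the assumption that the reported values of X were truncated to whole studies; the function name is illustrative, and the constant 2.706 is the one used in Equation 3.

```python
from math import sqrt

def tolerance_for_null_results(k, mean_z, crit_sq=2.706):
    """Equation 3, using the rounded constant 2.706 as printed."""
    return (k / crit_sq) * (k * mean_z ** 2 - crit_sq)

summaries = [
    ("Rosenthal, 1969", 94, 1.014),
    ("Rosenthal, 1976", 311, 1.180),
]

if __name__ == "__main__":
    for label, k, mean_z in summaries:
        zc = k * mean_z / sqrt(k)                   # combined Z from Equation 1
        x = tolerance_for_null_results(k, mean_z)   # tolerance from Equation 3
        print(f"{label}: combined Z = {zc:.2f}, X = {int(x):,}")
```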


Discussion 


There is both a sobering and a cheering 
lesson to be learned from careful study of 
Equation 3. The sobering lesson is that 
small numbers of studies that are not very 
significant, even when their combined p is
significant, may well be misleading in that
only a few studies filed away could change
the combined significant result to a nonsig-
nificant one. Thus, 15 studies averaging a Z
of .50 have a combined p of .026; but if there
were only 6 studies tucked away showing
a mean Z of .00, the tolerance level for
null results would be exceeded, and the sig-
nificant result would become nonsignificant
(i.e., p > .05). Or if there were 2 studies
averaging a Z of 2.00, the combined p
would be about .002; but uncovering 4 new
studies averaging a Z of .00 would bring p
into the not significant region.

The cheering lesson is that when the num-
ber of studies available grows large or the 
mean directional Z grows large, the file 
drawer hypothesis as a plausible rival hy- 
pothesis can be safely ruled out. If 300 
studies are found to average a Z of +1.00,
it would take 32,960 studies to bring the
new combined p to a nonsignificant level;
that many file drawers full is simply too im-
probable.
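
Both lessons can be seen side by side by tabulating the combined p and the tolerance for the three cases just described. The sketch below uses illustrative function names, takes p as one-tailed, and rounds X up to the next whole study, on the assumption that this is the smallest number of null results that would push the combined result past the .05 criterion.

```python
from math import ceil, sqrt
from statistics import NormalDist

def combined_p(k, mean_z):
    """One-tailed p for the combined Z of k studies with the given mean Z."""
    return 1.0 - NormalDist().cdf(sqrt(k) * mean_z)

def tolerance(k, mean_z, crit_sq=2.706):
    """Equation 3: tolerance for null results at the .05 level."""
    return (k / crit_sq) * (k * mean_z ** 2 - crit_sq)

cases = [(15, 0.50), (2, 2.00), (300, 1.00)]  # (k, mean Z) pairs from the text

if __name__ == "__main__":
    for k, mean_z in cases:
        p = combined_p(k, mean_z)
        x = ceil(tolerance(k, mean_z))
        print(f"k = {k:3d}, mean Z = {mean_z:.2f}: p = {p:.3f}, studies needed = {x:,}")
```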

At the present time no firm guidelines can 
be given as to what constitutes an unlikely 
number of unretrieved or unpublished studies.
For some areas of research 100 or even 500 
unpublished and unretrieved studies may be 
a plausible state of affairs, whereas for others 
even 10 or 20 seems unlikely. Probably any 
rough and ready guide should be based partly 
on & so that as more studies are known it 
becomes more plausible that other studies in 
that area may be in those file drawers. Per-
haps one could regard as resistant to the
file drawer problem any combined results
for which the tolerance level (X) reaches 5k
+ 10. This seems a conservative but reason-
able tolerance level; the 5k portion suggests
that it is unlikely that the file drawers have
more than five times as many studies as the
reviewer, and the 10 sets the minimum num-
ber of studies that could be filed away at 15
(when k = 1).
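
The suggested rule of thumb can be sketched as a simple check. The sketch assumes the same form of Equation 3 as the worked examples imply, X = (k/2.706)[k(mean Z)^2 - 2.706]; the function names are hypothetical and are not part of the article.

```python
# Assumption: 2.706 = 1.645 ** 2, the squared Z for one-tailed p = .05.
Z_CRIT_SQ = 2.706


def tolerance(k, mean_z):
    """Assumed form of Equation 3: tolerance for future null results (X)."""
    return (k / Z_CRIT_SQ) * (k * mean_z ** 2 - Z_CRIT_SQ)


def resistance_threshold(k):
    """The suggested guide: X should reach 5k + 10."""
    return 5 * k + 10


def is_resistant(k, mean_z):
    """True when the tolerance level meets or exceeds 5k + 10."""
    return tolerance(k, mean_z) >= resistance_threshold(k)


if __name__ == "__main__":
    # 311 studies averaging Z = 1.180, and 15 studies averaging Z = .50.
    for k, mean_z in [(311, 1.180), (15, 0.50)]:
        print(k, mean_z, resistance_threshold(k), is_resistant(k, mean_z))
```

On this reading, the 311-study example clears the guide by a wide margin, whereas the 15-study example falls well short of it.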

It appears that more and more reviewers
of research literature are estimating average
effect sizes and combined ps of the studies
they summarize. It would be very helpful to
readers if for each combined p they pre-
sented, reviewers also gave the tolerance for
future null results associated with their over-
all significance level.


Reference Note 


1. Glass, G. V. Primary, secondary, and meta-anal-
ysis of research. Paper presented at the meeting
of the American Educational Research Asso-
ciation, San Francisco, April 1976.


Paradoxical Tranquilizing and
Emotion-Reducing Effects of Nicotine 


David G. Gilbert 
Florida State University 


Investigations of the effects of nicotine on emotion and on indices of central 
and autonomic nervous system arousal are reviewed. Mechanisms that may 
account for the paradoxical finding that nicotine increases autonomic nervous 
system end-organ arousal yet has frequently been reported to reduce self-report 
and behavioral indices of emotion are evaluated. A number of mechanisms have 
been proposed, but none are backed by a convincing network of supportive data. 
The literature suggests that the mechanism(s) by which nicotine reduces indices 
of emotion is influenced by a wide variety of variables, including behavioral 
activity level, central nervous system arousal level, type of emotion, time since 
the administration of the drug, and the rate and dose of its administration. 
There is the need to demonstrate a paradigm that reliably causes nicotine-induced 
reductions of emotion. Once reliability has been established, further studies can
vary the paradigm's potentially influential variables.


One of the most paradoxical and intriguing
facts in the field of emotions is that smoking
and nicotine cause significant increases in
physiological arousal, especially of autonomic
end organs, yet frequently reduce behavioral
and self-report measures of emotion and cause
people to report feelings of increased tran-
quility (Frith, 1971; Ikard, Green, & Horn,
1969; Ikard & Tompkins, 1973). This para-
dox is underscored by the fact that a num-
ber of major popular theories of emotion
(Lange & James, 1922; Mandler, 1975; 
Schachter & Singer, 1962) view increased 
autonomic arousal as an essential compo- 
nent of emotional processes. Possibly because 
of the relevance of this paradox to current 
theories of emotion and smoking, there have 
been an increasing number of studies that 
pertain to the resolution of these apparently 
contradictory findings. 

Although some investigators have sum- 
marized various parts of the literature rele- 
vant to nicotine's emotion-decreasing and 
physiological-arousal-increasing properties,
there has been no extensive review of studies 
selected for their relevance to the clarifica- 
tion of the physiological and psychological 
mechanisms that underlie these apparently 
contradictory effects. This review represents 
an attempt to come closer to a determination
of what these underlying mechanisms might 
be. First, studies demonstrating the paradox 
are reviewed. Then, mechanisms that have 
been proposed to explain the paradox are re- 
viewed and evaluated in terms of their sup- 
porting evidence. 


Paradox: Increased Autonomic Nervous 
System Activation With Decreased 
Emotion 


Physiological Effects of Nicotine 


The paradox exists only to the degree that 
nicotine produces simultaneous increases in 
physiological arousal and decreases in emo- 
tional experience or behavior. An examina- 
tion of the effects of nicotine on physiological 
arousal and emotion is complicated by the 
lack of single definitive indices of either 
emotion or physiological arousal, concepts 
that are not homogeneous entities but instead 
are categories or constructs that represent a 
wide variety of different processes. There- 
fore, a variety of different indices of each 
of these categories are reviewed. 

Effects on autonomic nervous system end 
organs. The smoking of one or two cigar- 
ettes or the administration of an equivalent 
amount of nicotine by other means typically 
causes significant sympathomimetic symp- 
toms, the most notable of which include an 
increase in resting heart rate of from 5 to 
40 beats per minute (Domino, 1973; Hill & 
Wynder, 1974; Roth, McDonald, & Sheard, 
1945), increased blood pressure (5 to 20 
mm of Hg), increased serum levels of epi-
nephrine and adrenocortical compounds
(Herxheimer, Griffiths, Hamilton, & Wake- 
field, 1967; Hill & Wynder, 1974; Jarvik, 
1970), and significant vasoconstriction (Herx- 
heimer et al., 1967; Simon & Iglauer, 1967). 

Almost all of the studies of the effects of 
nicotine on autonomic nervous system 
(ANS) arousal have maintained subjects in 
quiescent, nonemotional states prior to and 
after the administration of nicotine. Hence, 
there is some question about the extrapola- 
tion of these studies to subjects in active 
states, and evidence is quite meager in re- 
lation to this question. Two studies do sup-
port the notion that quiescent-state research
is broadly applicable. These two studies 
found that high-nicotine cigarettes produced 
significantly greater increases in heart rate 
than low-nicotine cigarettes in moderately 
arousing emotional settings (Gilbert, 1978; 
Nesbitt, 1973). However, a study by Erwin
(1971) found that heart rate did not in- 
crease after the onset of smoking in subjects 
who moved about, performing their daily 
routines. 

Mechanisms by which nicotine induces 
these sympathetic symptoms have been re-
ported to include direct effects of nicotine
on the particular end organs and indirect
effects brought about by nicotine-induced
increases of serum epinephrine, which, in 
turn, stimulates these end organs (Jarvik, 
1970; Meyers, Jawetz, & Goldfien, 1974). 

Effects on central nervous system activity 
and structures. Contrary to the hypothesis 
that tobacco and nicotine produce central 
nervous system (CNS) tranquilizing effects, 
studies have typically shown that smoking- 
sized doses of nicotine (.005 to .1 mg per 
kg of body weight) produce electroencephal- 
ogram (EEG) activity characteristic of CNS 
arousal (alpha desynchronization, increased 
dominant alpha frequency, and reduction in 
total energy and variability of cortical EEG 
activity). The EEG arousal occurs immedi- 
ately after the intravenous administration of 
nicotine and by the time the individual 
completes smoking a tobacco cigarette. Sev- 
eral studies using subhuman mammals have 
reported this initial phase of EEG arousal 
to be followed by a phase of EEG tranquil- 
ization that occurs about 5-15 minutes after 
the intravenous administration of the drug 
and lasts up to 30 minutes (Bhattacharya & 
Goldstein, 1970; Domino, 1967; Goldstein, 
Beck, & Mundschenk, 1967). However, no 
such secondary phase of EEG tranquilization 
has been shown to occur in humans. Philips 
(1971) reported that the smoking of a single 
cigarette resulted in a significant increase in 
EEG arousal (mean alpha power reduction) 
for up to 20 minutes. Similarly, Knott and 
Venables (1977) found that smoking two 
cigarettes resulted in EEG arousal (increased 
dominant alpha frequency) for the duration 
of their recording period (15 minutes). Nu-
merous other studies have also shown that
smoking and intravenous and subcutaneous 
administration of nicotine induce EEG 
arousal in humans (Bickford, 1960; Hauser, 
Schwartz, Roth, & Bickford, 1958; Mur- 
phree, Pfeiffer, & Price, 1967; Wechsler, 
1958) and in animals (Domino, 1973). How- 
ever, one series of studies (Armitage, Hall, 
& Morrison, 1968) showed that the effects 
of small amounts of nicotine depended criti- 
cally on dose and rate of administration 
when EEG and acetylcholine measures of 
cortical arousal were used in rats and cats. 
Furthermore, at some given doses and rates, 
it appeared that cortical arousal could be 
either increased or decreased, the effect vary- 
ing from cat to cat. 

On the other side of the coin, studies of 
smoking abstinence effects have consistently 
shown EEG sedation or depression effects 
following smoking deprivation (Itil, Ulett, 
Hsu, Klingenberg, & Ulett, 1971; Knott & 
Venables, 1977; Murphree & Schultz, 1968; 
Ulett & Itil, 1969). Deprivation also causes 
restlessness and dysphoria (Shiffman & Jar- 
vik, 1976; Ulett & Itil, 1969). This simul- 
taneous occurrence of EEG tranquilization 
and subjective dysphoria calls into question 
the validity of classical EEG measures as 
indices of emotional arousal. Furthermore, 
the occurrence of restlessness concurrently 
with EEG tranquilization calls into question 
the use of these EEG measures as indices of 
general arousal or activation level. 

The EEG-activating effects of smoking 
have been interpreted in a number of studies
(Bickford, 1960; Hauser et al., 1958;
Wechsler, 1958) as resulting from the be- 
havioral act of smoking rather than from the 
physiological effect of nicotine or other phar- 
macological substances present in tobacco. 
Studies systematically varying the nicotine 
content of the cigarette smoked and studies 
using intravenous administration of nicotine 
in humans are urgently needed. However, 
the animal studies mentioned above, which 
used smoking-sized intravenous doses of 
nicotine, have shown that nicotine induces 
classical EEG signs of arousal, a finding that 
strongly supports the view that nicotine is at 
least partially responsible for the EEG-arous-
ing properties of smoking.

Less traditional, but possibly more sophis-
ticated, measures of CNS arousal present a
more complex picture of the effects of nico- 
tine. Brown (1967) reported that intrave- 
nous nicotine produced a state of mixed 
arousal and sedation in cats. After nicotine 
administration, these cats demonstrated a 
behavioral state of drowsiness combined with 
a sustained potential for alertness. Nicotine
also produced an unusual partitioning of 
EEG activity. Consistent with arousal, there 
was a net decrease in energy content when 
all EEG frequencies were considered, but 
there was a sedative effect (increased energy 
content) among lower frequency ranges. Nic- 
otine also led to a persistence of hippocampal 
theta frequency, which is characteristic of 
orienting responses. Finally, nicotine resulted 
in evoked responses in the hippocampus sim- 
ilar to those obtained during sleep. Vazquez 
and Toman (1967) also found that nicotine 
produced mixed and hard-to-explain effects 
on directly evoked brain potentials. In rab- 
bits, they found that nicotine caused transi- 
ent depression of the most prominent fast 
component but enhancement of the most 
prominent slow, negative component. 

In studies with humans, the effects of 
cigarette smoking on cortical, average visu- 
ally evoked potential and contingent nega- 
tive variation indices of cortical arousal sug- 
gest that nicotine influences cortical pro- 
cesses in a complex manner that can be 
either arousing or sedating, depending on the 
nature of the stimulus and the personality 
characteristics of the individual. Smoking 
has produced an arousing (increased re- 
sponse amplitude) effect on average visually 
evoked potentials elicited by low-intensity 
stimuli and a sedative effect on potentials 
elicited by high-intensity stimuli (R. A. Hall, 
Rappaport, Hopkins, & Griffin, 1973). On the 
other hand, Ashton, Millman, Telford, and 
Thompson (1974) reported that smoking 
produced cortical arousal (increased con-
tingent negative variation magnitude) in
extraverts, but had the opposite effect (seda- 
tion) in introverts. This finding of the ef- 
fects’ dependence on personality is consistent 
with the findings of Brown (1967) and Ar- 
mitage et al. (1968), which showed that 
the effects of nicotine in cats and rats can
either arouse or depress EEG, cortical evoked 
potentials, and cortical acetylcholine produc- 
tion, depending on the characteristic be- 
havior of the individual animal. 

In conclusion, nicotine typically causes 
increases in the more traditional and gross 
measures of general CNS arousal when the 
prenicotine arousal state of the organism is 
low. However, more detailed and sophisti- 
cated measures of arousal show that the 
effects of nicotine on CNS activity are mul- 
tiple and depend on a variety of parameters. 
The picture is further complicated by re- 
ports of the onset of cortical sedation some 
5 or 10 minutes after the onset of arousal 
in cats and rabbits, but not in humans, and 
by findings of differential effects of smoking 
and nicotine with different doses and be- 
havioral predispositions. These conclusions 
indicate that future research should control 
for and systematically investigate the effects 
of these variables. Also, the findings of mixed 
arousal and sedation and of insignificant cor- 
relations between measures of CNS arousal 
indicate a need for simultaneous measure- 
ment of a variety of indices in different parts 
of the brain. Furthermore, this discordance 
of measures suggests that the concept of 
cortical or CNS arousal is inadequate to 
handle the data at hand. The inadequacy 
of the arousal construct is also attested to 
by the findings showing that smoking de- 
privation causes the simultaneous occurrence 
of EEG tranquilization and subjective rest- 
lessness-dysphoria. 

These inadequacies and the fact that the 
effects of nicotine on the CNS are highly 
complex and dependent on numerous pa- 
rameters suggest that a multiple interacting 
systems model is preferable to a unidimen- 
sional arousal model when one studies the 
effects of nicotine on brain activity and 
emotions. Such a systems approach suggests 
that consideration should be given to whether 
we are asking the right questions of the 
right parts of the CNS. 

This systems approach also stresses the 
importance of the mutual influence of vari- 
ous bodily systems on each other. In this 
line, it should be noted that alterations in 
CNS activity brought about by nicotine
have usually implicitly been assumed to be 
the direct result of nicotine's effects on CNS 
mechanisms and structures. This assumption 
may, however, be in error, since the primary 
sites of nicotine's action may result from at 
least two other sources. First, Bickford 
(1960) has reviewed a number of changes
in peripheral bodily states (e.g., respiration 
rate, blood carbon dioxide and sugar levels, 
and drug-induced alterations of peripheral 
organ activity) that have been shown to in- 
fluence EEG activity. 

The important role that peripheral bodily 
and psychological states have in influencing 
EEG activity makes it clear that a major 
problem with studies of the effects of nico- 
tine on CNS arousal is that nearly all the 
studies have maintained their subjects in a 
low-arousal, quiescent baseline state before 
the administration of nicotine. The absence
of studies whose subjects were in emotional 
or other high-arousal conditions before the 
administration of the drug makes the rele- 
vance of these studies to the suggested tran- 
quilizing properties of nicotine very ques- 
tionable. Future studies that systematically 
vary prenicotine arousal level are urgently 
needed. A related difficulty is the problem of 
generalization from restricted laboratory 
states of low stimulus input and low ac-
tivity level to more natural and emotional
states. 

Effects of nicotine on skeletal muscle ac- 
tivity. A number of findings are consistent 
with the possibility that some of nicotine's
tranquilizing effects may result from the 
tendency of nicotine to reduce muscular ten- 
sion and responses to emotional stimuli. A 
rapid, short-lived reduction in muscular ten-
sion in spastic patients was observed after
each subject smoked a single cigarette (Web- 
ster, 1964). Also, a depression of the pa- 
tellar tendon reflex (knee jerk) and asso- 
ciated musculature has been demonstrated 
in humans (Domino, 1973) and other mam- 
mals (see review by Silvette, Hoff, Larson, 
& Haag, 1962) following the administration 
of small amounts of nicotine. Furthermore, 
diminution of aversive-stimulation-induced 
jaw contractions and associated muscle po- 
tentials has been reported in human and 
monkey subjects following smoking and the
intravenous and oral administration of nico- 
tine (Hutchinson & Emley, 1973). It seems 
reasonable to assume that reductions of 
muscular activity in response to emotional 
stimuli are subjectively experienced as emo- 
tion reducing or tension relieving; thus, these 
effects of nicotine on muscular activity are 
consistent with reports of nicotine’s tran- 
quilizing properties. Somewhat inconsistent 
with this hypothesis, the flexor reflex has 
been shown to be relatively unaffected by as 
well as enhanced by nicotine (Silvette et al., 
1962). Nonetheless, overall, the evidence 
suggests that potentially significant reduc- 
tions of various muscular activity or re- 
sponses result when nicotine is administered. 

Nicotine may reduce muscular activity and 
tension by causing CNS emotional centers 
to reduce emotional motoric output, or its 
action may be more peripheral, such as in 
the spinal cord, the neuromuscular junction, 
or the muscles themselves. Domino (1973) 
has cited evidence that suggests that the 
mechanisms of the reduction are complex 
and involve both central and peripheral com- 
ponents. Obviously, there is a need for fur- 
ther studies of the effects of nicotine on 
stress and emotion-induced muscular ac- 
tivity. 

Summary. The physiological effects of
smoking are essentially identical to those 
produced by the administration of equiva- 
lent doses of nicotine by other means. The 
predominant effect of nicotine on peripheral 
bodily structures is that of activation of end 
organs of the ANS in a sympathomimetic 
manner. It is true that the effects of nicotine 
on skeletal muscle action and on CNS ac- 
tivities are more complex and less well un- 
derstood and established. However, the es- 
tablishment of the paradox only requires that 
nicotine be shown to increase ANS arousal 
while simultaneously decreasing indices of 
emotion. The demonstration of the latter of 
these two requirements is now addressed. 


Tranquilizing and Emotion-Reducing 
Effects of Nicotine 


Self-reports. A majority of smokers re-
port that they smoke to reduce negative
affect or to achieve pleasurable relaxation
(Ikard et al., 1969; Ikard & Tompkins,
1973). In a national sample of over 2,000 
smokers, Ikard et al. found that 80% scored 
high on the factor of items indicating that 
they always or usually smoked for pleasur- 
able relaxation. However, approximately one 
quarter of this large sample also reported 
smoking at times for the purpose of stimula- 
tion. Frith (1971) asked 59 men and 39 
women to imagine themselves in 12 high- 
arousal situations and 10 low-arousal situa- 
tions and to indicate on a 7-point scale what 
their desire for a cigarette would be. Smokers 
reporting smoking a large number of cigar- 
ettes per day indicated a high desire to 
smoke in both the high- and low-arousal 
situations. Less heavy smokers, however,
could be broken into two groups, those who 
reported a great desire to smoke in low- 
arousal situations and those who reported 
a great desire to smoke in high-arousal situ- 
ations. More men indicated a strong desire 
to smoke in the low-arousal circumstances, 
whereas more women reported a strong desire 
to smoke in the high-arousal situations. 
Experimental studies have also shown that 
smoking and other means of administering 
nicotine reduce reports of experienced emo- 
tion. In the earliest of these studies, John- 
ston (1942) published a series of observa- 
tions of the effects of nicotine on smokers 
and nonsmokers. Nonsmokers given 1.3-mg 
hypodermic injections of nicotine reported 
that an unpleasant light-headedness or muz- 
ziness appeared about 5 minutes after the 
injection. Smokers experienced the same sen- 
sations, but described them as pleasant. One 
patient receiving 4.3 mg of nicotine in water 
orally three times per day stated that it 
“steadied” her more than did phenobarbital. 
Heimstra (1973) described a series of better 
designed and controlled experiments in which 
groups of smokers were or were not allowed 
to smoke. If nicotine reduces emotions, with- 
holding nicotine should increase emotions 
relative to not withholding nicotine. Smoking 
reduced the reported amount of mood change 
that occurred from before to after a series 
of tasks (6 hours of driving simulation and
3 hours of pursuit rotary tracking, target 
detection, and reaction time tasks) and from
before to after a stressful movie. Smoking 
consistently reduced the mood factors of ag- 
gression and anxiety, as measured by the 
Mood Adjective Check List (Nowlis, 1965). 

In a related series of studies, Franken- 
haeuser and Myrsten and their associates 
have shown that in smoking-deprived (up to 
15 or more hours) subjects, mental efficiency 
and subjective well-being are increased in 
a smoking situation as compared to a non- 
smoking situation (Frankenhaeuser, Myr- 
sten, & Post, 1970; Frankenhaeuser, Myr- 
sten, Waszak, Neri, & Post, 1968; Myrsten, 
Post, Frankenhaeuser, & Johansson, 1972). 
In one of these studies (Myrsten et al., 
1972), for example, smoking during reaction 
time tasks led to decreased self-estimates of 
irritation and boredom, as compared to the 
nonsmoking condition. 

In an experiment by Ague (1973), sub- 
jects suspended smoking for 8 hours, rated 
their moods, and then smoked one of four 
cigarettes of varying nicotine content. An 
hour after smoking subjects again rated their 
moods. Subjects rated themselves as feeling 
significantly more pleasant and relaxed an 
hour after smoking high-nicotine cigarettes 
than they felt an hour after smoking low- 
or no-nicotine ones. Nicotine dose level did
not result in statistically significant differ- 
ences on other Mood Adjective Check List 
factors (Aggression, Anxiety, Surgency, Con- 
centration, Fatigue, Vigor, and Sadness), but 
there was a nonsignificant tendency for nico- 
tine to reduce anxiety and aggression. 

In summary, it can be said that smoking 
has consistently led, in the published studies 
to date, to improved mood states in de- 
prived smokers; but the question as to 
whether this improvement is caused by ef- 
fects of nicotine or by the oral and manipu- 
lative activities involved in smoking has not 
been addressed by any of the studies other 
than that of Ague (1973). The probability 
that oral and manipulative activity is at 
least partially responsible for the calming 
effects of smoking seems very high in light 
of Freeman's (1948) work, which has shown 
that oral activity reduces anxiety measured 
by self-report. It is essential, therefore, that 
future studies control for the variety of
smoking-associated behaviors by varying the
nicotine content of the cigarettes smoked.
The one reported study that did vary the 
nicotine content of cigarettes smoked (Ague, 
1973) found that only two of the nine mea- 
sured emotions (moods) were reduced or 
improved, and it is questionable whether 
these two self-report measures (pleasantness 
and inner tension) represented true or com-
plete emotions. Even more desirable would
be studies administering nicotine to humans
by means other than smoking (e.g., intra- 
venous, oral, or subcutaneous administra- 
tion). Such means of manipulating nicotine 
level would control for taste differences 
among cigarettes of different nicotine con- 
tents and could more definitively demon- 
strate the emotion-reducing properties of the 
drug. 

There is also a need for future studies that 
present emotional stimuli while varying nico-
tine level. Since most studies to date have
not presented emotion-eliciting stimuli, they 
have in fact dealt with moods rather than 
with emotions. Finally, the question of 
whether nicotine also produces tranquiliza- 
tion in nonsmokers needs investigation. 

Behavioral measures. Only two studies 
have investigated the effects of nicotine on 
behavioral measures of emotion in humans. 
The first of these investigations is that of 
Nesbitt (1973), who found that habitual 
smokers behaved less emotionally (were will- 
ing to endure stronger intensities of elec- 
trical shock) when smoking than when simu- 
lating smoking. He also found that smokers 
of a high-nicotine cigarette behaved less 
emotionally than smokers of a low-nicotine 
cigarette. He interpreted this study as sug- 
gesting that nicotine can reduce emotional 
behavior; however, the results are open to 
other plausible interpretations. First, Nes- 
bitt noted that an alternative explanation is 
the possibility that smokers of the high-nico-
tine cigarette were satisfied with the ciga-
rette they were allowed to smoke, whereas
smokers in the no-cigarette condition were 
initially told they would be allowed to smoke 
but later were informed that they would not 
be allowed to smoke after all, so that the 
no-cigarette condition may have led to feel- 
ings of frustration and anger with an asso- 
ciated decrease in desire to please the ex-
perimenter by enduring the shocks. Similarly,
low-nicotine cigarettes generally are consid- 
ered to be less satisfying than cigarettes of 
normal nicotine content, and the former, 
therefore, may have produced feelings of 
frustration or dissatisfaction similar to the 
feelings of the no-nicotine subjects, although 
not as strong. A second alternative inter- 
pretation is related to the finding that nico- 
tine increases detection thresholds for elec-
trical shock (Mendenhall, 1925; Wenusch &
Schöller, 1936). It may be, therefore, that
nicotine decreases the perception of shock 
and thus accounts for the increased tolerance 
of shock. Thus, Nesbitt’s results are con- 
sistent with the hypothesis that nicotine re- 
duces emotional behavior, but they also are 
in line with alternative explanations. 

In the second study, Schechter and Rand 
(1974) found that habitual smokers deprived 
of smoking were 37% more aggressive on 
a Buss (1961) aggression machine task than 
were subjects who were allowed to smoke. 
Unfortunately, the deprived smokers did not 
have the opportunity to engage in the same 
oral and manipulative behaviors as did smok-
ers. As a result of this confound, no conclu-
sion as to the role of nicotine can be reached.

In summary, too few studies have been
reported and too many methodological prob-
lems are evident in the studies that are avail-
able to permit a definite statement about the
role of nicotine in altering emotional be-
havior in humans. It can, however, be said
that the studies reported to date suggest 
that nicotine may be able to reduce anxiety 
and aggression in habitual smokers, as de- 
termined by behavioral measures. 

Emotion in animals. The leap from the
discussion of emotions in man to emotions in
subhuman animals is, of course, a question-
able proposition, since there is no assurance
that the “anger” or “anxiety” that a rat,
for example, experiences corresponds in a
significant manner to their counterparts in
humans. Nonetheless, the effects of nicotine
on behavioral measures of emotion in sub-
human species have received increased at-
tention during the last decade, and some
strong trends appear to be developing in
this area. First, animal studies have con-
sistently shown that smoking-dose levels of
nicotine reduce a variety of measures of
aggression. For example, Silverman (1971) 
found a consistent reduction of aggression 
in both albino and hooded rats following 
small subcutaneous doses of nicotine. Sex- 
ual, investigatory, and submissive behaviors 
were also reduced very slightly, but general 
activity level was not significantly influenced. 
Consistent with these findings, Hutchinson 
and Emley (1973) reviewed a series of ex- 
periments that they and their co-workers 
have completed in recent years showing that 
the acute oral administration of small doses 
of nicotine (.04-.80 mg/kg) and the chronic 
oral administration of even smaller doses 
(as small as .002 mg/kg) reduced postshock 
biting in monkeys while simultaneously in- 
creasing preshock anticipatory motor be- 
haviors. These two simultaneous effects were 
also produced in their studies by two tran- 
quilizers (chlorpromazine and chlordiazepox- 
ide) and have been said to be characteristic 
of tranquilizer-type compounds (Emley & 
Hutchinson, 1972). These researchers also 
found that withholding nicotine after chronic 
administration caused increases in their mea- 
sures of aggression. The findings of less well- 
designed studies by Schechter (1974) and 
Kostowski (1968) are consistent with this 
general picture, in that they also showed 
nicotine-produced decrements of aggression 
in rats and ants, respectively. 

Indices of fear and anxiety in animals 
have typically been reduced by nicotine, but 
these effects have been less consistent than 
the effects of nicotine on aggression. A num- 
ber of researchers (Bovet, Bovet-Nitti, & 
Oliverio, 1966; Essman & Essman, 1971; 
Fleming & Broadhurst, 1975; Hutchinson & 
Emley, 1973; Morrison & Stephenson, 1972) 
have reported that nicotine reduces measures 
of anxiety (immobility, conditioned suppres- 
sion, exploratory behavior) in a variety of 
animals. On the other hand, Davis, Kensler, 
and Dews (1973) and Driscoll and his co-
workers (Driscoll, 1976; Driscoll & Bättig,
1970; Driscoll & Bättig, 1974) have re-
ported nicotine-produced increases in indices
of anxiety-fear (punishment-induced sup- 
pression of operant behavior and avoidance 
behavior). 

Studies of the effects of nicotine on cor-
relates and indices of anxiety-fear in sub-
human species are difficult to interpret not
only because of apparently contradictory re- 
sults but also because nicotine has a strong 
tendency to increase all forms of operant be- 
havior, those motivated by a fear compo- 
nent and those that are not. Consistent with 
the emotion-reduction hypothesis, Hutchin- 
son and Emley (1973) found what might be 
considered one of the best correlates of anxi- 
ety—conditioned suppression of positively 
reinforced behavior—to be reduced by low 
and intermediate doses of nicotine (.1—4 
mg/kg) administered subcutaneously in mon- 
keys and rats. Also consistent with the sug- 
gestion that nicotine reduces anxiety are the 
findings of studies that have shown increased 
mobility and exploratory behavior in threat- 
ening environments, that is, in shock avoid- 
ance conditions (Fleming & Broadhurst, 
1975; Morrison & Stephenson, 1972). These 
studies showing a reduction of the suppres- 
sion of operant behavior, however, cannot 
be cited as strong support indicative of nico- 
tine-induced reduction of anxiety-fear, since 
nicotine also increases a wide variety of 
operant behaviors not related to negative 
emotions. 

The findings that have been interpreted as 
suggestive of  nicotine-induced increases, 
rather than decreases, of fear and anxiety 
involve, for the most part, avoidance para- 
digms. Small and moderate doses (.5-.30 
mg/kg) of nicotine, like caffeine and am- 
phetamine, increase bar pressing in avoid- 
ance tasks (Balfour & Morrison, 1975; 
Davis et al., 1973). This increased rate of
avoidance behavior is characteristically
produced by stimulant drugs and does not
necessarily indicate increased fear or anxiety,
as evidenced by the fact that nicotine and 
stimulant drugs also increase operant be- 
havior not related to negative emotions. 

Studies testing the effects of nicotine on 
punishment-induced suppression of positively 
reinforced behavior in one case (Morrison, 
1969) showed that nicotine and ampheta- 
mine reduced the punished behavior to below 
control values (which would be consistent 
with nicotine’s having  anxiety-increasing 
properties), whereas a tranquilizer
(chlordiazepoxide) increased responding to
nonpunished levels. However, a more recent
study (Morrison & Stephenson, 1972) showed 
that both amphetamine and nicotine slightly 
increased behavior that had simultaneous re- 
warding and aversive properties. In this 
latter case, these two drugs appear to have
an antianxiety effect.

In summary, a review of the effects of
nicotine on behavioral measures of emotion
in subhuman species suggests that the drug
reduces a variety of aggressive behaviors,
but its effects on other forms of emotional
behavior are still open to question. In view
of the mixed and hard-to-interpret effects
of nicotine on behavioral indices of
anxiety-fear, future research should investigate the
importance of the variety of variables
suggested earlier in this review in determining
whether nicotine has stimulant or sedative
properties. These variables include individual
and strain differences, dose, paradigm type,
rate of nicotine administration, intensity of
emotion-inducing stimuli, preexperimental
arousal level, and type of emotion.

Summary of the emotion-reducing effects
of nicotine. The studies reviewed suggest
that, at least in some situations, nicotine has
emotion-reducing properties in man and
animals and that nicotine deprivation in chronic
users causes relative increases of emotion. All
12 of the reported studies with humans have
been consistent with the hypothesis that
nicotine has this effect. Furthermore, nicotine
reduced indices of aggression in animals in
all 7 studies available for review. And with
the exception of avoidance measures, nicotine
also has reduced correlates of anxiety-fear
in animals (even though these studies are
open to alternative explanations).

There is moderately strong evidence for
the existence of the paradox: Nicotine
significantly increases arousal of essentially all
aspects of the ANS, frequently increases
indices of CNS arousal, and at the same time
frequently results in reduced negative
emotional feelings and behavior and in increased
feelings of tranquility and pleasure. The
question as to whether nicotine has
tranquilizing properties in humans who are not
chronic users of the drug has not been
answered. The studies involving human
subjects have a number of methodological flaws,
but are consistent with the animal literature
in suggesting that nicotine reduces indices of
emotion.


Mechanisms Proposed to Explain 
the Paradox 


It is important to note that the different 
mechanisms that have been proposed as ex- 
planations of nicotine’s probable emotion- 
reducing properties are not necessarily mu- 
tually exclusive. Most of the mechanisms 
deal with only one level of analysis, so that 
different mechanisms may account for only 
part of the picture in a manner similar to 
the “blind men and the elephant” analogy. 
In addition, it should be kept in mind that 
since nicotine produces a large number of 
direct and indirect physiological changes, the 
different proposed mechanisms concern
themselves with the primary sites of nicotine's
action; this primary action then leads to
immensely complex chains of further action
involving hormonal, neural, behavioral, and 
experiential changes. 

Addiction, withdrawal, and related
processes cannot be considered adequate
explanations of the paradox without specifically
noting by what mechanism nicotine decreases
emotion while simultaneously increasing
arousal. Inconsistent with the view that the
paradox is a manifestation of an addictive
process, each dose of nicotine produces
transient increases of arousal relative to chronic
arousal levels (Ague, 1973; Gilbert, 1978).
Also, the paradox has been shown to appear
in individuals with no prior experience with
the drug (Hutchinson & Emley, 1973).
However, the opponent-process theory of
addictions and emotions (Solomon & Corbit,
1973) is helpful in that from it one can
infer that nicotine may reduce emotion by
one mechanism after its chronic
administration and by another mechanism in
individuals without extensive prior use.
Furthermore, the opponent-process theory of
addictions leads one to speculate that individuals
may adapt to the heightened arousal levels
brought about by nicotine and that
depriving the addict results in a decrease of
arousal that may be subjectively experienced
as aversive and thus cause the subject to
respond with more emotion. Consistent with
this hypothesis, Heimstra (1973) has
reported data showing that deprived smokers
report simultaneous decreases in arousal
(increased fatigue) and increases in aggression
and anxiety.

This review is limited to the effects of
nicotine. Therefore, it does not consider oral,
manipulative, attentional, attributional, and 
other psychological and behavioral factors 
associated with smoking independent of nico- 
tine, even though these factors may be very 
important in producing the satisfying effects 
of smoking. 


CNS Mechanisms 


Neurophysiological model. Miller (1973) 
pointed to a finding in his laboratory that 
showed that chemomicrostimulation-induced 
activation of muscarinic neurons in certain 
portions of the cat brain elicited aggression. 
He suggested that if the effects of muscarinic 
and nicotinic central cholinergic neural sys- 
tems are antagonistic like those of alpha and 
beta adrenergic systems, then nicotine’s calm- 
ing effect would be explained, since nicotine 
stimulates the nicotinic neural system and 
this system tends to inhibit the aggression- 
producing muscarinic system. Consistent with 
this hypothesis, Silverman (1971) has noted 
that the behavioral effects of nicotine are 
similar to those of benactyzine, which antag- 
onizes the muscarinic effects of acetylcholine. 
However, this study provides only indirect 
evidence, and the literature does not appear 
to provide more direct tests of the proposed 
nicotinic-muscarinic antagonism. Thus, the 
status of the hypothesis must be considered 
speculative and in need of systematic study. 

This proposed inhibition of muscarinic
aggression circuits by nicotine may prove to
account for nicotine's ability to reduce
certain forms of aggression, but it does not
account for the drug's apparent ability to
lessen emotions other than aggression. It is not
clear whether nicotine reduces different
emotions by altering emotion-specific
neurophysiological pathways or by influencing a
pathway or process common to all emotions. If
the former were true, the effects of nicotine
on different emotions would be expected to
vary more widely from emotion to emotion
than if the latter were true.

A review of the literature (Avis, 1974) 
makes it abundantly evident that the role 
of neurotransmitter systems in the elicitation 
and inhibition of emotion is extremely com- 
plex. This regulation of emotions is a func- 
tion of relationships between numerous trans- 
mitter systems rather than endogenous ac- 
tivity in one isolated system. This and other 
reviews also demonstrate that different emo- 
tions and different forms of the same emotion 
are mediated in the CNS by partially
independent neurophysiological systems (Avis,
1974; Gittelman-Klein & Klein, 1971;
Heiser & DeFrancisco, 1976).

The suggestion that nicotine inhibits mus- 
carinic aggression circuits is consistent with 
Silverman's (1971) findings showing nico- 
tine's main effect on social behavior in rats 
to be a reduction of aggression. It also con- 
forms to the fact that in the nicotine studies 
to date, aggression is the one emotion that 
has most consistently been reduced. This 
model does not, however, explain why it is 
that many smokers report feeling tranqui- 
lized by smoking in spite of smoking's ANS- 
arousal-producing effects. 

It can be inferred from Miller's (1973) 
neurophysiological model that nicotine may 
reduce both behavioral and subjective indices 
of emotion by a direct neurophysiological 
inhibition of the impulse to behave in an 
emotional manner. It has been suggested 
(Arnold, 1960) that the impulse to respond 
to an emotion-producing stimulus is of 
greater importance in the subjective experi- 
ence of emotion than is the perception of 
ANS arousal. 

Cortical sedation model. Recently, Ey- 
senck (1973) proposed that the effects of 
nicotine depend on the degree of arousal in 
the cerebral cortex: When arousal is high, 
the effects on the cortex are depressant; 
when arousal is low, the effects are stimulat- 
ing. During the highly arousing states that 
accompany most emotions, cortical arousal 
is high (Lindsley, 1970); hence, the effects 
of nicotine on the cortex during emotional 
states are depressant (i.e., tranquilizing).
Decreased cortical arousal is assumed to
cause decreased emotion. This model can
account for nicotine’s tranquilizing
properties if one assumes that CNS sedation results
in decreased CNS response to and perception
of ANS-arousal-related input.
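
The bidirectional rule at the heart of this model can be made concrete with a small sketch. The 0-to-1 arousal scale, the `setpoint`, the `gain`, and the linear form are all illustrative assumptions introduced here; Eysenck (1973) specified no such quantities, and the code is only a toy rendering of "depressant when arousal is high, stimulating when arousal is low."

```python
def nicotine_cortical_effect(arousal, setpoint=0.5, gain=0.4):
    """Signed change in cortical arousal under a toy bidirectional rule.

    arousal and setpoint use a hypothetical 0-to-1 scale; gain is a
    hypothetical drug strength. None of these values come from the source.
    """
    if not 0.0 <= arousal <= 1.0:
        raise ValueError("arousal must lie between 0 and 1")
    # Above the setpoint the result is negative (depressant);
    # below the setpoint the result is positive (stimulating).
    return gain * (setpoint - arousal)


def arousal_after_nicotine(arousal, setpoint=0.5, gain=0.4):
    """Cortical arousal after applying the toy drug effect."""
    return arousal + nicotine_cortical_effect(arousal, setpoint, gain)


if __name__ == "__main__":
    # Compare a resting, a mid-range, and a highly aroused starting level.
    for level in (0.1, 0.5, 0.9):
        print(level, round(arousal_after_nicotine(level), 2))
```

Under these assumptions a highly aroused (emotional) starting state is pushed downward and a resting state is pushed upward, which is the pattern the model requires; whether any real cortical measure behaves this way is exactly what the model leaves open to test.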

The strongest support Eysenck offered for
his hypothesis that nicotine has bidirectional
effects on arousal came from Armitage and
his co-workers (Armitage et al., 1968;
Armitage, Hall, & Sellers, 1969). These
researchers found that nicotine
administered to cats and rats at a frequency
and dosage per body weight corresponding to
human smoking resulted in cortical (EEG
and acetylcholine level) and behavioral
arousal (lever-pressing rate) in some
animals of each species and in tranquilization
in others.

The studies mentioned earlier in this
review that showed that the effects of smoking
and nicotine on cortical activity can be either 
arousing or depressing provide indirect evi- 
dence for Eysenck's hypothesis. Also con- 
sistent with this bidirectional view is the 
finding that nicotine tends to increase sen- 
sory detection thresholds when CNS arousal 
is high, but decreases them when CNS 
arousal is low (Mendenhall, 1925). 

Finally, a behavioral study of Ashton and 
Watson (1970) offers support for Eysenck's 
bidirectional model. These researchers found 
that smokers smoked more when under- 
aroused (resting) and when highly aroused 
(high-stress situation) than they did when 
only slightly aroused, which suggests that 
underaroused subjects smoked for stimula- 
tion and overaroused subjects for tranquil- 
ization. 

Eysenck did not speculate as to what
physiological mechanism might underlie the
suggested bidirectional effects of nicotine on
cortical arousal. He did, however, note that
this hypothesis is consistent with the
conclusion of Rachman (1969) that arousal is an
inverted-U-shaped function of the
arousal-producing conditions (Eysenck, 1973, p.
136).

There is very little research that relates
directly to Eysenck’s hypothesis. A test of
this proposal would be to expose subjects to
an emotion-producing situation and, after
cortical arousal was high, to administer
nicotine while continuing to monitor CNS indices
of arousal.

Interacting CNS mechanisms. A possi- 
bility not discussed by Eysenck is that there 
are two brain arousal systems that interact, 
so that arousal in one system tends to in- 
hibit arousal in the other system. Routten- 
berg (1968) has made a strong argument 
for the likelihood of the existence of two 
such arousal systems (limbic and reticular 
activating) and has noted evidence that 
arousal of either one of these systems leads 
to cortical arousal but that the simultaneous 
arousal of both of them leads to cortical 
tranquility. Consistent with this hypothesis, 
Barnes (1966) showed that either of two 
stimulants (eserine or amphetamine) caused 
EEG arousal (fast waves), but surprisingly, 
the administration of both reinstated tran- 
quilization (slow wave EEG activity). If 
emotional arousal is based primarily on the 
activation of one of these two systems, the 
tranquilizing properties of nicotine may be 
a result of the inhibition of this system by 
nicotine’s arousing of the other system. A 
replication of Barnes’s study with nicotine, 
as well as eserine and amphetamine, would 
be a test of this hypothesis. 
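
The logic of the two-system account can be written out as a simple truth table. The function below is only a sketch: the labels "aroused," "tranquil," and "baseline" and the treatment of each system as simply on or off are assumptions made here for illustration, and the "baseline" case (neither system aroused) is not described in the studies cited.

```python
def cortical_state(limbic_aroused: bool, reticular_aroused: bool) -> str:
    """Toy rendering of two mutually inhibitory arousal systems."""
    if limbic_aroused and reticular_aroused:
        # Simultaneous arousal of both systems: cortical tranquility.
        return "tranquil"
    if limbic_aroused or reticular_aroused:
        # Arousal of either system alone: cortical arousal.
        return "aroused"
    # Neither system aroused: an assumed resting state.
    return "baseline"


if __name__ == "__main__":
    # Hypothetical case: an emotional stimulus arouses one system and
    # nicotine arouses the other.
    print(cortical_state(limbic_aroused=True, reticular_aroused=False))
    print(cortical_state(limbic_aroused=True, reticular_aroused=True))
```

Which system an emotional stimulus engages and which one nicotine engages is left unspecified in the hypothesis, so the argument names in the example are interchangeable.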

Related to the idea of mutually inhibitory
arousal systems are the findings of
Friedman, Horvath, and Meares (1974), which
showed that tobacco (but not nontobacco)
cigarette smoking significantly increased the
rate of habituation of EEG alpha
desynchronization to a series of 90-dB sound
pulses. These researchers proposed that since
theoretical models suggest that habituation
depends on active inhibitory and excitatory
mechanisms that oppose each other
(Sokolov, 1960), it is possible that nicotine causes
a dislocation of the usual relationship, so
that inhibitory mechanisms are more fully
activated without a simultaneous reduction
in CNS excitation. This model is consistent
with the earlier discussed findings that
showed that nicotine increases some measures
of cortical arousal while reducing others. It
is also consistent with the EEG and
self-report data, reviewed earlier, that suggest
that nicotine increases the state of mental
alertness while simultaneously producing a
state of emotional tranquility. This model
suggests that nicotine causes an individual
first to become aware of, but soon to pay
little attention to, emotion-related stimuli,
thoughts, and ANS end-organ activity.

Studies that simultaneously monitor CNS 
indices of arousal and responses to external 
emotional input would be relevant to this 
hypothesis. A modified replication of the 
study by Friedman et al., in which nicotine 
would be administered intravenously or by 
other nonsmoking means, would provide a 
significant amount of credibility to this hy- 
pothesis. 

Another model has been suggested by 
Goldstein et al. (1967). It is based on their 
studies of the effects of nicotine on simul- 
taneous quantitative measures of cortical 
electrical activity in the rabbit brain. They 
interpreted their results as indicating that 
nicotine causes an impressive diminution of 
mutual involvement (EEG covariance) be- 
tween the cortex and the hippocampus, and 
between the cortex and the reticular forma- 
tion. These findings, they suggested, may 
imply a decrease in the activity of inhibitory 
mechanisms operating between cortical and 
subcortical sites. Thus, for instance, nico- 
tine may relax excessively rigid control by 
the neocortex of subcortical structures (and 
vice versa). This model can explain the 
tranquilizing effects of nicotine if one as- 
sumes that either subcortical input to corti- 
cal structures or phasic deviation from the 
ANS-arousal baseline, rather than the base- 
line itself, determines the intensity of emo- 
tional responses. The nicotine from a single 
cigarette sustains high levels of ANS end- 
organ arousal for up to 30 minutes. On the 
other hand, startle responses and many other 
emotion-related responses that are likely to 
be mediated by interactions between cortical 
and subcortical structures usually operate in 
a much smaller time frame. 

Pleasure-center-stimulation models. Ey- 
senck (1973) has suggested an extension of 
his proposed cortical sedation explanation of 
nicotine’s paradoxical effects. The additional 
proposal is based on the suggested effects 
of nicotine on pleasure and aversion systems 
in the brain. First, granting the very spec- 
ulative nature of the hypothesis, Eysenck 
started by reminding the reader that three 
different hedonic systems have been
described (Berlyne, 1971; Olds & Milner,
1954; Olds & Olds, 1965). The first two are 
the primary reward and aversion systems; 
the stimulation of either of these leads to 
familiar signs of increased arousal, including 
increased heart rate, high-frequency EEG 
waves, and increased bodily movement; the 
third is the secondary reward system, the 
activation of which results in de-arousal. 
It is suggested that the secondary reward 
system produces rewarding effects indirectly, 
that is, by inhibiting the aversion system, 
which in turn releases the primary reward 
system from inhibition by the aversion sys- 
tem. Eysenck (1973, p. 140) proposed that 
nicotine administered in emotionally arousing 
situations may activate the secondary reward 
system and through it deactivate the aversion 
system. On the other hand, he also suggested 
that in low-arousal situations nicotine may 
activate the primary reward system directly. 
Eysenck offered no direct support for this 
model, but did note that it is open to ex- 
perimental falsification. 

In contrast with Eysenck’s hypothesis, the
model proposed by Jarvik (1970, 1973) as- 
sumes that nicotine’s emotion-reducing prop- 
erties are a product of its presumed ability 
to stimulate primary rather than secondary 
reward centers. Nicotine, like amphetamine, 
is assumed to release stores of norepinephrine 
and other catecholamines in primary reward 
areas of the brain such as the medial fore- 
brain bundle, the results being an improve- 
ment in mood and a reinforcement of be- 
havior associated with the administration of 
the drug. This interpretation is consistent 
with the catecholamine theory of emotion 
(Schildkraut & Kety, 1967), which assumes 
that catecholamine activity in the brain, 
particularly norepinephrine activity but pos- 
sibly also dopamine and epinephrine activity, 
is responsible for sustaining positive affect. 
The pleasure-center-stimulation model can be 
inferred to assume that a hedonic process is 
an important component of emotion and that 
nicotine’s purported pleasure-increasing prop- 
erties more than compensate for its ANS- 
arousing properties. 

Jarvik’s and, to a lesser extent, Eysenck’s
hypotheses are not supported by the finding
(Schuster, 1970) that amphetamine increased
rather than decreased cigarette-smoking
frequency in habitual smokers. In his discussion
of these findings, Schuster pointed out that
it is difficult to understand why smoking
increased, since one assumes that there is at
least some satiation of the central reward
mechanisms by amphetamine, leading to a
diminished need for nicotine intake.

On the other hand, support for the plea- 
sure-center-stimulation model is given by 
findings that suggest that nicotine releases 
acetylcholine in the brain (Armitage, 1973; 
Essman, 1973; Knapp & Domino, 1962). 
Finally, Armitage (1973) has said that the 
findings of a recent study by G. H. Hall 
and Turner (1972) suggest that nicotine 
does in fact release noradrenaline in the 
specific areas of diencephalic pleasure cen- 
ters, thus providing an important link to Jar- 
vik’s (1970, 1973) hypothesis that nicotine 
produces its effects by stimulating pleasure 
centers in the brain and causing or facilitat- 
ing the release of catecholamines in these 
centers. With cats, intravenously injected
nicotine (.002 mg/kg every 30 sec) caused
an increased release of ³H-noradrenaline into
the effluent of the third cerebral ventricle,
as did the administration of cigarette smoke
directly into the lungs.

Also consistent with the pleasure-center- 
stimulation model, but inconsistent with 
arousal formulations of emotion, is that am- 
phetamine, a CNS and ANS stimulant that 
also appears to stimulate pleasure centers 
(Jarvik, 1970, p. 188; Ray, 1972, p. 167), 
causes subjective assessments of relaxation 
that are far in excess of responses relating to 
nervousness (Martin, Sloan, Sapira, & Ja- 
sinski, 1971). Furthermore, it is a well- 
accepted fact that amphetamine, methylphen- 
idate, and other CNS stimulants have para- 
doxical tranquilizing effects on hyperactive 
children (Satterfield, Cantwell, & Satterfield, 
1974) and on some adults who manifest 
overreactivity and other symptoms of the 
hyperactive syndrome (Wood, Reimherr, 
Wender, & Johnson, 1976). Finally, small 
doses of amphetamine have been shown to 
frequently reduce aggression in laboratory 
animals, whereas large doses have been 
shown to increase it (for a review, see Allen,
Safer, & Covi, 1975).

In summary, several findings are consis- 
tent with the hypothesis that nicotine's emo- 
tion-reducing properties are at least partially 
a function of its suggested ability to stimu- 
late CNS pleasure centers. The findings of 
G. H. Hall and Turner (1972) offer some 
direct evidence suggesting that nicotine
stimulates pleasure center activity. However, the
questions of whether this increment in
activity increases pleasure, whether nicotine
increases such activity during emotional states,
and whether such activity is related
causally or only correlationally to reduced
emotions have not been investigated.
Amphetamine appears to stimulate pleasure centers
and, like nicotine, reduces several indices of 
anxiety and aggression, which suggests that 
there may be a causal relation between 
pleasure and the inhibition of emotions. 
However, a great deal of further investiga- 
tion is needed before an accurate assess- 
ment of the pleasure-center-stimulation 
model can be made. 


Mechanisms Based on Altered Perceptions 
of Peripheral Activity 


One of the three mechanisms based on 
altered perceptions of bodily activity dis- 
cussed below may account for nicotine's re- 
duction of emotions in spite of increased 
ANS activity. The first of these mechanisms
is based on the findings of Wenusch and
Schöller (1936) and Mendenhall (1925),
which showed an increase in the detection
threshold for electrical shock following the
smoking of a regular tobacco cigarette but
not following the smoking of a nicotine-free
cigarette. These results suggest that such
smoking-induced increases of sensory
thresholds may decrease subjects’ awareness of
their autonomic arousal in spite of an actual
increase in ANS activity. If emotion is more
a function of consciously perceived than of
actual autonomic arousal, then the paradox
is resolved. This model makes no
assumptions as to whether the elevation in threshold
is caused by a general cortical sedation
similar to that suggested by Eysenck (1973) or
whether it is a result of some more specific
process such as that suggested by the
neurophysiological model discussed earlier in
this review.

The second of the proposed mechanisms 
based on altered perceptions of bodily ac- 
tivity assumes that nicotine's tranquilizing 
properties are a consequence of muscular- 
action-reducing properties of nicotine. Re- 
search shows that while nicotine increases 
ANS arousal, it reduces reflexive muscular 
activity (patellar and startle responses) in 
humans and primates (Domino, 1973; 
Hutchinson & Emley, 1973) and reduces 
resting muscle tone levels in spastic patients 
(Webster, 1964). Hence, it may be that 
people who report that smoking tranquilizes 
them attend to muscular changes, the tran- 
quilizing effects of which overshadow the 
effects of smoking-induced increases of auto- 
nomic arousal. It appears that the subjec- 
tive experience of emotion is a positive 
function of facial and skeletal muscular ac- 
tivity as well as of ANS arousal. If this 
hypothesis is true, nicotine can be expected 
to reduce emotions with a large muscular 
component more than those with a relatively 
greater autonomic arousal component. 

The final of the three proposed
mechanisms was suggested by Schachter (1973, p.
152) in a direct attempt on his part to
explain the paradoxical tranquilizing effects
of smoking. This mechanism is based on
the law of initial values, the finding that
“the magnitude of response is related to
prestimulation level” in such a manner that
“high autonomic excitation preceding
stimulation is correlated with low autonomic
reactivity upon stimulation” (Lacey, 1956, p.
156). The argument is that since nicotine
leads to arousal, the additional arousal
induced by an emotional situation is less with
nicotine than without it. If the intensity of
an emotion is a positive function of the
deviation of autonomic activity from its
baseline, this explanation is plausible. At the
present time, however, there appear to be
no data either strongly in support of or
contrary to this ANS deviation model of
emotion.
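
The law-of-initial-values argument can be illustrated with a minimal numerical sketch. The linear form, the `ceiling`, the `sensitivity`, and the 0-to-1 scale are assumptions chosen here for illustration only; neither Lacey (1956) nor Schachter (1973) is cited above as giving a specific function.

```python
def autonomic_reactivity(prestimulation_level, ceiling=1.0, sensitivity=0.5):
    """Toy law of initial values: the higher the starting level,
    the smaller the additional response to stimulation."""
    if not 0.0 <= prestimulation_level <= ceiling:
        raise ValueError("prestimulation level must lie between 0 and ceiling")
    return sensitivity * (ceiling - prestimulation_level)


def emotional_deviation(resting_level, nicotine_boost=0.0,
                        ceiling=1.0, sensitivity=0.5):
    """Deviation from baseline produced by an emotional stimulus, where
    nicotine (a hypothetical additive boost) first raises the baseline."""
    baseline = min(resting_level + nicotine_boost, ceiling)
    return autonomic_reactivity(baseline, ceiling, sensitivity)


if __name__ == "__main__":
    without_nicotine = emotional_deviation(0.3)
    with_nicotine = emotional_deviation(0.3, nicotine_boost=0.3)
    print(without_nicotine > with_nicotine)
```

If felt emotion tracks the deviation rather than the absolute level, the smaller deviation in the nicotine case corresponds to less emotion despite higher total arousal, which is the resolution of the paradox that this mechanism proposes.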


Altered Cue Value of Peripheral
(Bodily) Activity 


It may be possible to feel physiologically
aroused and yet feel unemotional and
tranquil. Schachter (1973), in the same article
in which he proposed the law-of-initial-values
solution to the paradox, noted the possibility
of an alternative model. This alternative
assumes that habitual smokers experience their
body as reacting with the same sympathetic
symptoms when they smoke as it reacts when
they experience an emotion; therefore, to
the degree that they attribute their bodily
arousal to smoking, they should be less
emotional; that is, in an emotional situation a
smoker feels emotion only to the degree that
arousal (both emotional stimulus induced
and nicotine induced) is attributed by the
person to an emotional stimulus. To the
degree that arousal is attributed to smoking
(or other nonemotional sources), emotion
is not felt (i.e., there is tranquilization).
Unfortunately, this proposal by Schachter seems
unlikely to be a general solution to the
paradox, since subhuman species with little or
no experience with nicotine show reductions
of emotional behavior and it appears that
there is no reason for such animals to
attribute their arousal to a source other than
the emotion-producing stimulus.
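
The attribution account amounts to a simple weighting of total arousal, sketched below. The multiplicative form, the arbitrary arousal units, and the parameter names are assumptions introduced here for illustration; Schachter (1973) is described above only in qualitative terms.

```python
def felt_emotion(stimulus_arousal, nicotine_arousal, share_attributed_to_stimulus):
    """Toy attribution rule: emotion is felt only to the degree that total
    arousal (stimulus induced plus nicotine induced) is attributed to the
    emotional stimulus.

    share_attributed_to_stimulus: hypothetical fraction between 0 and 1.
    """
    if not 0.0 <= share_attributed_to_stimulus <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    total_arousal = stimulus_arousal + nicotine_arousal
    return total_arousal * share_attributed_to_stimulus


if __name__ == "__main__":
    # Same bodily arousal, different attributions.
    print(felt_emotion(0.6, 0.4, share_attributed_to_stimulus=1.0))
    print(felt_emotion(0.6, 0.4, share_attributed_to_stimulus=0.25))
```

The sketch also makes the model's limitation visible: an organism with no basis for attributing arousal to smoking has a share near 1, so the rule predicts no tranquilization for the nicotine-naive animals mentioned above.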


Impulse Reduction Model 


It may be that a person can feel autonomic
arousal and yet feel relaxed, tranquil, and
unmotivated. This model does not assume,
as does Schachter's, that misattribution of
arousal to a nonemotional source is the
critical step; nor does the model require prior
experience with nicotine. Instead, it suggests
that nicotine may reduce the motivation, the
impulse-to-move component of the emotional
complex, possibly in a manner similar to
the unexplained mechanism by which
morphine reduces pain. Ray (1972) has noted
that many laboratory reports suggest that
morphine has no effect on pain threshold
and that it does not impair conduction in
peripheral nerves. Following the
administration of an analgesic dose of morphine, some
patients report that they still notice pain
but that it is no longer aversive; nor does
the pain demand attention or an impulse to
action (Ray, 1972). Similarly, nicotine may
not alter or may actually increase the
subjective perception of ANS arousal, yet may
decrease the attention an individual gives
these symptoms and may reduce the impulse
to respond to this bodily activity and to
external emotion-producing stimuli.

A reduction of the impulse to respond to 
the emotion-producing stimulus would re- 
sult in lessened behavioral measures of emo- 
tion and, according to Arnold's (1960) well- 
known theory of emotion, would reduce sub- 
jectively experienced emotion as well, since 
Arnold argued that emotion is best conceived 
of as the felt urge toward or away from an 
object perceived as good or bad.
Furthermore, studies showing that nicotine reduces
emotional muscular reflexes in man and
monkeys (Domino, 1973; Hutchinson &
Emley, 1973) are consistent with this impulse
reduction model, as is Miller’s (1973) neu- 
rophysiological hypothesis of antagonistic 
nicotinic and muscarinic neural systems, 
which was discussed earlier. Finally, the re- 
cent discovery of naturally occurring opiate- 
like substances (endorphins and enkepha- 
lins) in the brain and their reported emo- 
tion-reducing effects (Arehart-Treichel, 1978) 
adds to the plausibility of this mechanism. 


Glucocorticoid-ACTH Model 


It is possible that nicotine's
emotion-reducing effects are mediated by the
nicotine-induced release of glucocorticoids from the
cortex of the adrenal gland. Glucocorticoids
have been reported to reduce a variety of
indices of emotion (Di Giusto, Cairncross,
& King, 1971; Endroczi, Lissak, Fekete, &
DeWied, 1970; Levine, 1971), and nicotine,
in small, smoking-sized doses, has been
shown to cause increased serum levels of
glucocorticoids (Hill & Wynder, 1974;
Kershbaum, Pappajohn, & Bellet, 1968).
These two facts suggest that nicotine-induced
reductions of emotional reactions are
mediated and caused by nicotine-induced
increases in serum glucocorticoids.
Furthermore, nicotine and glucocorticoids have
parallel effects on the following indices of
emotion, sensory detection threshold, and
higher cognitive functions:

1. They reduce measures of emotion (Di 
Giusto et al., 1971; Levine, 1971; Schechter 
& Rand, 1974; Silverman, 1971). 

2. They reduce startle responses (Hutch- 
inson & Emley, 1973; Levine, 1971). 

3. They facilitate habituation of the EEG 
alpha-desynchronization orienting response 
(Endroczi et al., 1970; Friedman et al., 
1974). 

4. They increase sensory detection thresh- 
olds (Henkin, 1970; Mendenhall, 1925). 

5. They increase higher integrative func- 
tioning such as memory, learning, discrimi- 
nation, and accuracy (Andersson & Post, 
1974; Geller, Hartmann, & Blum, 1971; 
Henkin, 1970; Levine, 1971; Myrsten et al., 
1972; Nelsen & Goldstein, 1972). 

Indirect support for the glucocorticoid 
model is provided by evidence suggesting 
that nicotine's emotion-reducing and cog- 
nition-facilitating effects occur maximally 15 
to 45 minutes after the administration of 
nicotine (Andersson, 1975; Andersson & Post, 
1974), at the same time at which nicotine- 
induced glucocorticoid blood levels are high- 
est (Hill & Wynder, 1974). 

It should be noted that glucocorticoid 
levels in the blood interact in a complex 
manner with ACTH levels and sex hormones 
and that these hormones have been shown 
to influence indices of emotion and learning 
(Di Giusto et al., 1971; Leshner et al., 1973; 
Levine, 1971). Since ACTH from the pitui- 
tary gland stimulates secretion of glucocor- 
ticoids from the adrenal cortex, it seems 
likely that nicotine increases glucocorticoid 
levels by increasing blood levels of ACTH. 
If nicotine in fact does increase blood levels 
of ACTH, it would be of significance, since 
ACTH has been shown to reduce aggressiveness
in mice (Leshner et al., 1973). Therefore,
it may be more appropriate to consider a
general pituitary-adrenal model of the
emotion-reducing effects of nicotine.


Summary and Conclusions 


A number of contemporary theories of 
emotion assume that emotions are largely a 
positive function of ANS arousal. However,
a substantial number of studies have shown 
that nicotine increases heart rate, blood 
pressure, and numerous other indices of auto- 
nomic arousal; yet rather than producing 
expected increases of emotional behavior 
and feelings, it usually decreases emotions. 
A more detailed analysis reveals that the 
effects of nicotine on physiology and emotions
are quite complex. First, nicotine typically
increases ANS arousal, but the findings
of one investigation suggest that this increase 
may depend on predrug arousal level. Like- 
wise, nicotine’s effects on classical CNS mea- 
sures of arousal have typically been in the 
direction of increased arousal, while its 
effects on more specific and sophisticated 
CNS indices have been mixed and have de- 
pended on such factors as individual differ- 
ences, rate of nicotine administration, and 
locus of brain activity monitored. One must 
question the relevance of the studies to date 
on the effects of nicotine on CNS activity, 
since they have not used emotionally aroused 
subjects, but have maintained subjects in 
pre- and postnicotine states of low arousal 
and inactivity. There is an obvious need 
for a systematic series of studies that moni- 
tor a variety of CNS, ANS, and muscular 
indices while manipulating and controlling 
personality, time, arousal, emotional, and 
drug parameters. 

In spite of the fact that several indices of 
emotion have consistently been reduced by 
the administration of nicotine, the question 
as to whether all types and intensities of 
emotion are reduced has not been answered. 
It seems likely that nicotine’s emotion-re- 
ducing effects vary from emotion to emotion, 
across various indices of the same emotion, 
and with a variety of other factors. The 
systematic investigation of the effects of nico- 
tine on the whole spectrum of emotional di- 
mensions could add significantly to the un- 
derstanding not only of nicotine’s paradoxi- 
cal effects but also of emotional processes in 
general. 

A number of mechanisms that may ac- 
count for nicotine’s paradoxical tranquilizing 
effects have been proposed, but none are 
backed by a convincing network of supportive
data. It seems evident that aside from
a serendipitous major breakthrough, a great
number of studies testing a large variety 
of proposed theories will be needed before 
the mechanism or mechanisms underlying 
the paradox are determined. Different proposed
mechanisms are not necessarily mutually
exclusive. The impulse reduction
model, for example, is an attempt to inte- 
grate several low-level mechanisms into a 
higher order or more complete picture of 
the emotion-reducing properties of nicotine. 
This model is consistent with muscular-tension-reduction,
pleasure center, and neurophysiological
models, but adds a higher level
of conceptual integration that allows one 
to see that the research on nicotine and 
emotions is consistent with Arnold's (1960) 
impulse theory of emotions, whereas it is 
inconsistent with many other conceptions of 
emotion. 

The literature surveyed in the present re- 
view suggests that the mechanism(s) by 
which nicotine reduces emotion is influenced 
by a wide variety of variables, including be- 
havioral activity level, CNS arousal level, 
personality, type of emotion, time since ad- 
ministration of drug, and rate and dose of 
nicotine administration. Any mechanism that 
is suggested to fully explain nicotine’s tran- 
quilizing properties must account for the in- 
fluence of these variables; but unfortunately, 
of the mechanisms reviewed earlier, only Ey- 
senck’s (1973) proposal attempts to deal 
with their contribution. The most promising 
approach for future research would seem to 
be for studies to systematically manipulate 
and control these influential variables in 
evaluations of a spectrum of suspected lower 
and higher order mechanisms. A major first 
step would be the development of a para- 
digm with specific parameters that reliably 
demonstrates the nicotine-induced reductions 
of emotion. Once this paradigm has been 
established, further studies can vary its
potentially influential variables, thus contributing
to a clearer understanding of the drug’s
paradoxical effects.


Some Logical Pitfalls in Accepting the Null Hypothesis


The history of research on the sleeper effect prior to 1978 can be divided into
five stages: (a) initial discovery of the effect, (b) development of the under- 
lying theory, (c) widespread acceptance of the effect and of the discounting cue 
explanation of it, (d) realization that past operational definitions of the effect 
were not isomorphic with the conceptual definition, and (e) repeated failures to 
demonstrate the effect once operational definitions were employed that corre- 
sponded to the conceptual definition (Gillig & Greenwald, 1974). These failures 
resulted in an invitation to accept the null hypothesis and to “lay the sleeper
effect to rest.” This article illustrates why it is not justifiable to accept the null
hypothesis about the sleeper effect. We suggest that provisional acceptance of 
the null hypothesis depends on assuming that all the necessary theoretical, 
countervailing, statistical, and procedural conditions for an adequate test of the 
effect have been demonstrably met. We further suggest that none of the em- 
pirical studies prior to 1978 demonstrably succeeded in meeting these condi- 
tions. However, adequate tests following the guidelines we have described for 
provisionally accepting the null hypothesis have recently been conducted, and 
the effect has been repeatedly found. A deductive model of the logical factors 
that should guide provisional acceptance of the null hypothesis is contrasted
with a current model that stresses induction and statistical power analyses.


Most statistics texts assert that there is 
no formal basis for accepting the null hypothesis.
The rationale for this assumption
is basically a restatement of the well-known 
position in philosophy that inductive knowl- 
edge is not logically possible. However, prac- 
ticing scientists are often forced to act as 
though the null hypothesis were true even 
when they know that there is no compelling 
epistemological basis for their actions. Given 
the apparent conflict between practice and 
logic, it is important in the analysis of research
findings to be clear about the criteria
that help distinguish between a conclusion
such as “We can be reasonably certain that
no meaningful difference exists” and a conclusion
such as “Although no statistically
reliable difference was found, we cannot be
at all certain whether a meaningful difference
exists.”

The present article uses the history of
research on the sleeper effect to illustrate 
some pitfalls in accepting the null hypothe- 
sis. It also uses this research to describe and 
justify a model that might help in deciding 
when no-difference findings warrant the ten- 
tative conclusion that no difference exists as 
opposed to the conclusion that no decision 
about differences is warranted. The sleeper 
effect is useful for this illustration, since a
recent claim was made that the effect does 
not exist (Gillig & Greenwald, 1974) despite 
25 years of claims that it does. The claim
that the effect does not exist was based on
a particular model for “How to Accept the
Null Hypothesis Gracefully” (Greenwald,
1975, p. 16). However, since subsequent re- 
search guided by a quite different model 
has led to repeated discoveries of the sleeper 
effect (Gruder, Cook, Hennigan, Flay, Ales- 
sis, & Halamaj, 1978), it may be useful to 
detail the later model here and to contrast 
it with the earlier model. 

To accomplish our ends we first review the 
history of research on the sleeper effect, for 
no detailed review yet exists in the literature. 
Then we outline the requirements for an ade- 
quate test of the null hypothesis, using the 
sleeper effect as an illustration. After this,
we examine whether the failures to obtain 
sleeper effects prior to 1978 were due to the 
phenomenon's not existing or were due to 
inadequate tests, and we conclude that no 
conclusion on this issue was warranted prior 
to 1978. Next we report the results of two 
studies that repeatedly obtained sleeper effects
when the necessary conditions for the
phenomenon were demonstrably met and did 
not obtain them when these conditions were 
not met. Finally, we compare and contrast 
two models for tentatively accepting the null 
hypothesis. 


History of the Sleeper Effect 


In 1949, Hovland, Lumsdaine, and Sheffield
reported an experiment that was designed
to evaluate the impact of a World
War II propaganda film on soldiers’ beliefs.¹
The experimental design was a 2 × 2 factorial:
Enlisted soldiers did or did not see
the film, and their message-relevant beliefs
were measured 5 days or 9 weeks afterward.
The authors focused their discussion on a
subset of eight items in which belief changed
little initially and in which the difference
between the experimental and control groups
was larger after 9 weeks than it was after 5
days. This difference of differences was interpreted
as “raising the possibility of a
‘sleeper’ effect” because it seemed that the
full impact of the film took time to build up.

The authors proposed four general hypotheses
to explain the effect. The one that
has been most cited was called the discounting
cue hypothesis. It specified (a) that the
army, which sponsored the film, was seen as
a biased and therefore untrustworthy source 
for war-relevant information; (b) that its 
sponsorship of the film led the soldiers to 
initially discount the filmed message, thereby 
reducing its immediate impact on their be- 
liefs; (c) that as time passed the source of 
the message was forgotten or dissociated 
from the message, thereby removing the 
change-inhibiting force of the untrustworthy 
source; and (d) that once the source was 
no longer linked to the message, soldiers' 
attitudes rose to the residual level of belief 
change caused by the message alone. This 
reasoning suggests a possible explanation of 
why sleeper effects occurred for only a subset 
of the belief items in Hovland et al.'s study: 
The message's initial effectiveness may not 
have totally dissipated over 9 weeks for these 
items, but it may have totally dissipated for 
the other items. 

It should be noted that the term sleeper 
effect can be used in a general sense to de- 
scribe a delayed increase in any dependent 
variable, not just in belief. Even in persua- 
sion research there is no need to restrict the 
term to contexts in which the discounting 
cue hypothesis is invoked as an explanation. 
Indeed, three of the four explanations that 
Hovland et al. (1949) offered of their sleeper 
effect did not even mention a discounting 
cue. Nonetheless, many commentators in so- 
cial psychology seem to use the term sleeper 
effect to refer to both the descriptive phe- 
nomenon (a delayed increase in belief 
change) and a particular explanation (the 
discounting cue hypothesis). We also use 
the term in this traditional way. 


Development of the Theory and Replications 
of the Sleeper Effect 


Three experiments were designed to replicate
the sleeper effect and to directly test
and extend the discounting cue explanation
of it. Hovland and Weiss (1951) manipulated
whether persuasive messages came from
sources of high or low credibility, with low
credibility serving as a discounting cue. They
claimed that a sleeper effect resulted in the
low-credibility condition. Since data indicated
that the low-credibility sources had
probably not been forgotten, they explained
the effect in terms of the message and source
being spontaneously dissociated from one
another with the passage of time.

¹ In the social psychological literature on persuasion,
the terms attitude and belief have often
been used interchangeably, although their definitions
have been distinguished. Since most studies
that we cite measured beliefs, we use this term
exclusively in this article, although we recognize
that in certain cases attitude may be the appropriate
term.

Weiss (1953) followed this research by 
having subjects learn a persuasive message 
that was or was not linked to a brief state- 
ment that discounted the message. Weiss 
claimed to have found a sleeper effect with 
this particular discounting cue manipulation, 
although he carefully pointed out that his 
inference was based on there being relatively 
less decay of change when the discounting 
cue was learned than when it was not. He 
did not find an absolute increase in belief 
change in the discounting cue condition. 

Kelman and Hovland (1953) had subjects 
learn a persuasive message from a source 
of high or low credibility, and the source 
was or was not reinstated at the delayed 
testing 2 weeks later. When no reinstatement 
took place, the data pattern resembled that 
of Hovland and Weiss: Belief change ap- 
peared to increase with time in the low- 
credibility condition and to decrease with 
time in the high-credibility condition. But 
when reinstatement took place (reversing any 
dissociation that might have occurred),
neither the increase in change in low credi- 
bility nor the decrease in change in high 
credibility was obtained. 

Kelman and Hovland reasoned from these 
data that the process of belated dissociation 
of the message and the cue applies both to 
cues that cause initial rejection of a message 
and to cues that enhance a message’s initial 
impact. But while the dissociation of dis- 
counting cues should facilitate sleeper ef- 
fects, the dissociation of message-acceptance 
cues should accelerate the decay of initial 
belief change. Kelman and Hovland coined 
the term dissociative cue hypothesis to refer 
to the common process of first associating 
and then dissociating message-acceptance or
message-rejection cues from a message. Seen
from this perspective, the discounting cue
hypothesis is a special case of the more general
dissociative cue hypothesis.


Widespread Acceptance of the Sleeper Effect


Replications of the sleeper effect were
reported in the literature from 1953 through
the early 1970s. As a result of these repeated
replications, numerous textbooks referred
both to the validity of the effect and to the
validity of the discounting cue explanation
of it.

The studies that led to such widespread
acceptance of the sleeper effect are reviewed
below. We define a discounting cue as any
brief item of information that leads a reader
to summarily reject the conclusion of a persuasive
message and so inhibits the immediate
belief change that the message would
otherwise cause. The predominant manipulation
of a discounting cue has been to have
the message come from a communicator of
low credibility (Falk, 1970; Gillig & Greenwald,
1974; Hovland & Weiss, 1951; Johnson,
Torcivia, & Poprick, 1968; Johnson &
Watkins, 1971; Kelman, 1958; Kelman &
Hovland, 1953; Schulman & Worrall, 1970;
Watts & McGuire, 1964; Weber, 1972;
Whittaker & Meade, 1968; Weber, Note 1).
But other operationalizations included a
brief countercommunication (Weiss, 1953),
qualifying statements that questioned the
validity of the arguments within the message
(Papageorgis, 1963), a countermessage pre


sider all these experiments as relevant to the
sleeper effect, though Wilson and Miller and
Papageorgis did not specifically mention the
discounting cue hypothesis in their reports.
(Adding these experiments to the review
makes little or no difference to the general
conclusions.)

The operational definition of the sleeper
effect employed in most studies before the
mid-1970s was an interaction of time of testing
(immediate vs. delayed) and whether
the persuasive message was or was not associated
with a discounting cue. For reasons
that become clear later, we call this opera- 
tionalization a relative sleeper effect. 

The results of 16 experiments that per- 
mitted a test of the relative sleeper effect 
are presented in columns 2 and 3 of Table 1.
Of these studies, 11 resulted in at least mar- 
ginally significant relative sleeper effects. 
Thus, the preponderance of the evidence does 
indicate that the relative sleeper effect is a 
reliable phenomenon. 

However, it should be noted that the abso- 
lute values of the slopes of the time trends 
were typically higher in the experimental 
conditions without discounting cues than 
they were in the conditions with them. In 
other words, decay in the nondiscounting 
conditions contributed more to the interac- 
tions than did any delayed increase in the 
discounting conditions. This observation 
raises an important question concerning the 
fit between the operational and conceptual 
definitions of the sleeper effect. 


Realization That Past Operational
Definitions Did Not Match the Conceptual
Definition of a Sleeper Effect


Careful reading of early accounts suggests
that the crucial defining attribute of a sleeper
effect is an absolute increase in belief change
over time. For instance, Hovland et al.
(1949) summarized their data by stating,
“Some of the effects of the film may be
‘sleepers’ that do not occur immediately but
require a lapse of time before the full effect
is evident” (p. 188). McGuire (1969) described
the same findings as indicating that
“the impact of . . . the film was greater
after eleven weeks had passed [from the
pretest, not the film] than it had been
shortly after the showing of the film” (pp.
254-255). In the context of manipulations
of low credibility, Insko (1967) wrote, “This
increase in the influence of the low credibility
source over time was called the ‘sleeper
effect’” (p. 44).


Table 1
Outcomes of Past Experiments Classified According to Definition of the Sleeper Effect

                                       Relative sleeper                     Absolute sleeper
                                       Statistically   Condition with       Appropriate   Statistically
Study                                  significant?    higher slope value   direction?    significant?

Hovland & Weiss (1951)                 Yes             Nondiscounting       Yes           DK
Kelman & Hovland (1953)                Yes             Nondiscounting       Yes           No
Kelman (1958)                          DK              Discounting          Yes           DK
Watts & McGuire (1964)                 Marginal        Nondiscounting       No
Johnson, Torcivia, & Poprick (1968)    No              Discounting          No
Whittaker & Meade (1968)               Yes             Nondiscounting       No
Falk (1970)                            Yes             Nondiscounting       Yes           No
Schulman & Worrall (1970)              Yes             Nondiscounting       No
Johnson & Watkins (1971)               Yes             Nondiscounting       No
Weber (Note 1)                         No              Discounting          No
Weber (1972)                           Yes             Nondiscounting       Yes           Yes
Gillig & Greenwald (1974)              Yes             Nondiscounting       Yes           No
Weiss (1953)                           Yes             Nondiscounting       No
Papageorgis (1963)                     No              [illegible]          [illegible]   [illegible]
Wilson & Miller (1968)                 NA              [illegible]          [illegible]   [illegible]
Holt & Watts (1973)                    Yes             Nondiscounting       Yes           Yes

Note. DK = do not know (cannot be computed from the available data); NA = not appropriate (there
was no equivalent to a message-acceptance or message-only condition).


Figure 1. Hypothetical illustrations of absolute and relative sleeper effects.


An operational definition that closely corresponds
to the conceptual definition is that
there is more belief change at a delayed
belief-testing session than at a session immediately
after the learning of a message
(see the upper line in Figure 1A). However,
an implicit assumption of this definition is 
that beliefs would stay constant over time 
if there were no persuasive message. To con- 
trol for the possibility of shifting population 
beliefs, it is desirable to collect data from 
no-message controls, that is, subjects who have not
received the persuasive message (see the 
lower line in Figure 1A). A second opera- 
tional definition of a sleeper effect is that the 
pattern of posttest beliefs should suggest 
more of an increase in belief change (or less 
of a decrease) in a discounting cue group 
than in a no-message control group. With 
only two posttest time intervals, a sleeper 
effect would be indicated by a difference of 
differences or by an interaction of groups 
(discounting cue vs. no message) and the 
time of posttest measurement (immediate vs. 
delayed). With more delay intervals, differ- 
ences between time trends would be used 
for inferring the effect. 
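
The comparison of time trends mentioned for designs with more than two posttests can be sketched as follows. This is only an illustration under stated assumptions: the helper name `slope`, the testing days, and the belief means are all hypothetical, and an ordinary least-squares slope is one of several ways a time trend could be summarized.

```python
def slope(times, means):
    """Ordinary least-squares slope of belief means regressed on time."""
    n = len(times)
    t_bar = sum(times) / n
    m_bar = sum(means) / n
    numerator = sum((t - t_bar) * (m - m_bar) for t, m in zip(times, means))
    denominator = sum((t - t_bar) ** 2 for t in times)
    return numerator / denominator


# Hypothetical posttest days and belief means (arbitrary scale), not real data.
days = [0, 7, 14, 21]
discounting_cue_means = [4.0, 4.5, 5.0, 5.5]   # belief rises over time
no_message_means = [4.0, 4.0, 4.0, 4.0]        # population beliefs stay constant

# A positive difference between the two trends is the descriptive pattern
# required by the second operational definition.
trend_difference = slope(days, discounting_cue_means) - slope(days, no_message_means)
```

The difference is purely descriptive; whether it is reliable would still have to be established with an appropriate significance test.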

In the past, the sleeper effect has often 
been defined as less decay, or relatively more 
change over time, in a discounting cue group 
relative either to a group exposed to the
persuasive message alone (message-only controls)
or to a group in which the message is
linked to a cue that should increase acceptance
of the message conclusion (see Figure
1B). These operational definitions rest on
the assumption that the decay of initial belief
change in the message-only or message-acceptance
groups is an accurate estimate
of the decay that would result in a discounting
cue group if there were no sleeper
effect. This assumption is difficult to accept
because there is usually more initial change
in the comparison groups than in a discounting
cue group, and greater subsequent decay
is usually associated with large initial
changes (Cook & Flay, 1978). Therefore,
defining the effect relative to these comparison
groups overestimates the expected decay
in discounting cue groups, and false positive
sleeper effects can be inferred. It is likely
that too many false positives were inferred in
the past because more initial change was
obtained in the groups without discounting
cues and because the absolute decay of
change found in these groups was more than
the absolute increase in change found in the
discounting cue groups. In other words, a statistical
interaction between time of measurement
and experimental groups (a discounting
cue group and a message-acceptance or message-only
group as the comparison group)
was taken as evidence of the effect. These 
interactions typically resulted from relatively 
less decay in a discounting cue group rather 
than from an absolute increase in change, 
which is what the conceptual definition calls 
for. 

Consideration of the lack of fit between 
past conceptual and operational definitions
of the sleeper effect led Cook (Note 2) to
coin the term relative sleeper effect to refer
to the statistical interaction between experi- 
mental groups (one of which receives a mes- 
sage plus a discounting cue, while the other 
receives the same message without a cue) 
and time of testing (one test coming imme- 
diately after the message and the other 
after a delay period). Since a relative sleeper 
effect does not correspond closely to the con- 
ceptual definition of a sleeper effect, Cook 
also coined the term absolute sleeper effect 
to refer to more appropriate operational defi- 
nitions, namely, a temporal increase in be- 
lief change within a discounting cue group 
or a relatively greater increase in this group 
when compared with a no-message control 
group. 
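
The difference between the two terms can be made concrete with a small numerical sketch. The function names and the belief means below are hypothetical and chosen only for illustration; they are not data from any study reviewed here.

```python
def difference_of_differences(group, comparison):
    """Change over time in one group minus change over time in a comparison group.

    Each argument is an (immediate, delayed) pair of belief means.
    """
    group_immediate, group_delayed = group
    comparison_immediate, comparison_delayed = comparison
    return (group_delayed - group_immediate) - (comparison_delayed - comparison_immediate)


def relative_sleeper_contrast(discounting, message_only):
    """Discounting cue group compared with a message-only group."""
    return difference_of_differences(discounting, message_only)


def absolute_sleeper_contrast(discounting, no_message):
    """Discounting cue group compared with no-message controls."""
    return difference_of_differences(discounting, no_message)


# Hypothetical (immediate, delayed) belief means on an arbitrary scale.
discounting_cue = (5.0, 4.5)   # slight decay, no delayed increase
message_only = (8.0, 5.0)      # large initial change followed by large decay
no_message = (4.0, 4.0)        # stable baseline

relative = relative_sleeper_contrast(discounting_cue, message_only)  # 2.5
absolute = absolute_sleeper_contrast(discounting_cue, no_message)    # -0.5
```

With these made-up means the relative contrast is positive even though belief in the discounting cue group never increases, which is the false positive pattern described above. The contrasts are point estimates only and say nothing about statistical significance.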


There are two special situations in which
it is not appropriate to test the absolute
sleeper effect in the way we have just out- 
lined. First, the effect would not be validly 
tested if a discounting cue were so strong 
that instead of merely inhibiting initial atti- 
tude change, it caused attitudes to change 
significantly in the opposite direction (i.e.,
it caused a “boomerang effect” to below the
no-message baseline). In such cases, one
could not distinguish whether any obtained
sleeper effect reflected (a) reversion over
time toward the no-message baseline, (b)
reversion over time toward a level equal to
the delayed impact of the message when the
message and cue are dissociated, or (c) both
forces operating simultaneously. One must,
therefore, be careful to inspect immediate
belief means lest boomerang effects create
spurious sleeper effects or inflate the estimates
of true ones. Second, it would be difficult
to accept a discounting cue interpretation
of an absolute sleeper effect if subjects
in a discounting cue group were more favorable
to the message conclusion at the delayed
posttest than were subjects in a message-only
group (i.e., if there were a significant
“crossover effect”). This is because
the discounting cue hypothesis predicts that 
after dissociation, belief change in a dis- 
counting cue group will increase over time 
to the level found in a message-only group 
and not beyond this level. Though it would 
warrant theoretical and empirical explora- 
tion, a significant crossover would be incon- 
sistent with the discounting cue explanation 
of an obtained absolute sleeper effect. 


Failure to Demonstrate the Absolute 
Sleeper Effect 


Prior to 1978, there was only one sys- 
tematic attempt to test directly for an ab- 
solute sleeper effect: Greenwald and Gillig 
(Note 3) reported their failure to obtain the 
effect in five experiments and later published 
an article based on seven unsuccessful repli- 
cation attempts (Gillig & Greenwald, 1974). 
In the title of this latter article they rhetori- 
cally asked “Is It Time to Lay the Sleeper 
Effect to Rest?” (Gillig & Greenwald, 1974, 
p. 132). Their work explicitly acknowledged 
an earlier version of the present article 
(Cook, Note 2), in which past experiments 
were reviewed to see if any conclusions about 
absolute sleeper effects were warranted. How- 
ever, no review of this issue has yet appeared 
in print, and so we briefly offer one here. 

There are some difficulties, however, in 
ascertaining whether past studies resulted in 
absolute sleeper effects. First, few of the past 
experiments included statistical tests of the 
appropriate effect. We have therefore tried 
to conduct the missing absolute tests our- 
selves, using estimates of the means and 
standard errors that were either cited in 
published reports or that we were able to 
compute for ourselves from other published 
information. In three cases, the information 
was insufficient for computing any direct test 
of the absolute sleeper effect (Hovland & 
Weiss, 1951; Kelman, 1958; Wilson & 
Miller, 1968). Second, some research reports 
contained relevant data for more than one 
discounting cue condition (e.g., Hovland and
Weiss had four low-credibility groups),
whereas other reports contained data from 
more than one relevant experiment (e.g., 
Gillig & Greenwald, 1974). Our strategy in 
such cases was to examine the data at their 
lowest level to see if absolute sleeper effects 
could be tested at that level. Where there 
were no statistical (or obvious visual) differ- 
ences across conditions or experiments, we 
summed over the various discounting cue 
conditions, and only the summary data are 
presented here. 

Column 4 of Table 1 indicates whether 
trends in the discounting cue group were in 
the direction required for an absolute sleeper 
effect. Column 5 shows whether the effect 
was statistically significant. Of the 12 ex- 
periments with low-credibility manipulations, 
6 had trends in the appropriate direction, 
4 of which could be statistically tested. None 
of the trends reached conventional levels of 
statistical significance. Of the 4 experiments 
with other discounting cue manipulations, 
Weiss's (1953) trend was not in the required 
direction; Papageorgis (1963) obtained the 
required trend over 14 days but not over 41 
(the 14-day trend could not be statistically 
evaluated); Wilson and Miller (1968) seem 
to have obtained the appropriate trends in 
some of their conditions, but statistical eval- 
uation was again impossible; Holt and Watts 
(1973) obtained a statistically significant 
effect. 

All in all, there is still no convincing evi- 
dence of absolute sleeper effects where low 
source credibility was the discounting cue. 
The evidence looks more promising with the 
other manipulations, but two facts have to 
be remembered about them. First, the Wilson 
and Miller experiment was not conceived as 
relevant to the discounting cue hypothesis; 
we have made the connection. Second, Holt 
and Watts found a marginal trend that indi- 
cated that subjects in the discounting cue 
group recalled more of the message both 
immediately after its presentation and 1 
week later. Although the absence of a simple 
relationship between message learning and 
the persistence of attitude change (Cook & 
Flay, 1978) suggests that learning factors 
might not have mediated the absolute sleeper
effect that these authors obtained, it would
be more comforting if the discounting cue
had not been confounded with marginally 
greater initial learning and subsequent reten- 
tion of the message. 

In the face of the preponderance of evi- 
dence from the past experiments reviewed 
here and their own seven failures to find an 
absolute sleeper effect, Gillig and Greenwald 
(1974) invited readers to accept the null 
hypothesis that there is no (absolute) sleeper 
effect. This invitation has not gone unheeded, 
and recent textbooks have begun to reflect 
skepticism about the validity of the sleeper 
effect (e.g., Baron & Byrne, 1977; Oskamp, 
1977; Schneider, 1976). To assess whether 
it is warranted to accept the null hypothesis 
in this instance, we first discuss the logical 
requirements for accepting the null hypothe- 
sis and then apply this logic to the case of 
the sleeper effect. 


Logic of Provisionally Accepting the 
Null Hypothesis 


Statistics texts teach that it is logically 
impossible to accept the null hypothesis. 
However, practical concerns demand that 
one sometimes provisionally act as though 
the null hypothesis were true. We believe
that there are some guidelines that help in
determining when no-difference findings warrant
tentative acceptance of the null hypothesis.
These are as follows: (a) when the
theoretical conditions necessary for the effect
to occur have been explicated, operationalized,
and demonstrably met in the research;
(b) when all the known plausible countervailing
forces have been explicated, operationalized,
and demonstrably ruled out; (c)
when the statistical analysis is powerful 
enough to detect at least the theoretically 
expected maximum effect at a preordained 
alpha level; and (d) when the manipulations 
and measures are demonstrably valid. 


Theoretical Conditions Necessary for the
Effect to Occur Must Be Met 


Scientific propositions about the existence 
of a phenomenon are not adequately tested 
if the theoretical conditions necessary for 
the phenomenon are absent from empirical 
tests. In the case of the sleeper effect, one
therefore has to explicate the underlying the- 
ory—the discounting cue hypothesis—to de- 
termine when it predicts that absolute sleeper 
effects should occur. 

Examination of the discounting cue hy- 
pothesis reveals that sleeper effects are only 
expected under certain conditions. These con- 
ditions can perhaps best be understood by 
considering Figure 2. It shows belief at three 
different posttest time intervals (immediate 
and two different delays) in a discounting 
cue group, a message-only group, and a no- 
message control group. The figure has been 
drawn to illustrate the case in which (a) 
the message causes initial belief change, (b)
this change either decays at a continuous 
rate (the dashed line) or does not decay at 
all (the dotted line), (c) the discounting cue 
suppresses all initial belief change (point 
A); and (d) population beliefs do not shift
over time (see the solid line for the no- 
message control group). 

Consider what would happen if the mes- 
sage and the discounting cue became disso- 
ciated some time before Posttest 2 (delayed). 
Imagine first that there was no decay of 
change in the message-only condition (the 
dotted line). Then, the theory predicts that 
after dissociation, belief in the discounting
cue condition will increase to the level found
in the message-only condition at the time of
measurement. As line AB shows, an absolute
sleeper effect would be expected to result
under these conditions. Imagine next what
would happen if all the conditions were the
same except that there was decay of belief
change in the message-only condition (the
dashed line). Then, no absolute sleeper effect
would be expected to occur (see line AC),
though everything else was the same as for
line AB.

Figure 2. Hypothetical time trends that illustrate that an absolute sleeper effect is a special case
of the discounting cue hypothesis.

Consider next what would happen if be- 
liefs in the message-only condition were de- 
caying (the dashed line) and dissociation 
had taken place by Posttest 1 (delayed). An 
absolute sleeper effect could still be obtained 
(see line AD). However, if under the same
conditions dissociation had not occurred by 
then, no absolute sleeper effect could be 
found (see line AE). 

This graphic presentation can be summa- 
rized as follows: The discounting cue hy- 
pothesis predicts that an absolute sleeper 
effect can be expected if and only if at the 
time the message and discounting cue are 
dissociated, the mean belief in a message- 
only group is higher than the mean belief that 
is found in a discounting cue group immedi- 
ately after exposure to a persuasive message. 
There are, then, two theoretical conditions 
necessary for a sleeper effect: (a) The mes- 
sage and discounting cue must become dis- 
sociated before delayed measurement, and 
(b) postdissociation beliefs in a message-only 
group must show more change than is obtained
in the discounting cue group immediately
after the message. This last difference
defines the maximum size of the sleeper effect
predicted from the discounting cue hypothesis;
the greater the initial change in a message-only
group and the more slowly this change decays
with time, the larger the difference.
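
The two conditions just stated can be illustrated with a small calculation. The Python sketch below is only an illustration: the function name, the linear decay model, and every number are assumptions chosen to mimic the hypothetical trends of Figure 2, not values from any study.

```python
def max_sleeper_effect(mo_immediate, decay_per_week, weeks_to_dissociation, dc_immediate):
    """Largest absolute sleeper effect predicted by the discounting cue
    hypothesis, or 0.0 if none is predicted.

    Assumption (hypothetical): belief change in the message-only group
    decays linearly and never falls below zero (the control level).
    """
    # Belief change still present in the message-only group at dissociation.
    mo_at_dissociation = max(mo_immediate - decay_per_week * weeks_to_dissociation, 0.0)
    # An effect is predicted only if this exceeds the immediate belief
    # change in the discounting cue group.
    return max(mo_at_dissociation - dc_immediate, 0.0)


# No decay (dotted line): the full initial change is available (line AB).
no_decay = max_sleeper_effect(4.0, 0.0, 6, 0.0)
# Continuous decay with late dissociation (dashed line): nothing is left (line AC).
late = max_sleeper_effect(4.0, 1.0, 6, 0.0)
# Continuous decay with early dissociation: a smaller effect remains (line AD).
early = max_sleeper_effect(4.0, 1.0, 2, 0.0)
print(no_decay, late, early)
```

Under these assumed values the predicted maximum shrinks from 4.0 to 2.0 to 0.0 as decay and delay increase, which is the pattern the figure is drawn to convey.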


Ruling Out Plausible Countervailing Forces 


An absolute sleeper effect is a reliable in- 
crease in belief change, and any force that 
causes belief change to decay over time will 
countervail against such an effect. It is com- 
monplace in persuasion research to note 
that experimentally induced belief changes 
decay with time. In their review of 30 years
of relevant research, Cook and Flay (1978)
concluded that despite particular instances 
of total persistence or sleeper effects at- 
tributable to forces other than the discount- 
ing cue hypothesis, decay of initial change 
was the modal finding. Therefore, it is im- 
portant in designing adequate tests of the 
sleeper effect to ensure that decay forces of 
all kinds are minimized. The countervailing 
power of belief decay is minimal in experi- 
ments in which the discounting cue sup- 
presses all of the belief change a message 
would otherwise have caused. However, such 
suppression cannot be due to chance, for if 
it were, statistical regression would result 
and would masquerade as a sleeper effect. 


Statistical Power Necessary for 
Accepting the Null Hypothesis 


An appropriate test of the sleeper effect 
must ensure that the immediate belief change 
in a discounting cue group is less than the
belief change found in a message-only group
at the time of dissociation. This difference
defines the maximum size of effect that can 
be obtained. Whether a belief change of this 
magnitude will be statistically significant, 
though, depends both on the alpha level 
chosen and on the standard error of the 
estimate of the particular mean belief differ- 
ence that will be used for inferring whether 
there is an absolute sleeper effect in the 
discounting cue group. Once alpha is as- 
sumed to equal .05, the crucial determinant 
of whether the maximum possible sleeper 
effect can be statistically corroborated is the
size of the standard error. 

To make this clearer, consider the sleeper 
effect ratio (SER), given below, which can 
be used to estimate the power of a statistical 
test in detecting an absolute sleeper effect of 
the maximum predicted size. The numerator
of the SER formula is based on the necessary
theoretical conditions we explicated. The
denominator is an estimate of the standard 
error that will be used for actually testing 
a sleeper effect. Taken together, the numera- 
tor and denominator indicate whether the 
theoretically specified maximum distance 
over which belief can increase over time in
a discounting cue group is large enough so
that if a sleeper effect were to be obtained 
it could be statistically corroborated. The 
minimal requirement for an adequate test of 
the sleeper effect is that SER exceed the 
t value associated with the alpha level and
the degrees of freedom that are to be used in 
direct tests of the absolute sleeper effect. 
The formula is as follows:

\[
\mathrm{SER} = \frac{M_{\mathrm{PD \cdot MO}} - M_{\mathrm{IMM \cdot DC}}}{\text{particular standard error used for directly testing the absolute sleeper effect}}
\]

where $M_{\mathrm{PD \cdot MO}}$ is the belief mean at the postdissociation testing (PD) in a message-only
group (MO), and $M_{\mathrm{IMM \cdot DC}}$ is the belief mean at the immediate postmessage testing
(IMM) in a discounting cue group (DC).


The formula for SER can be used in any 
study to inquire whether the theoretically 
specified necessary conditions for an adequate 
test of the sleeper effect are met and whether 
a statistical test powerful enough to detect 
the effect, if it exists, can be conducted. 
However, in practice SER can somewhat 
overestimate or underestimate the power of 
the subsequent test of the absolute sleeper 
effect, because sampling error makes the 
sample means in the SER numerator deviate 
from their population values. 
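
As a minimal sketch of how the SER check could be carried out, the following Python fragment computes the ratio and compares it with a critical t value. The function names and all numbers are hypothetical. The critical value of 1.67 is a placeholder that a reader would replace with the tabled t for the chosen alpha level and degrees of freedom.

```python
def sleeper_effect_ratio(mean_pd_mo, mean_imm_dc, standard_error):
    """SER: (postdissociation message-only mean minus immediate
    discounting cue mean) divided by the standard error of the test."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (mean_pd_mo - mean_imm_dc) / standard_error


def is_adequate_test(ser, critical_t):
    """Minimal requirement from the text: SER must exceed the critical t."""
    return ser > critical_t


# Hypothetical belief means and standard error.
ser = sleeper_effect_ratio(mean_pd_mo=6.0, mean_imm_dc=4.5, standard_error=0.5)
adequate = is_adequate_test(ser, critical_t=1.67)  # placeholder critical value
print(ser, adequate)
```

Because the means are sample estimates, a ratio only slightly above the critical value should be read cautiously, for the reason given in the preceding paragraph.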




Valid Manipulation and Measurement 


Of all the necessary conditions for an
adequate test of a phenomenon, perhaps the
most obvious is that it must be shown that 
the manipulation of the independent variable 
was effective and that the dependent variable 
was validly measured. In the context of the 
discounting cue hypothesis, this amounts to 
finding differences in immediate belief change
across groups that were or were not exposed
to a discounting cue. Such a finding would
illustrate (a) that the discounting cue is 
effective in reducing belief change and (b) 
that a measure of belief with face validity is 
at least reliable enough to discriminate between
group means.


Did Past Tests Meet the Logical
Requirements for an Adequate Test 
of the Sleeper Effect? 


Was the Countervailing Decay 
Force Ruled Out? 


An adequate test of the sleeper effect re- 
quires that there be no immediate belief 
change in the discounting cue group. The 
third column of Table 2 shows that immedi- 
ate belief change occurred in all but three 
of the experiments in which the relevant 
differences could be tested. Thus, in most 
past experiments a decay force may have 
been set up in the discounting cue group that 
may have countervailed against obtaining 
an absolute sleeper effect and may have led 
to less than adequate tests of it. 

Does the evidence confirm our contention 
that absolute sleeper effects are more likely 
when there is no immediate belief change in 
a discounting cue group? To answer this, we 
need to contrast the results in which there 
was and was not immediate change. Hov- 
land and Weiss (1951) had two messages 
that caused immediate attitude change and 
two that did not. Over 4 weeks, the per- 
centage of subjects with promessage attitudes 
increased by 30% and 27% for the messages 
without initial change and by 13% and 3% 
for the messages with initial change. Falk 
(1970) obtained no immediate attitude 
change, and in comparison with other studies, 
he found one of the strongest indications of 
an absolute sleeper effect, albeit nonsignificant
(t = 1.50). Finally, Wilson and Miller
(1968) found no initial belief change in the 
eight conditions in which prosecution argu- 
ments preceded defense ones, but found ini- 
tial change in the eight other conditions in 
which the order of arguments was reversed.
There were some trends in the sleeper effect 
direction in conditions in which there was 
no immediate belief change, and there were 
no indications of sleeper effects in condi- 
tions in which there was immediate change. 

Ex post facto evidence of this kind is by 
no means definitive, but it is consistent with 
our suggestion that absolute sleeper effects 
are more likely when no immediate belief 
change occurs in a discounting cue condition
than when it does.


Were the Necessary Theoretical Conditions 
Met in Tests With Sufficient 
Statistical Power? 


In its numerator, the SER formula incor- 
porates the necessary theoretical condition 
that the impact of the message at the time
of dissociation be greater than the immediate
impact of the message when it is linked to a 
discounting cue. Via its denominator, the 
SER formula probes whether in a particular 
experiment, sufficient statistical power exists 
to obtain a sleeper effect of the maximal pre- 
dicted value. Theoretically, SER can be ap- 
plied to past studies to determine the extent 
to which each constituted a powerful test of 
the absolute sleeper effect. In practice, SER
is not as useful as it could be, since it pre- 
supposes the measurement of dissociation, 
and measures of dissociation are rare in past 
studies. Moreover, where available, they are 
invariably associated with single measures 
that have particular biases. For instance,
Watts and McGuire (1964) measured subjects'
recognition of the source's name, which
presumably overestimates both recall and
association. As another example, Schulman
and Worrall (1970) had subjects free asso- 
ciate to the message topic, a procedure that 
may underestimate association, since some 
subjects may think it inappropriate to men- 
tion a source when asked to free associate 
to a topic. Although it is much better to 
have such measures than to be without them, 
they are not by themselves particularly good 
indicators of message-cue dissociation. 
The lack of data on dissociation in past 
studies forced us to estimate a modified form 
of SER. Since we did not know when dis- 
sociation reached acceptable levels, we com- 
puted a separate SER value for each time 
interval at which belief measurement took
place. Hence, for each study, there were as 
many SER values as delay intervals. To aid 
interpretation we assumed that dissociation 
is positively related to the length of the de- 
lay interval and is approximately indexed 
by the percentage of subjects who do not
report recognizing or spontaneously recalling
a discounting cue. 
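
The modified procedure, one SER value per delay interval, might be sketched as follows in Python. The function name, the delay intervals, the means, the standard errors, and the critical value of 1.67 are all hypothetical placeholders rather than data from the studies reviewed.

```python
def modified_ser_by_delay(mo_means_by_delay, mean_imm_dc, se_by_delay):
    """Compute one SER value for each delay interval at which belief
    was measured, since the time of dissociation is unknown."""
    return {
        delay: (mo_mean - mean_imm_dc) / se_by_delay[delay]
        for delay, mo_mean in mo_means_by_delay.items()
    }


# Hypothetical message-only belief means, keyed by delay in weeks.
mo_means = {1: 6.0, 2: 5.5, 6: 4.75}
# Hypothetical standard errors for the test at each delay.
standard_errors = {1: 0.5, 2: 0.5, 6: 0.5}

values = modified_ser_by_delay(mo_means, 4.5, standard_errors)
# Delays at which the modified SER exceeds the placeholder critical t.
adequate_delays = [delay for delay, ser in values.items() if ser > 1.67]
print(values, adequate_delays)
```

With these invented numbers the ratio is large enough only at the shorter delays, the same qualitative pattern the text goes on to report for several past studies.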

For only seven studies did the modified 
SER values we computed exceed the t value
associated with the statistical test of the 
absolute sleeper effect, and this occurred in 
only some experimental groups: (a) in the 
low-credibility group of Watts and McGuire 
(1964) after 1 and 2 weeks but not after 6; 
(b) in Johnson et al. (1968); (c) in the 
low-credibility-message-five-times condition 
of Johnson and Watkins (1971), but not in 
the condition in which the message was pre- 
sented once; (d) in the experiments of Gillig 
and Greenwald (1974); (e) in the 3-week- 
delay condition of Weber (1972) in which 
the source was mentioned twice, but not in 
the 7-week-delay condition; (f) in the 2- 
and 14-day-delay conditions, but not the 41-day-delay
condition, of Papageorgis (1963);
and (g) for the minor conclusion measure 
of Holt and Watts (1973). 

Two points are worth noting about these 
experiments. First, immediate belief change 
occurred in all the discounting cue groups, 
with the only possible exception being Holt 
and Watts, in whose study the relevant test 
of immediate change could not be conducted. 
Consequently, there are no experiments in 
which there was both an absence of initial 
change and a modified SER value greater 
than the t value associated with the statistical
test of an absolute sleeper effect. Second,
the data suggest that little dissociation may 
have occurred in studies with higher modi- 
fied SER values. Consider the last three col- 
umns of Table 2. Of Watts and McGuire's 
(1964) subjects, 36% still recognized the 
low-credibility source after 1 and 2 weeks. 
Johnson et al. (1968) and Holt and Watts 
(1973) had a final delay period of only 1 
week, and Gillig and Greenwald (1974) had 
a final delay of only 2 weeks. The typical 
decay of belief change did not occur in the 
high-credibility-message-five-times conditions 
of Johnson and Watkins (1971), which sug- 
gests that the message and source may not 
have been dissociated (although other ex- 
planations are possible). And the modified 
SER values in Weber (1972) and Papageor- 
gis (1963) were only large enough with the 
shorter delay intervals (i.e., at 3 weeks but
not 7 for Weber and at 2 weeks but not 6
for Papageorgis). 

The evidence we have just reviewed about 
dissociation and SER values is at best sug- 
gestive, but it does imply that high levels 
of dissociation may not have been obtained 
in most past experiments in which the modi- 
fied SER value was large enough for a pow- 
erful statistical test. More important, it should 
be noted that all of the experiments with high 
SER values were characterized by immediate 
belief change in the discounting cue condition. 
Since a strong test of the sleeper effect re- 
quires both the absence of initial change in 
the discounting cue group and a large enough 
SER value, it is not clear whether the ab- 
sence of past convincing demonstrations of 
the sleeper effect was due to the unreliability 
of the phenomenon or to weak tests of it. 


Do Adequate Tests Detect the 
Sleeper Effect? 


Reasons to Suspect That They Do 


The results prior to 1978 of tests of the 
absolute sleeper effect predicted from the discounting
cue hypothesis are ambiguous because
one could conclude from them either that 
there is no absolute sleeper effect or that 
the tests of it were inadequate. There are 
two major reasons for suspecting that the 
latter is true and that the effect might be 
found if appropriately tested. First, our re- 
view of past studies shows that past failures 
to obtain a sleeper effect occurred in studies 
in which it is unlikely (a) that all of the 
theoretically relevant conditions for a strong 
test were met and (b) that the countervail- 
ing force was ruled out. Moreover, ex post 
facto analyses suggested that the less the 
initial belief change in the discounting cue 
group, the stronger the trends toward a 
sleeper effect. 

Second, the dissociative cue hypothesis, 
the basic theory from which the sleeper ef- 
fect is derived, was twice directly tested, and 
each time was corroborated. The dissocia- 
tive cue hypothesis proposes that both mes- 
sage-acceptance and message-rejection cues 
can be dissociated from a message over time 
and that this dissociation eliminates the impact
that the cues initially have on belief.
Thus, the dissociative cue hypothesis pre- 
dicts that when a message-acceptance cue 
is dissociated the decay of change will be 
accelerated but that when a message-dis- 
counting cue is dissociated the initial inhibi- 
tion of belief change will be eliminated and 
an absolute sleeper effect will sometimes re- 
sult. In this sense, the dissociative cue hy- 
pothesis is a more general statement of the 
discounting cue hypothesis we have examined 
in this article. 

Direct tests of the dissociative cue hypothesis
have varied the degree of message-source
association. Kelman and Hovland
(1953) did this when the original message 
source (of high or low credibility) was or 
was not reinstated at the delayed measure- 
ment session. As predicted from the dissocia- 
tive cue hypothesis, (a) in high-credibility 
conditions, attitude decayed less when the 
source was reinstated than when it was not, 
and (b) in low-credibility conditions, belief 
showed a trend toward a sleeper effect when 
the source was not reinstated and no such 
trend when it was. In a related experiment, 
Weber (1972) manipulated the strength of 
the message-cue association during exposure 
to the persuasive message rather than by 
reinstating the source at the delayed measurement.
Weber had subjects learn about a
high- or low-credibility communicator whose 
name was mentioned twice (low association) 
or 22 times (high association) in the course 
of reading and then rereading a long per- 
suasive message. Belief was assessed immedi- 
ately after the message for all subjects, and 
3 weeks later for some of them and 7 weeks
later for the remainder. The delayed assess- 
ments were conducted in subjects’ living 
quarters by experimenters who did not know 
the hypothesis, and it is likely that the 
earlier experiment was not reinstated.
Weber's time trends (see Figure 3) are
very similar to those of Kelman and Hovland,
and the overall three-way interaction
of credibility, strength of association, and
time of measurement was statistically significant
at the 5% level (one-tailed test). It
seems, then, that the basic dissociative cue 
hypothesis is correct. This hypothesis but- 
tresses the prediction of a sleeper effect, 
which suggests, but does not prove, that
such an effect would be obtained if a strong 
test were conducted. 


Results of Adequate Tests of the 
Sleeper Effect 


When demonstrably adequate tests of the
absolute sleeper effect were conducted, they
repeatedly resulted in absolute sleeper effects
(Gruder et al., 1978). In one experiment
subjects read a persuasive message about
either the disadvantages of a 4-day work
week or the disadvantages of permitting 
right turns when traffic lights are red. Each 
message was or was not accompanied by a 
brief statement that suggested that the in- 
formation in the message was false. Subjects’ 
beliefs were assessed both immediately after 
the message and 5 weeks later at an un- 
related experimental session. The data 
showed (a) that the discounting cue manipulation
suppressed all belief change for the
4-day work week topic but not for the right 
turn on red topic and (b) that SER was 
large enough for the 4-day work week topic 
but not for the right turn on red topic. Thus, 
the requirements for an adequate test of 
the absolute sleeper effect were clearly met
for one message and not for the other. Analysis
of the belief data showed a statistically
significant sleeper effect for the message in 
which the requirements were met but no 
such effect for the message in which they 
were not met. 

A second experiment was designed to 
replicate and extend the first by linking the
4-day work week message to five different
discounting cues that varied in strength and
kind. Half the subjects’ beliefs were mea- 
sured immediately after the message, and 
all the subjects’ beliefs were measured by 
phone 6 weeks later by persons who did not 
know the subjects’ experimental conditions. 
Analyses of the data relevant to the requirements
for an adequate test showed that (a)
there was no immediate attitude change in 
three of the five discounting cue groups; 
(b) SER was large enough for these three 
groups; and (c) direct measures of dissociation
revealed that over 75% of the subjects
in each cue group dissociated or forgot
the cue. Thus, the requirements for an
adequate test were clearly met for three cue
groups, but were not met for the other two.
Absolute sleeper effects were reliably obtained
in the three cue groups in which the
requirements were met and not in the two
cue groups in which the requirements were
not met.

Figure 3. Persistence of attitude change as a function
of source credibility and source-content associative
strength in Weber (1972), whose relevant
data are reproduced with his permission. (Sample
sizes are in parentheses, and the 3- and 7-week-delay
groups are collapsed because there were no
theoretically meaningful effects of the length of
the delay period.)

These studies demonstrate that the abso- 
lute sleeper effect that is predicted from the 
discounting cue hypothesis is both reliable 
and valid. It is reliable in the sense that it 
has been replicated across experiments, dis- 
counting cues within experiments, repeated 
versus not repeated belief measurements, and 
two different techniques of delayed measure- 
ment. Though replication with different mes- 
sages and in different laboratories is still 
needed, it does seem that the absolute sleeper 
effect is valid by criteria of both convergent 
and divergent validity insofar as the effect 
occurred when the logical requirements for 
an adequate test were demonstrably met and 
did not occur when these requirements were
not met. 


Two Models for Provisionally Accepting 
the Null Hypothesis 


This review indicates that the absolute 
sleeper effect can be obtained once a spe- 
cific set of explicated conditions has been 
incorporated into experimental tests of the 
effect. However, the major purpose of this
article is not to illustrate that the sleeper
effect can be found but to illustrate the
logic of provisionally accepting the null hypothesis.

Gillig and Greenwald (1974) used their 
seven unsuccessful attempts to find a sleeper 
effect to invite readers to believe that it was 
time to lay the sleeper effect to rest. Several 
textbook writers have already accepted their 
invitation, despite the fact that it may well 
be based on an implicit inductive fallacy: 
The more numerous the failures to obtain 
an effect, the less likely it is to exist. In a 
later article, Greenwald (1975) used the so- 
called nonexistence of sleeper effects as a 
major illustration of the factors that indi- 
cate “How to Accept the Null Hypothesis 
Gracefully" (p. 16). These factors are as 
follows: 

1. "Use a range, rather than a point, null 
hypothesis" (p. 16). 

2. "Select N on the basis of a desirable 
error of estimate of the test statistic" (p. 
16). 

3. “Have convincing evidence that manip- 
ulations and measures are valid" (p. 17). 

4. “Compute the posterior probability of 
the null (range) hypothesis" (p. 17). (Doing 
this led Greenwald to conclude from his 
own studies of the sleeper effect that the 
posterior odds ratio in favor of the null hy- 
pothesis is 1:249 as opposed to 1:19 for 
α = .05.)

5. "Report all results of research for 
which conditions appropriate to testing a 
given hypothesis have been established” (p. 
18). (Greenwald elaborated on this by stress- 
ing the Type I errors that result from the 
nonpublication of nonsignificant findings.) 

With the exception of Point 3, Greenwald's
list is statistical, and the casual reader
might infer from this and from Gillig and
Greenwald that acceptance of the null hypothesis
is largely a matter of the statistical
power of a study or a set of related studies
and that the more past failures to obtain the
effect in question, the higher should be the
confidence in accepting the null hypothesis.
Although this perspective on the null hypothesis
is useful, it must be incomplete,
since it led to incorrect acceptance of the
null hypothesis about the sleeper effect. What
is incomplete about this perspective?

First, it does not stress the importance of 
deducing from the parent theory (the dis- 
counting cue hypothesis in the sleeper effect 
case) the theoretical conditions that are 
necessary for a particular effect. Explication 
of the discounting cue hypothesis makes it 
clear that the effect should only be predicted 
when the belief found in a discounting cue 
group immediately after a message is at a 
lower level than the belief found in a mes- 
sage-only group just after dissociation. Given 
that the parent theory specifies the condi- 
tions under which a sleeper effect should be 
obtained, it is incumbent on any researcher 
to ensure that these conditions are met in 
experimental tests. If the conditions are not 
met, it does not matter how many tests are
conducted, because they will all be inade- 
quate. 

Second, it is important to use background 
theory and common sense to specify extrane- 
ous forces that might plausibly countervail 
against the phenomenon under investigation. 
In the sleeper effect case, one very plausible 
countervailing force is the decay of belief
change. Such decay is regularly found after 
belief change has been obtained as a conse- 
quence of an effective persuasive message. 
Of course, belief does not invariably decay, 
and it is logically impossible to specify all 
the potential suppressor variables. But despite
such complications, researchers who
want to provisionally accept the null hypothesis
need to show that countervailing
forces have not operated in studies to mask 
the effect under investigation.

Third, statistical considerations are crucial 
if one wants to act as though the null hy- 
pothesis were true. This is clearly recognized 
in Greenwald's (1975) perspective, which
urges researchers to design experiments so 
that the standard error of estimate of the 
test statistic is desirable. However, how 
should one define what is desirable? Fortu- 
nately, there are some cases in which theory 
allows one to specify a numerical value for 
the desirable standard error. In the sleeper 
effect case, the maximum possible size of the 
effect can be specified by subtracting the
immediate posttest belief mean in a
discounting cue group from the
postdissociation belief mean in a message-only
group. Then one can easily solve
for the size of the standard error that is 
necessary for an adequate test of the sleeper 
effect. (The SER formula incorporates all of 
the factors necessary for the computation in 
question.) There are, of course, many situa- 
tions in which the underlying theory is not 
so precise and in which the practicing researcher
does not have independent knowledge
of the maximum magnitude of the
effect. In such situations, one might well 
follow Greenwald’s advice and keep the 
standard error low by using range estimates 
of the effect instead of points. But where 
the theory specifies the magnitude of an expected
effect, the size of the necessary standard
error can be computed.
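
Solving for the necessary standard error amounts to rearranging the SER requirement. The Python sketch below assumes, as the text does, that the maximum effect is the difference between the two belief means. The function name and the numbers are hypothetical, and the critical t of 2.0 is a placeholder for the tabled value at the chosen alpha level and degrees of freedom.

```python
def max_allowable_standard_error(mean_pd_mo, mean_imm_dc, critical_t):
    """Largest standard error at which the maximum predicted sleeper
    effect would still reach the critical t (SER equals critical_t);
    an adequate test needs a standard error below this limit."""
    max_effect = mean_pd_mo - mean_imm_dc
    if max_effect <= 0:
        # The necessary theoretical condition is not met, so no
        # standard error can make the test adequate.
        raise ValueError("message-only mean must exceed discounting cue mean")
    if critical_t <= 0:
        raise ValueError("critical_t must be positive")
    return max_effect / critical_t


# Hypothetical means; the standard error must fall below this limit.
se_limit = max_allowable_standard_error(6.0, 4.5, critical_t=2.0)
print(se_limit)
```

A researcher could then choose a sample size expected to bring the standard error under this limit before collecting data.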

On another issue our perspective entirely 
overlaps with Greenwald's, for accepting the 
null hypothesis depends on demonstrating 
that the manipulations and measures are 
valid. 

The essence of the difference between the 
two perspectives is that Greenwald's stresses 
statistical concerns pertinent to evaluating 
no-difference results from a single study or 
a set of highly similar studies, whereas our 
perspective stresses the logic of using theory 
prior to data collection to deduce the condi- 
tions that are necessary for an effect and to 
deduce any conditions that might countervail 
against the effect. Unless these conditions are 
explicated and demonstrably incorporated
into the research, statistical criteria matter
little, and the number of failures to obtain
a given effect is uninformative. Our stress,
then, is (a) on the logic of explicating the
necessary theoretical, countervailing, statistical,
and procedural conditions for an effect;
(b) on deducing which research operations
will represent these conditions; and (c) on 
demonstrating via appropriate measurement 
that all of these conditions have been in- 
corporated into the experimental tests. If 
they have been so incorporated but the effect 
still does not appear, then one is closer to 
being able to act with justification as though 
the null hypothesis were true. 

Other factors, although not necessary for 
adequate tests of the null hypothesis, are 
desirable. It would be foolish to accept a 
null hypothesis if some secondary implica- 
tion of this acceptance were patently false. 
For instance, the sleeper effect is a single 
prediction from the discounting cue hypothe- 
sis that is itself part of the dissociative cue 
hypothesis. Since independent experimental 
tests have strongly suggested that the dis- 
sociative and discounting cue hypotheses may 
well be correct (Kelman & Hovland, 1953; 
Weber, 1972), this alone should give one 
reason to pause before using no-difference 
findings to claim that the sleeper effect 
should be laid to rest. 

When accepting the null hypothesis, there 
is sometimes value in replication. Cronbach 
and Snow (1977) have correctly pointed out 
that if many unbiased studies show nonsig- 
nificant effects in the same direction, then 
the null hypothesis can be rejected. (The 
stress here has to be on unbiased, for mul- 
tiple biased tests provide no information if 
the bias operates in the same direction across 
all the studies.) By the same token, accept- 
ance of the null hypothesis is facilitated if 
in many studies in which the necessary logi- 
cal conditions for an effect have been met, 
the effect does not appear and there is no 
pattern to the nonsignificant results. 
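
One simple way to see why many nonsignificant results in the same direction can count against the null hypothesis is a sign test. The sign test is an illustrative choice made here, not a procedure the text attributes to Cronbach and Snow. The Python sketch below assumes independent, unbiased studies, a direction specified in advance, and a hypothetical count of 9 out of 10 studies.

```python
from math import comb


def same_direction_probability(n_studies, n_same_direction):
    """One-tailed sign test: probability that at least n_same_direction
    of n_studies fall in a prespecified direction when each direction
    is equally likely under the null hypothesis."""
    if not 0 <= n_same_direction <= n_studies:
        raise ValueError("n_same_direction must lie between 0 and n_studies")
    favorable = sum(comb(n_studies, k) for k in range(n_same_direction, n_studies + 1))
    return favorable / 2 ** n_studies


# Hypothetical case: 9 of 10 studies show a nonsignificant trend in the
# predicted direction.
p = same_direction_probability(10, 9)
print(p)
```

A small probability here reflects the consistency of direction across studies and says nothing about bias, which is why the stress on unbiased studies matters.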

Other advantages of replication emerge in 
considering the question, Are there any cir- 
cumstances under which one could incorpo- 
rate all the necessary theoretical, counter- 
vailing, statistical, and procedural require- 
ments into an experiment but still fail to find 
a true effect? Obviously, chance dictates that 
a Type II error can occur, and replication 
is the way to examine this threat. A less ob- 
vious circumstance in which one might falsely 
conclude in favor of the null hypothesis is 
one in which all the known necessary condi- 
tions for an effect are met but unknown 
necessary conditions are not. Imagine that 
absolute sleeper effects can occur under the 
conditions we have specified, but only if the 
discounting cue follows the persuasive mes- 
sage. In this case, experimental tests that 
met all the derived necessary conditions but 
presented the discounting cue before the 
message would lead to a false acceptance of 
the null hypothesis. The practical difficulty 
with such unknown conditions is that they 
cannot be deliberately incorporated into the 
research. Consequently, it is advantageous 
to postpone acting as though the null hy- 
pothesis were true until a number of studies 
have been conducted that meet all the neces- 
sary conditions derived from the theory and 
also vary a number of factors that might 
plausibly be additional necessary conditions. 
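
The chance component of this threat can be given a rough number. The sketch below is a hypothetical illustration: it assumes a true effect, independent replications, and the same statistical power in each study, and it computes the probability that every study commits a Type II error. It says nothing about unknown necessary conditions, which replication with varied factors must address separately.

```python
def prob_all_studies_miss(power: float, n_studies: int) -> float:
    """Probability that every one of n independent studies misses a true effect.

    Assumes each study has the same power (probability of detecting the
    effect), so each commits a Type II error with probability 1 - power.
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie between 0 and 1")
    if n_studies < 1:
        raise ValueError("n_studies must be at least 1")
    return (1.0 - power) ** n_studies


# With power of .50 per study, compare one study against three replications.
print(prob_all_studies_miss(0.5, 1))
print(prob_all_studies_miss(0.5, 3))
```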

Since one can never know in practice 
whether one has met the true necessary (and 
sufficient) conditions for any effect, infer- 
ences about the null hypothesis must in- 
evitably and logically be imperfect and sub- 
ject to later correction. As Cook and Camp- 
bell (1979) have pointed out in their extended 
discussion of the null hypothesis, inferences 
in favor of the null hypothesis should be 
thought of as decisions to act as though this 
hypothesis were true and not as knowledge 
that the hypothesis is true. 

Human Crowding and Personal Control: 
An Integration of the Research 


Donald E. Schmidt and John P. Keating 
University of Washington 


Empirical and theoretical discussions have suggested that crowding is experienced 
when situational density forces the blocking of goals, the interruption of be- 
haviors, or cognitive overload to occur. However, no psychological principles 
have been employed to unify these explanations. The present article attempts to 
link the literature on human crowding with the experimental research on per- 
sonal control. Averill's distinctions among behavioral, cognitive, and decisional 
control are discussed in the context of human crowding. A conceptual model is 
offered that suggests that crowding is an attributional label applied to a setting 
when situational density results in a loss or lack of personal control. 


The development of industrialization in 
the United States has been associated with 
a shift from a dispersed, agriculturally based 
society to a relatively centralized urban cul- 
ture and a period of rapid population growth 
(Davis, 1965; Meadows, Meadows, Randers, 
& Behrens, 1972). These general population 
growth patterns have created dense urban 
environments in which the majority of the 
population resides (Hawley, 1971). Perhaps 
this is why human crowding has become a 
central topic of research in the emerging sub- 
discipline of environmental psychology. 

Theoretical discussions of crowding have 
focused on two overriding questions: What 
physical, social, and personal factors deter- 
mine an individual's experience of crowding, 
and what are the psychological and physio- 
logical consequences of being crowded on a 
long-term basis? Although the literature has 
grown to voluminous proportions, it does not 
yet offer definitive answers to these questions 
(Lawrence, 1974). 


Initial animal studies established relation- 
ships between population density and psy- 
chosocial and physiological anomalies. Cal- 
houn (1962) created rat colony densities 
that were well above levels previously asso- 
ciated with stress. Under these conditions, 
he found a variety of abnormal develop- 
ments, including the disturbance of normal 
maternal behaviors, cannibalism, homosex- 
uality, and the disruption of a number of 
physiological processes. Christian, Flyger, 
and Davis (1960) observed the density- 
related mass mortality of a herd of sika 
deer on a small island off the Maryland 
coast. Autopsies of the deer carcasses re- 
vealed no consistent evidence of disease or 
malnutrition, but the presence of enlarged 
adrenal glands was an apparent sign of stress 
(cf. Selye, 1956). Results similar to those 
observed by Calhoun and Christian et al. 
have also been found in house mice (South- 
wick, 1955), lemmings (Clough, 1965), and 
snowshoe hare (Deevey, 1960). Although it 
is unclear what specific factors contributed 
to these effects, a plausible explanation has 
been offered by Christian (1963). He postu- 
lated that high levels of population density 
may have created intraspecies competition 
for available space and resources. The psy- 
chological pressures (stress) that resulted 
may have culminated in the observed mal- 
adies. 

Subsequent demographic-correlational stud- 
ies have attempted to relate these findings 
to human populations. Associations between 
population density and indices of crime, dis- 
ease, and the like commonly have been 
found (e.g., Galle, Gove, & McPherson, 
1972; Hawley, 1972; Schmitt, 1966). How- 
ever, as provocative as the animal studies and 
correlational analyses appear, their applica- 
bility to conditions encountered in large 
urban settings is severely restricted. Correla- 
tional analyses using relatively gross demo- 
graphic indices provide only general informa- 
tion about these associations and do not 
answer questions concerning causation. Simi- 
larly, generalization of the results from ani- 
mal studies to human populations in cities is 
tenuous. Stages of phylogenetic development 
have clearly increased the ability of humans 
to deal with and adapt to conditions in the 
external environment (Glass & Singer, 1972; 
Schneirla, 1971; Stokols, 1972b). Thus, the 
response of other species of animals to ex- 
ternal conditions may be noticeably different 
from that of humankind. 

For humans, the relationship between pop- 
ulation density and an individual's percep- 
tion of crowding is far from perfect. Stokols 
(1972a) has made a useful distinction be- 
tween these two measures. Population den- 
sity is defined as a measure of the number 
of people per unit area, which is strictly a 
physical index. Alternatively, Rapoport (1975) 
has described functional density as a mea- 
sure of the number of others within a set- 
ting who directly affect an individual's be- 
haviors and perceptions. This extension of 
the concept specifies a subjective determina- 
tion of the physical concept. Crowding, on 
the other hand, is a cognitive evaluation that 
is predicated on the individual's negative 
affective reaction to the immediate environ- 
ment. Stokols (1972a) noted that although 
density is a necessary antecedent of the per- 
ception of crowding, it is not a sufficient 
cause. Situational constraints, as well as 
personal and social factors, are also im- 
portant determinants of this negative evalu- 
ation of the environment (Altman, 1975; 
Carr, 1967; Hall, 1966; Schmidt, in press). 
Hence, although density must be present for 
the evaluation to be made, social and dis- 
positional influences also play an important 
role in the judgment of crowding. 

The two primary theoretical approaches 
developed to explain the crowding response 
are the behavioral constraint or social inter- 
ference model and the cognitive overload 
position. Both approaches incorporate physi- 
cal, social, and dispositional factors. These 
explanations have helped to clarify the rela- 
tionship between density and crowding, since 
they specify a number of environmental con- 
ditions, produced by situational density, that 
result in an evaluation of crowding. It is 
important to make clear from the outset of 
this discussion that some nonspecific level 
of density is always assumed to be necessary 
for the production of behavioral constraint, 
social interference, or cognitive overload. 
Crowding is expected to result only when 
the antecedent level of density is present. 
Second, crowding is treated as the result of 
the physical consequences of density in these 
explanations and not as any specific social 
process resulting merely from the interacting 
presence of others. A situation is evaluated 
as crowded when the presence of others re- 
sults in interference and not when interfer- 
ence is produced only by social interactions 
with others; that is, crowding has been 
treated as a physically based and not a so- 
cially based reaction to the environment; 
it is not interference per se but interference 
related to density that results in an evalua- 
tion of crowding. 


Behavioral Constraint/Social Interference 
Explanation 


The behavioral constraint or social inter- 
ference explanation posits that a situation 
will be evaluated as crowded when density 
or other related conditions restrict or inter- 
fere with the activities of an individual 
within the setting. This approach is modeled 
after the theory of psychological reactance, 
in which Brehm (1966) stated that mainte- 
nance of freedom of choice is an important 
motivating factor in human behavior and 
perception. He argued that people are dis- 
posed to maintain or restore freedom when it 
is threatened and that an individual's reac- 
tion to a setting is dependent on his or her 
success at accomplishing this goal. Proshan- 
sky, Ittelson, and Rivlin (1970) have dis- 
cussed reactance in relation to environmental 
perception. They noted that crowding is ex- 
perienced when situational density leads to 
a frustration of an individual's pursuit of 
important activities and goals. This frus- 
tration may be the result of actual physical 
interference or of the mere presence of 
others, both of which are construed as limi- 
tations of behavioral choices (Stokols, 
1972b; Stokols, Rall, Pinner, & Schopler, 
1973). 

This perspective of limited freedom has 
been adapted in several theories of crowd- 
ing. Altman (1975) described social behavior 
in terms of a privacy-control mechanism that 
attempts to regulate the frequency of social 
contacts. According to Altman’s model, 
crowding occurs when the amount of actual 
social contact exceeds the level that is de- 
sired. The person is unable to effectively 
limit interactions with others. Similarly, 
Esser (1973) viewed crowding as the result 
of “not having one’s way.” This occurs when 
the denseness of people in a setting creates 
social interference and hence prevents an 
individual from functioning effectively. Sae- 
gert (1973) noted that high-density environ- 
ments may create social interference, compe- 
tition for scarce resources, and the restriction 
of behaviors that increase perceived crowd- 
ing. Finally, Stokols (1972b) postulated that 
an individual experiences crowding when the 
demand for space required by a specific ac- 
tivity exceeds the available supply. 

The research investigating this explanation 
of crowding has generally provided confirm- 
atory results. Sherrod (1974) conducted an 
experiment in which three density conditions 
were manipulated: low density, high density, 
and high density with control. In the first 
two conditions, subjects were placed in either 
dense or nondense settings, but were not 
given the option of leaving the room. In 
the third condition, subjects could freely 
choose to leave the setting. Sherrod found 
that subjects displayed more decrement in 
performance on a complex cognitive task and 
less persistence on an unsolvable puzzle ad- 
ministered subsequent to interaction in a 
high-density setting relative to a low-density 
setting. However, perceived control in the 
high-density setting appeared to ameliorate 
these negative aftereffects. It is important to 
note that subjects never actually exercised 
the option to leave the setting, and hence it 
was perceived rather than actual control that 
contributed to these effects. 

Sundstrom (1975) manipulated room size 
and goal blocking independently in an ex- 
perimental setting. In this study, goal block- 
ing was operationalized as interruption or 
inattention by others. Subjects reported 
greater irritation, which increased over time, 
when goals were blocked than when they 
were not. However, the amount of self-re- 
ported irritation was unrelated to the level 
of room density. Sundstrom noted that high- 
density conditions may cause disruption of 
interpersonal interactions, which subse- 
quently produces psychological stress. On the 
other hand, density should be unrelated to 
stress when the number of people present 
in a setting does not functionally interfere 
with goal-directed behaviors. Stokols et al. 
(1973) found that subjects felt more 
crowded in competitive as compared with 
cooperative groups, suggesting that the num- 
ber of people present in a competitive situ- 
ation may be more salient because they 
either actively or potentially inhibit the in- 
dividual from attaining a desired goal or 
outcome. Hence, the individual is more likely 
to evaluate the situation in a negative way. 
Wicker, Kirmeyer, Hanson, and Alexander 
(1976) found that subjects evaluated an ex- 
perimental setting as more crowded when 
there were not enough people to perform 
all of the required tasks, an “undermanned” 
situation. Perceived crowding was signifi- 
cantly less in the “overmanned” condition, 
a relatively higher density setting. Since as- 
signments in the undermanned condition re- 
quired considerable physical movement, often 
leading to behavioral interference among sub- 
jects, this result is supportive of the freedom 
and control hypothesis. Density is related to 
crowding when it is expected to reduce the 
individual's control over the environment. 
Crowding does not occur when density does 
not affect behavior or goals. 

Schmidt, Goldman, and Feimer (1979) 
conducted a large-scale field study measuring 
evaluation of crowding at the residential, 
neighborhood, and city levels and related 
these variables to a number of psychological 
and physical measures. They found that psy- 
chological measures indicating some degree 
of control (e.g., privacy) over the environ- 
ment were associated with perceived crowd- 
ing at all levels of the analysis (cf. Altman, 
1975; Johnson, 1974). Further, psychologi- 
cal variables became increasingly important 
and physical measures decreasingly impor- 
tant as one moved from the more immedi- 
ate residential setting to the less immediate 
neighborhood and city settings. Schmidt et al. 
suggested that in the residential setting phys- 
ical density had a direct effect on the resi- 
dents’ behaviors and activities. However, at 
the neighborhood and city levels of analysis, 
crowding evaluations were less tied to ob- 
jective levels of density and more closely 
aligned with the impact of more general ur- 
ban conditions on an individual's behaviors 
and activities. 


Norms, Control, and Crowding 


Previous theoretical statements that have 
been presented concerning the relationship 
between crowding and perceived freedom and 
control suggest that normative standards and 
expectations may have an important effect 
on evaluations of the environment (Proshan- 
sky et al., 1970; Schmidt, in press; Stokols, 
1972b; Stokols et al., 1973). Personal con- 
trol and the subsequent perception of crowd- 
ing involve individual assessments that are 
based on the person's expectations. Since 
norms specify common behavioral standards, 
they provide predictability that increases 
control in a social situation (cf. Lefcourt, 
1972). While norms increase situational con- 
trol, violation of such norms decreases con- 
trol. From this perspective, density leads to 
an evaluation of crowding when it violates 
situationally appropriate social or spatial 
norms. However, the existence of different 
behavioral and spatial norms among various 
social subgroups and cultures provides an 
interesting degree of complexity, a point 
made especially clear in the anthropological 
literature (e.g., 1966; Watson & 
Graves, 1966). 

Anderson (1972) suggested that high den- 
sities may increase the likelihood of highly 
charged social interactions that cause stress. 
He noted, however, that in Chinese cultures, 
large personal spaces are not intrinsically 
valued and that the existence of well-estab- 
lished behavioral and spatial standards re- 
duces the potential for stressful social inter- 
actions induced by close physical proximity, 
thus reducing the experience of crowding. 
Draper (1973) reported that among the 
!Kung bushmen, a hunter-gatherer tribe of 
southwestern Africa, extremely high-density 
living is common and preferred. However, 
periodic escape into outlying desert areas 
appears to mitigate the potentially stressful 
effects of high-density living. Finally, 
Schmidt, Goldman, and Feimer (1976) 
found that white, black, and Chicano sub- 
cultural groups used distinctly different cri- 
teria in evaluation of the environment. 
Crowding was a subjectively different ex- 
perience for each of the three groups. Black 
and Chicano subjects tended to focus on a 
wider range of urban factors in their judg- 
ments concerning crowding than did their 
white counterparts. The above studies sug- 
gest that culturally different evaluative and 
behavioral standards affect environmental 
judgments. 

Several studies have investigated the vio- 
lation of situationally specific norms as they 
are related to the evaluation of crowding. 
Baxter and Deanovich (1970) instructed 
confederates to position themselves at an 
inappropriately close or normal spatial dis- 
tance from an experimental subject. Com- 
pared with subjects in the normal condition, 
subjects in the inappropriately close condi- 
tion (normative violation) reported that 
they felt more crowded and displayed higher 
projective anxiety in describing figures illus- 
trated in a scenario situation in close physi- 
cal proximity. Freedman, Levy, Buchanan, 
and Price (1972) tested males and females 
in small and large experimental rooms. A 
statistical interaction between sex of the 
subject and room density was obtained; 
women felt less crowded in the small (dense) 
room than did men. Additionally, women 
tended to be more cooperative in the small 
room, whereas men displayed more competi- 
tive behaviors. Epstein and Karlin (1975) 
found similar results in a study of acute 
crowding. Males tended to form fragmented, 
competitive groups in the dense condition, 
whereas females were more cohesive and 
cooperative. Epstein and Karlin suggested 
that these effects may be attributable to 
differential sex norms. Females are expected 
to share their distress and therefore form 
more cohesive groups. Males, on the other 
hand, are expected to hide distress, which 
purportedly leads to a more fragmented 
group orientation. Leibman (1969) also has 
suggested that socialization of sex roles may 
be associated with different personal-space 
requirements for men and women. Different 
criteria may be applied by the sexes when 
each evaluates a setting and makes a judg- 
ment of crowding. 


Locus of Control and Crowding 


Although most research has been con- 
cerned primarily with situational control, 
certain dispositional characteristics affecting 
perceived personal control have also been 
related to the perception of crowding. Re- 
search investigating the locus-of-control per- 
sonality construct (Rotter, 1966) has found 
differences in reactions to density and in per- 
sonal-space needs. In an experimental group 
situation Schopler and Walton (Note 1) 
found that subjects classified as internals 
felt less crowded than subjects classified as 
externals. Similarly, Duke and Nowicki 
(1972) found that externals had larger per- 
sonal spaces than internals on a pencil-and- 
paper measure in which subjects were asked 
to imagine the approach of a fictitious 
stranger and to estimate at what distance 
they would begin to feel uncomfortable. Per- 
sonal-space distances are thought to act as 
buffers between the individual and social 
contacts, hence offering the individual some 
control over the quality and nature of inter- 
personal interactions (Hall, 1962, 1966; 
Sommer, 1969). In the context of the be- 
havioral constraint explanation of crowding, 
internals presumably have a greater tolerance 
for interference from the environment, since 
they perceive a greater degree of personal 
control over it. Externals, on the other hand, 
may perceive generally less control over 
these conditions and consequently may re- 
quire larger buffer zones. For externals, then, 
less interference from the environment is needed 
to elicit negative evaluations of crowding. 

In summary, the theoretical definition of 
crowding considered above postulates that 
an environment produces negative affect and 
is evaluated as crowded when social or physi- 
cal factors reduce the amount of perceived 
freedom and control. Increased levels of 
density create conditions that make it diffi- 
cult for the individual to function effectively 
in a setting by blocking goals and interfer- 
ing with organized behavioral sequences. 
Since norms set behavioral standards and 
expectancies that structure social situations, 
they increase predictability and personal con- 
trol over a setting. Similarly, normative vio- 
lations attributable to density should lead 
to greater evaluations of crowding, since 
they functionally reduce situational control. 
However, because normative standards vary 
between cultural and social groups, different 
physical and social conditions may lead to 
violation of different behavioral and spatial 
standards according to the particular group 
considered. Finally, perceived locus of con- 
trol may differentially affect the criteria that 
the individual applies to a situation in de- 
termining whether control can be maintained 
and whether the situation can be labeled as crowded. 


Stimulus Overload Explanation 




The stimulus overload formulation pre- 
dicts that a setting is evaluated as crowded 
when an individual is overwhelmed by the 
presence of others or when physical condi- 
tions in the environment increase the salience 
of social density. This is an explanation that 
relies primarily on interference at a percep- 
tual or cognitive level, in contrast with the 
behavioral restrictions discussed in the pre- 
vious section. The stimulus overload con- 
ceptions of crowding have been based, for 
the most part, on early urban sociological 
theories. Attempting to specify the defining 
qualities of a city, Wirth (1939) noted that 
urban areas can be characterized in terms 
of their size, their density, and the hetero- 
geneity of the encompassed populations. 
These characteristics often result in the ex- 
posure of city residents to excessive levels 
of physical and social stimulation. Simmel 
(1950) suggested that urban dwellers con- 
serve psychic energy by developing propor- 
tionally fewer and more superficial inter- 
personal relationships than do individuals 
residing in rural areas. Presumably this pro- 
tects a person from becoming overwhelmed 
by large numbers of social contacts. 

An elaboration of the early sociological 
papers was presented by Milgram (1970). 
He noted that the many stimuli that impinge 
on the individual in urban areas may over- 
whelm the person’s cognitive-processing ca- 
pabilities. Milgram believed that selective at- 
tention to only a small subset of the possible 
social and physical environmental stimuli is 
a strategy for dealing with this problem. 
Hence, individuals spend less time with each 
social contact and develop more superficial 
interpersonal relationships with others. Wohl- 
will (1966) suggested that the investigation 
of the physical environment is largely a 
study of stimulation, that optimal amounts 
must be defined and the concept of adapta- 
tion level (Helson, 1947) explored. Lee 
(1966) noted that individuals learn cogni- 
tive strategies that help them avoid or mod- 
ify incoming stimuli from the environment. 
Such strategies may allow a person to pre- 
vent overstimulation and perceptually to 
filter out the potentially stressful elements of 
a setting. The privacy-control mechanism 
postulated by Altman (1975) is also con- 
sistent with a stimulus overload interpreta- 
tion. Crowding results when a level of social 
stimulation occurs that is greater than that 
desired by the individual. However, judg- 
ments about what is considered to be the 
optimal level of social contact may vary 
from situation to situation. 

A number of studies have offered support 
for this position. Desor (1972) asked sub- 
jects to place miniature figurines in a model 
room until they felt that the room was just 
short of being crowded. The number of en- 
trances to the room and the presence or 
absence of partitions were varied. She found 
that more figurines were placed in rooms 
when there were partitions present and fewer 
external entrances. Additionally, the type of 
activity taking place affected judgments 
about the appropriate number of people in 
the room. This result is consistent with Alt- 
man’s model, since partitions and entrances 
regulate social contacts and different types 
of activities are associated with different 
desired levels of social contact. Valins and 
Baum (1974) found that students tended to 
rate corridor-style dormitories as more 
crowded than suite-style dormitories. They 
suggested that the former design is less ef- 
ficient at shielding residents from unwanted 
social stimulation. Baum, Reiss, and O’Hara 
(1974) found that when a confederate was 
placed in close proximity to the subjects, 
individuals were less likely to stop and drink 
from a water fountain if there was not a 
screening barrier. Presumably these barriers 
shielded the fountain user from unwanted 
visual intrusion or the possible violation of 
personal space. 

Saegert, Mackintosh, and West (1975) 
found that density was most likely to pro- 
duce psychological effects under three con- 
ditions: (a) when an individual was required 
to scan and interact in a setting, (b) when 
density was created by increasing the num- 
ber of people rather than by varying the 
room size, and (c) when there were a large 
number of people present in a setting. This 
suggests that the salience of social density 
may be an important part of this effect (cf. 
Loo, 1973; Rapoport, 1975) and that vari- 
ous activities, social conditions, and tasks 
focus attention on or away from physical 
density. Saegert (1973) reported that pa- 
trons in a Manhattan department store were 
less able to recall details of the store in 
high- versus low-density conditions. This 
finding is explained by Milgram's (1970) 
proposal that urban dwellers filter out ir- 
relevant stimuli in the environment to pre- 
vent cognitive overload created by a com- 
plex and active surrounding. 

In summary, stimulus overload explana- 
tions of crowding have been based on socio- 
logical theories that primarily describe the 
city in terms of size, density, and the cul- 
tural heterogeneity of the encompassed pop- 
ulations. These characteristics combine to 
overwhelm the urban dweller's perceptual 
processing of the environment. Urban resi- 
dents attempt to evoke strategies that help 
reduce the occurrence of this cognitive over- 
load. These cognitive and behavioral strate- 
gies reduce exposure to the source of the 
overload or reduce the impact of stimulus 
overload. Crowding, defined in this way, oc- 
curs when the person is unable to cope 
effectively with the perceptual or cognitive 
interference created by density. An impor- 
tant component of this model is the notion 
that optimal levels of stimulation are situa- 
tionally determined by the individual. The 
perception of crowding is greatest when the 
level of stimulation incurred is beyond the 
level desired and the person is unable to 
effectively eliminate or reduce such overstim- 
ulation. The stimulus overload position is 
primarily concerned with interference and 
disruption of perceptual or cognitive in- 
formation processing. 


Crowding and Personal Control Over the 
Environment 


The thesis of the present review is that the 
theoretical explanations of crowding dis- 
cussed above all describe either a lack or 
loss of control over the environment. Den- 
sity-related conditions functionally reduce 
the individual's ability to maintain situa- 
tional control. Second, the perception of 
crowding is an evaluation in response to 
this lack or loss of control when density is 
a salient and viable cause of such absence 
of control. The behavioral constraint/social 
interference explanation posits that an en- 
vironment will be rated as crowded when 
other people in the setting inhibit or block 
goals and behaviors. Similarly, the stimulus 
overload conceptualization posits that the 
perception of crowding is related to the 
inability of the individual to control the 
level of social stimulation. 

Personal control is not a simple psycho- 
logical variable. Rather, it is a complex 
composite of different concepts that are ap- 
plicable to the present discussion. These 
components of control noticeably crosscut 
the theoretical discussions of crowding that 
were detailed previously. Averill (1973) has 
distinguished three primary categories of con- 
trol: behavioral, cognitive, and decisional. 


Behavioral Control 


Averill (1973) defined behavioral control 
in terms of two components: regulated 
administration and stimulus modifiability. 
Regulated administration specifies who ad- 
ministers the noxious stimuli (i.e., self-ad- 
ministered by the subject or under external 
control). Stimulus modifiability refers to how 
and when the stimulus is encountered. In 
the experimental research on personal con- 
trol, stimulus modifiability has involved pre- 
vention of noxious stimuli, premature termi- 
nation of the stimulation, or direct modifica- 
tion of the form of the stimulation. Johnson 
(1974) similarly defined two components of 
behavioral control: behavioral selection and 
outcome effectance. Behavioral selection re- 
fers to the presence of viable behavioral 
options that can be selected by the individ- 
ual. Johnson noted that the more diverse 
these options appear, the more personal con- 
trol is perceived (cf. Steiner, 1970). Out- 
come effectance refers to the link between 
initiated behaviors and the final attainment 
of outcomes or goals. Established relation- 
ships between specific behaviors and the at- 
tainment of various outcomes increase per- 
ceived control by structuring the situation 
and adding predictability (Lefcourt, 1972). 
In sum, then, behavioral control involves the 
ability to choose actions that deal effectively 
with aversive stimuli and to attain desired 
outcomes. It also involves coping behaviors 
that deal with aversive qualities of the en- 
vironment and aid in the successful pursuit 
of goals in a specific setting. 

In the context of the preceding crowding 
explanations, behavioral control is applicable 
in a number of ways. First, the behavioral 
constraint or social interference explanation
deals with a blockage or inhibition of the 
behavior-goal relationship. This explanation 
posits that social density often disrupts be- 
haviors or makes certain outcomes unattain- 
able and that this interference leads to the 
perception of crowding. Mandler (1964) sug-
gested that the interruption of organized be-
havioral patterns produces arousal that is
evaluated as positive or negative according
to the context of the situation (cf. Schachter 
& Singer, 1962). Interference that inhibits 
the relationship between behaviors and out- 
comes is often disruptive, since it blocks 
situationally appropriate activities. Hence, 
this arousal may lead to a negative evalua- 
tion of the source of the disruption. When 
density interferes with goal-directed behav-
iors, a negative affective attribution of
crowding may result (Sundstrom, 1975; 
Wicker et al., 1976). Similarly, density is
not related to negative affect when it does 
not cause the blocking of goals or the in- 
hibition of behaviors. 

Research has provided support for this 
position. Sundstrom (1975) found that self- 
reported stress and irritation were not re- 
lated to the level of density when the block- 
age of behaviors and goals was independent 
of density. Wicker et al. (1976) found that 
there was more perceived crowding in a 
lower density setting as compared with a 
higher density setting when behavioral inter- 
ference occurred in the former but not in the 
latter. Hence, it is apparent that behavioral 
interference caused by other people in the 
setting is a critical factor in determining 
whether the situation will be evaluated as 
crowded. Crowding only occurs when ante- 
cedent density is a source of the disruption. 

The ability to exercise direct behavioral 
control over an aversive stimulus also re- 
duces negative affective responses. Corah and 
Boffa (1970) found that control over the 
termination of a stressor (noise) decreased 
the subjectively rated aversiveness of the 
stimulus and reduced physiological indices 
of arousal (see also Glass et al., 1973).
Similarly, Glass and Singer (1972) found 
that control over the onset, termination, or 
predictability of noise reduced negative after- 
effects on postexperimental tasks. In this 
series of studies, subjects perceived that they 
had control over onset or termination of the 
stimulus, although no actual control was 
ever exercised. Other studies have shown 
lower indices of autonomic arousal for sub- 
jects with perceived control over aversive 
stimuli (e.g., shock) as compared with those 
without control (e.g., Geer, Davison, & Gat- 
chel, 1970; Mahler, 1973; Starke, 1973). 
Sherrod (1974) performed a partial repli-
cation of the Glass and Singer studies using
density as the environmental stressor. He 
found that compared with the condition of 
no perceived control, perceived control over 
the experience of density resulted in a 
smaller decrement in performance on a post- 
experimental task. Control in this study in- 
volved the freedom to leave a dense setting, 
although actual control was never exercised. 
Clearly, perceived control reduces both neg- 
ative perceptions of a stimulus and after- 
effects that may result. 

In some situations, however, the individual 
may be unwilling to assume control when it 
is possible. Rodin (1976) grouped subjects 
in a laboratory study on the basis of whether 
they lived in relatively high- or low-density 
housing. Subjects were run through a series 
of experimental tasks in which they either 
failed or succeeded. She found, in general, 
that children living in dense housing were 
less willing to modify the operant task in 
which they were engaged (i.e., to change 
reinforcement contingencies by pressing a 
button). The same children were also more 
likely to fail on a solvable experimental 
problem if they had experienced initial fail- 
ure on an unsolvable problem. Rodin sug- 
gested that children living in dense housing 
may learn that controlling one’s environ- 
ment is difficult. This lack of behavioral con- 
trol was reaffirmed by outcomes on the ini- 
tial task. 

Coping also provides an alternate method 
for maintaining behavioral control over an 
aversive stimulus. When the ability to di- 
rectly modify the stimulus or to determine 
when it will be encountered is uncertain, 
the individual may either implement be- 
haviors that reduce the impact of the stressor 
or choose to avoid it altogether. The stim- 
ulus overload literature suggests that with- 
drawal is an obvious strategy for coping with 
excessive stimulation from high-density en- 
vironments. Milgram (1970) proposed that 
social withdrawal is a response to cognitive 
overload created by the presence of large 
numbers of people. Similarly, the behavioral 
constraint/social interference explanation of 
crowding recognizes that situational escape 
and avoidance of social interactions are ways 
of reestablishing control that is threatened
in a dense setting (Altman, 1975; Schmidt, 
in press; Stokols, 1972b). 

Research that has looked at coping and 
crowding has defined withdrawal in one of 
two ways. The first is in terms of passive 
avoidance of social interactions. Tucker and 
Friedman (1972) observed naturally occur- 
ring groups at three college locations with 
surrounding city areas that varied in popula-
tion density. They found an inverse rela-
tionship between group size and city density.
They suggested that the establishment of 
fewer interpersonal contacts may be a strat-
egy for dealing with higher levels of den-
sity. Loo (1972) studied groups of young
children at play. She noted that as density 
increased, both the number of social inter- 
actions and the incidence of aggressive be- 
havior decreased. Further analyzing these 
data, she found that while there were no 
significant differences for girls, boys tended 
to be more aggressive in the low-density con- 
ditions. Hutt and Vaisey (1966) observed 
that groups of normal children at play in- 
teract less in high-density situations and 
more in low-density conditions. Sundstrom
(1975) observed less self-disclosure in an 
experimental setting in which density was 
high. He observed significantly less eye con- 
tact, gesturing, and positive head nodding in 
manipulated goal-blocking conditions than in 
conditions in which goal blocking did not 
occur. In this study the observed behavioral 
(passive) withdrawal was unrelated to room 
density. Subjects withdrew from interactions 
in response to goals that were blocked. 
Again, Sundstrom manipulated goal block- 
ing and density (room size) as independent 
factors. Hence, these results offer evidence 
that the subjective impact of density, rather 
than objective physical conditions, is impor- 
tant in determining which environmental 
factors lead to coping. 

A second definition of withdrawal involves 
the active avoidance of social contacts. Alt- 
man’s (1975) boundary-control model of 
privacy postulates that the individual may 
actively seek or thwart social interactions 
to maintain an appropriate level of social 
stimulation. Schmidt et al. (1979) found 
that the ability to attain the desired degree 
of privacy and the freedom to get away from
the residence were important predictors of
the self-reported perception of crowding in
the residence. These clearly are factors that
involve withdrawal or escape from a re-
stricted setting. Kutner (1973) manipulated
group size, interpersonal distance, and visual 
body exposure. He found no relationship be- 
tween any of these factors and self-rated 
anxiety; however, behaviors that protected 
the subject from the visual scrutiny of others 
increased over time for the high-visual-ex- 
posure groups. Kutner suggested that density 
did not increase anxiety because subjects 
were able to implement behaviors that elimi- 
nated the threat. He postulated that when 
these behaviors become ineffective in shield- 
ing the individual from surveillance by 
others, anxiety may be increased by the num- 
ber and proximity of people in the room; 
that is, density is expected to intensify the
problem of visual exposure. 

A final aspect of behavioral control and 
coping is anticipatory responding to density. 
Crowding is expected to occur when the 
individual is unable to take action to cope 
with an impending high-density situation 
following prior warning of the condition. 
Stokols (1976) noted that anticipation of 
and preparation for conditions that are po- 
tentially stressful are important components 
of the behavioral constraint explanation. The 
ability to implement learned responses based 
on prior experiences with an environmental 
condition may help mitigate the potentially 
negative effects prior to the actual onset of 
a stressor (Appley & Trumbull, 1967; Mc- 
Grath, 1970). Similarly, Lazarus (1966) has 
noted that the psychological stress that is 
experienced by an individual is inversely 
proportional to the person's anticipated ca- 
pability to deal with the stressor. 

Although the research on anticipatory re- 
sponses and crowding is sparse, some inter- 
esting results have been obtained. Schopler 
and Walton (Note 1) manipulated antici- 
pated interference by leading subjects to 
expect tasks that were either structured or 
unstructured. They reasoned that group
structure would act as a regulatory mecha-
nism of interactions between group members
involved in performing a task. Hence, the
behavior-outcome relationship (outcome ef-
fectance) would be clearly specified. The
lack of an established task and group struc- 
ture was expected to be related to less situ- 
ational control and a greater potential for 
interference with other group members, since 
rules for controlling group interactions and 
for completing tasks were unspecified. 
Schopler and Walton found a marginally 
significant trend: Subjects who believed that 
the task situation would be unstructured 
felt more crowded than subjects expecting 
a structured group. It was suggested that 
the degree of structure affects anticipated 
control over the situation. 

Baum and Greenberg (1975) conducted a 
study in which subjects believed that they 
would be interacting in either 4- or 10-per-
son groups in experimental rooms of the
same size. Although all conditions actually 
involved groups composed of a subject and 
two confederates, each subject believed that 
the entire 4- or 10-person group would meet. 
It was found that subjects anticipating 10- 
person groups perceived the room to be more 
crowded than did those anticipating 4-person 
groups. Individuals expecting large groups 
tended to select corner seats in the room, 
positions from which they could maintain 
the greatest control over spatial intrusion. 
Additionally, subjects in the 10-person con- 
dition tended to look at the confederates 
less often and rated each successive confed- 
erate entering the room progressively more 
negatively. 

In summary, the evaluation of crowding 
is related to a loss of behavioral control when 
physical density creates a condition in which 
behaviors are interrupted and the attainment 
of goals is blocked. Research has been pre- 
sented, however, that indicates that the ex- 
pectation of control over an aversive stimu- 
lus, even in the absence of objective control, 
is sufficient to forestall negative reactions to 
the environment. Finally, evidence has been 
presented that suggests that coping responses 
initiated prior or subsequent to the onset of 
a dense condition in which behavioral inter- 
ference and goal blocking are likely to occur 
can, in some instances, mitigate negative 
responses to environmental conditions. Inef-
fective coping with conditions related to
density, on the other hand, is expected to 
increase evaluations of a setting as crowded. 


Cognitive Control 


Behavioral control refers to responses that 
either produce direct action on the environ- 
ment or allow the individual to actively 
avoid negative conditions, whereas cognitive 
control is concerned with the way events or 
conditions are interpreted by the individual. 
In this regard, Averill (1973) has defined 
two elements: information gain and ap- 
praisal. Information gain refers to the pre- 
dictability of a stressor and the anticipation 
of an aversive event. However, in contrast 
with the anticipatory responses discussed in 
the previous section, information gain does 
not involve any direct action on the en- 
vironment. Rather, it is concerned with cog- 
nitive preparation for an event. Appraisal, 
the second component, involves the interpre- 
tation and evaluation of events. Johnson 
(1974) defined the term outcome realization 
as the evaluation and interpretation of out- 
comes, a concept similar to appraisal. 

A review by Averill (1973) shows that
information concerning the occurrence of an 
aversive stimulus is related in a complex way 
to stress. He cited animal studies suggesting 
that prior warning often increases stress (e.g., 
overt behavioral disturbances), which can 
be reduced by using available avoidance re- 
sponses. Other studies have demonstrated a 
clear preference for signaled versus unsig- 
naled aversive stimulation, even when escape 
or avoidance was not possible (e.g., Badia & 
Culbertson, 1972; Perkins, Seymann, Levis, 
& Spencer, 1966). Studies with human sub- 
jects have also shown a complicated pattern 
of results. Geer and Maisel (1972) found 
that subjects who could predict the occur- 
rence of a stressor exhibited greater auto- 
nomic responding than did control subjects. 
Epstein (1973) suggested that although prior 
warning of an aversive stimulus may initially 
increase stress, the subsequent generation of 
accurate expectancies may facilitate habitu- 
ation, ultimately reducing stress. Monat, 
Averill, and Lazarus (1972) found that sub-
jects preferred to know when an electric
shock would occur and exhibited less antici- 
patory responding when this information was 
not provided. Similarly, Starke (1973) found 
that the predictable condition led to lower 
galvanic skin response rates and less sub- 
jectively reported discomfort than the un- 
predictable condition. Staub and Kellett 
(1972) observed that subjects receiving in- 
formation concerning the objective and sub- 
jective qualities of an aversive stimulus 
(electric shock) tolerated a more intense 
level of stimulation before rating it as pain- 
ful than the level tolerated by subjects pro- 
vided with only partial or no information. 
Presumably, this information allows the in- 
dividual to develop accurate expectancies 
concerning objective danger and subjective 
sensation. Hence, information gain functions 
in two ways. First, it signals the onset of a 
stressor so that possible cognitive preparation 
can be initiated. Second, it allows the in- 
dividual to generate accurate expectancies 
concerning the stimulus, resulting in eventual 
habituation to arousal. Lack of appropriate 
avoidance responses or the generation of in- 
accurate expectancies may increase the per- 
ceived stressfulness of an environmental stim- 
ulus (cf. Averill, 1973; Lazarus, 1966). 

The importance of information gain is 
apparent from the literature on human 
crowding. The role of norms in providing a 
basis for situational expectancies and pre- 
dictability was discussed in a previous sec- 
tion. The anticipation of inappropriate spa- 
tial violations, the inability to control spatial 
boundaries, or the blocking of situationally 
appropriate behaviors may be related to an 
increase in perceived crowding. Schopler and 
Walton (Note 1) found a marginally sig- 
nificant trend in which subjects anticipating 
unstructured groups felt more crowded than
subjects anticipating groups with well-speci-
fied structures. Presumably, unstructured
situations increase the
potential for social interference. Baum and 
Greenberg (1975) observed that when sub- 
jects anticipated interactions in a dense set- 
ting, perceived crowding increased and an- 
ticipatory responding was enacted. The 
behaviors and perceptions observed in these 
studies were clearly based on the subject's
assessment of impending conditions, since
expected density levels were never experi- 
enced (cf. Lazarus, 1966; Stokols, 1976). 
Langer and Saegert (1977) reported that 
subjects felt less uncomfortable and crowded 
when they were given information about the 
potential arousal and anxiety they would 
experience before entering a dense super- 
market. 


A second effect of information gain is to
focus the attention of subjects on the im-
pending occurrence of a stimulus. Averill 
(1973) has suggested that stimulus overload 
created by increased vigilance may increase 
the level of stress experienced prior to the 
onset of an aversive stimulus. Monat et al. 
(1972) found that information concerning 
the time of occurrence of a stressor (indi- 
cated by external cues) resulted in increased 
anticipatory stress for human subjects. Geer 
and Maisel (1972) also found increased 
physiological responses to impending stres- 
sors by subjects who could predict onset. 
The results of the Baum and Greenberg 
(1975) and Schopler and Walton (Note 1) 
studies can also be interpreted as the conse-
quence of prior warning that effectively
focused the subject's attention on the ex-
pected qualities of a high-density or poten-
tially uncontrollable situation. This is espe-
cially true of the Schopler and Walton study,
since subjects in the unstructured condition
were initially unaware of what specific con- 
ditions they would be facing. 

Stokols et al. (1973) have suggested that 
certain personal and social factors sensitize 
the individual to actual or potential spatial 
constraints in a setting. The person may be 
predisposed to attend to various environ- 
mental cues that serve to forewarn him or 
her of possible restrictions created by social 
density. Increased sensitivity to the environ-
ment is expected to increase the likelihood
that a setting will be evaluated as crowded.
For example, Schopler and Walton (Note 1)
found that subjects with an external locus
of control were more likely to evaluate a
setting as crowded than were internals. Ex-
ternals may be more likely to attend to cues
that signal a loss of control over the en-
vironment. Schmidt et al. (1976) found that
various subcultural groups attended to dif-
ferent factors in their evaluation of crowding. 
Schmidt (in press) observed that subjects 
reporting that spatial factors were important 
considerations in the selection of their cur- 
rent residences were more likely to evaluate 
the setting as crowded. He postulated that 
this may reflect an attentional focus on space 
and spatial restrictions that subsequently 
occur. Rapoport (1975) suggested that 
crowding research should focus on density 
that functionally affects the individual rather 
than on the gross physical measurements 
that are typically taken. Clearly, predisposi- 
tions that focus the individual's attention 
on the external environment increase this 
functional density. 

Milgram (1970) proposed that experiences 
in large urban areas may lead to expecta- 
tions that evoke long-term cognitive strate- 
gies for dealing with social overstimulation 
and the resultant cognitive overload. En- 
vironmental cues that forewarn the individ- 
ual of potential stimulus overload from so- 
cial sources may elicit these strategies. For 
example, Saegert's (1973) department store 
study (cited previously) offers support for 
this idea; subjects attended to less detail in 
the setting. Baum and Greenberg (1975) 
found that subjects anticipating a dense situ- 
ation tended to look at others in the setting 
less often than those anticipating a relatively 
low-density setting. Sundstrom (1975) also 
found that goal blocking, a consequence of 
density in other research, led to less atten- 
tion by subjects to others present in the 
room. These behaviors allow the individual 
to tune out potentially stressful aspects of 
the setting. 

The human crowding literature that has 
been reviewed demonstrates that information 
gain given as prior warning of an impend-
ing dense situation serves to increase the
likelihood that a setting will be evaluated as
crowded. The research has suggested that
habituation to the arousal resulting from the
expectation of high density may be caused
by the enactment of coping strategies that
perceptually reduce the perceived impact of
density by focusing the individual's atten-
tion away from the immediate environment.
Conditions or predispositions that focus the 
individual's attention on the implications of 
density are expected to increase the likeli- 
hood of crowding evaluations (cf. Averill, 
1973; Baum & Greenberg, 1975; Schmidt, 
in press). Additionally, anticipation of a 
dense situation was shown to lead to avoid- 
ance behaviors that demonstrated an active 
attempt to seek protection from anticipated 
problems. These behaviors suggest that much
of the contradictory experimental evidence 
relating predictability, autonomic arousal, 
and subjective stress can be reconciled by 
considering information gain in conjunction 
with the second component of cognitive con- 
trol, appraisal. 

Appraisal refers to the way that individ- 
uals evaluate and interpret events and is re- 
lated to the subsequent affective responses 
that can be expected. This aspect of control 
is important, since crowding is typically de- 
fined as a negative affective reaction to con- 
ditions created by density. When actual or 
potential goal blocking can be attributed to 
the density of a setting, one would expect 
the individual to evaluate the situation as 
crowded (Baum & Greenberg, 1975; Stokols 
et al., 1973; Sundstrom, 1975). Lazarus 
(1966) noted that stress occurs when the 
individual anticipates an inability to cope 
with an impending situation. If the person 
judges that interference or overload created 
by density cannot be effectively dealt with, 
the perception of crowding is increased (cf. 
Kutner, 1973). Again, one is dealing with 
a cognitive evaluation that is separate from 
any behavioral responses. 

Stokols (1976) noted that the critical fac- 
tor in psychological stress is the individual’s 
belief that he or she is unable to exert con- 
trol over a situation. Crowding implies that 
the individual perceives an inability to deal 
with conditions created by density and hence 
anticipates or experiences goal blockage, spa- 
tial restrictions, or cognitive overload. Cog- 
nitive or behavioral coping responses are 
ways of preparing for and dealing with these 
conditions. Appraisal may, therefore, involve 
a series of environmental assessments, re-
sponses, and reassessments (cf. Stokols,
1972b, 1976). However, cultural and sex-
specific norms have been shown to shape
these assessments; that is, different behav-
ioral and spatial standards are used to ap-
praise impending or ongoing situations (An-
derson, 1972; Draper, 1973; Epstein & Kar-
lin, 1975; Freedman et al., 1972; Hall,
1966; Schmidt et al., 1976; Watson &
Graves, 1966).

A number of studies have demonstrated 
the effect of appraisal on environmental eval- 
uations. Schopler and Walton (Note 1) 
found that subjects expecting to interact in 
unstructured groups reported more perceived 
crowding than subjects expecting a well-
specified group structure. They suggested
that this difference may be attributable to 
the subject's appraisal that the unstructured 
situation would be less controllable. Sherrod 
(1974) concluded that the stress of inter- 
acting in a high-density situation was re- 
duced by simply making subjects believe 
that they were free to leave. Desor (1972) 
found that entrances and partitions in a 
model room were viewed as ways of main- 
taining control over interactions (cf. Altman, 
1975) and affected the perception of crowd- 
ing. Partitions and entrances should lead to 
the appraisal that a setting is controllable. 

In summary, appraisal is a cognitive eval- 
uation that is affected by the individual's an- 
ticipation of his or her ability to deal with 
an impending situation. The perception of
ability to deal with an impending noxious
stimulus can reduce both stress and ob-
jective ratings of the aversiveness of the
stimulus. Crowding is a
negative affective response to a setting when
the individual believes that density-related 
conditions will create an uncontrollable sit- 
uation. 


Decisional Control 


The final element, decisional control, is 
defined by both Averill (1973) and Johnson 
(1974) as choice in the selection of specific 
outcomes or goals. It is the primary compo- 
nent of control in Brehm’s (1966) theory of 
psychological reactance. Our discussion of 
this element of control is brief, since in the 
literature on crowding, a limitation of choice 
or freedom is tantamount to a loss of both
behavioral and cognitive control. 

From the perspective of the behavioral 
constraint explanation, crowding is believed 
to occur when the individual is unable to 
attain desired outcomes or goals in a setting 
because density has functionally blocked the 
behavior-goal relationship. In this way, both 
behaviors and goals are restricted. Goal 
blocking and behavioral interference have
been related to an increased perception of
crowding when density is a viable cause
(Proshansky et al., 1970; Saegert, 1973;
Schmidt, in press; Schmidt et al., 1979;
Wicker et al., 1976). These restrictions func- 
tionally limit the options available to an 
individual in a setting, reducing decisional 
control and increasing negative affect and 
psychological reactance. 


Brehm (1966) and Wortman and Brehm
(1975) noted that the importance of the
outcomes that were restricted was a central 
factor in determining the magnitude of the 
reactance that was evoked. The blocking of 
important or primary behavioral alterna- 
tives or goal options is expected to produce 
more reactance than options that are of 
little personal importance. A similar logic 
has been proposed by Stokols (1976) in re- 
lation to the evaluation of crowding. He 
suggested that perceived crowding tends to 
be of greater intensity in primary versus 
secondary environments. Primary environ- 
ments include those settings in which the 
individual spends a great deal of time (e.g.,
work or residential areas), is engaged in 
relatively long-term interactions with others, 
and is involved in a number of personally 
important activities. Secondary environments 
are locations in which interactions with 
others are relatively transitory, anonymous, 
and of generally less consequence (e.g., pub- 
lic transportation, shopping facilities, etc.). 
Stokols suggested that goal blocking or be-
havioral interference in primary environ-
ments may involve the limitations of im-
portant goals and activities. Hence, the im-
pact of dense or related conditions is more
critical, and the perception of crowding
should be of a greater magnitude.

Altman’s (1975) boundary-control model
of privacy suggests that a loss of decisional
control is related to the evaluation of crowd- 
ing. This theoretical perspective posits that 
the individual in a social situation decides 
on an optimal level of social interaction or 
stimulation. This desired level is either at- 
tained or blocked, depending on conditions 
in the setting (e.g., density, spatial restric- 
tion, etc.). If situational conditions force a 
level of social interaction or stimulation be- 
yond that desired by the individual, an eval- 
uation of crowding is expected to result. 

Decisional control refers to the ability of 
the individual to select outcomes or goals 
in a social situation. However, in the crowd- 
ing literature, decisional control is tanta- 
mount to behavioral control, since restricted 
behaviors and blocked goals functionally 
limit the activities that are possible in an en-
vironment. Similarly, an appraisal that a
setting is uncontrollable or prior warning of 
impending conditions may also indicate a 
reduction in decisional control. The increased 
likelihood that a setting will be evaluated 
as crowded is expected to occur if density is 
a viable cause of these restrictions of be- 
havior and goal alternatives. 


Personal Control and the Attribution of 
Crowding 


Although the literature on personal con- 
trol and the literature on human crowding 
fit together reasonably well, it is clear that 
the evaluation of crowding within a specific 
setting may be the result of deficiencies in a
number of control components. The results 
from a number of the crowding studies are 
explicable by using one or more of these 
elements. Certainly the consequence of living 
and interacting in high-density environments 
is likely to be a loss of some degree of con- 
trol over behaviors and the pursuit of goals. 
It is important at this point to examine 
the specific cognitive processes that are in- 
volved when an individual experiences or 
anticipates a loss of control and evaluates a 
setting as crowded. 

The literature reviewed previously indi- 
cates that a lack of behavioral control or the 
appraisal that a situation is uncontrollable
is related to increased levels of self-reported
stress and indices of physiological arousal
(e.g., Corah & Boffa, 1970; Geer et al.,
1970; Glass & Singer, 1972; Staub, Tursky,
& Schwartz, 1971). Similarly, the human
crowding literature has suggested that en-
vironmental conditions that reduce an in-
dividual's control (i.e., goal blocking, be-
havioral interference, and stimulus overload)
also lead to more self-reported stress and 
arousal (e.g., Saegert, 1973, 1975; Stokols 
et al., 1973; Sundstrom, 1975; Valins & 
Baum, 1974). A subsequent evaluation of 
the environment as crowded is expected to 
result when density levels are sufficiently 
high; this evaluation may be tied to loss 
of control. 

A number of studies have demonstrated an 
association between density and arousal, as 
measured by skin conductance (Aiello, Ep- 
stein, & Karlin, 1975), heart rate and blood 
pressure (D’Atri, 1975; Evans, 1975), and 
palmar sweat (Saegert, 1975). However, it is 
unlikely that the relationship between den- 
sity and arousal is as simple as these results 
suggest. In these studies, conditions indica- 
tive of some social impact are often con- 
founded with density. It is unclear whether 
density causes arousal or whether arousal 
is due to a concomitant consequence. Cer- 
tainly the behavioral constraint and cogni- 
tive overload explanations suggest the latter 
possibility. Sundstrom (1975) observed that 
self-reported stress and irritation were not 
related to density when goal blocking and 
behavioral interference were manipulated in- 
dependently of this density. These latter 
conditions were also related to increased 
arousal in previous studies (e.g., Mandler, 
1964; Wortman & Brehm, 1975). 

Findings such as these suggest that physi- 
cal density may have an indirect effect on 
arousal. Previous articles have also indicated 
that anticipated problems precipitated by 
density may lead to the appraisal that a sit- 
uation is uncontrollable, thus creating stress 
and arousal. However, even in this case 
anticipated goal blocking and behavioral in- 
terference are the critical factors resulting 
in the perception of crowding. 

Arousal and stress created by a loss of 
control over the environment are similar to 
what Brehm (1966) has called psychological 
reactance. Goal blocking, behavioral inter- 
ference, and stimulus overload are often 
density-related consequences that decrease 
an individual's situational control and hence 
are expected to elicit some degree of reac- 
tance. We predict a lower level of arousal 
and stress associated with a relatively con- 
trollable setting and greater arousal and neg- 
ative affect related to uncontrollable situa- 
tions. Additionally, we expect the actual or 
anticipated ability of the individual to cope 
with or to eliminate the source that threatens 
personal control to have a direct relationship 
to the magnitude and duration of the re- 
actance that is experienced (cf. Brehm, 
1966). Although reactance theory posits ef- 
fects that may be attributed to a loss of 
control over the environment, we expect 
similar effects for situations in which one 
finds a general lack of control as well. Pre- 
sumably, in these settings a lack of control 
manifests itself in the inability to deal with 
a situation or to pursue activities and goals. 
Hence, lack of control in a setting should 
produce the same arousal and stress effects 
that have been found in situations in which 
conditions have precipitated a loss of en- 
vironmental control. The previous discussions 
on cognitive control suggest that the individ- 
ual may manifest initial stress if lack of 
control is expected and the situation is thus 
appraised in this way. For the sake of brevity 
in this discussion, we assume that both a 
lack and a loss of control are similar con- 
ditions; that is, both create increased levels 
of arousal and can be produced by situa- 
tional density. 

Social psychological theory and research 
suggest that the link between arousal and 
the evaluation of a setting as crowded may 
involve a relatively complex cognitive pro- 
cess. Current research indicates that when 
arousal is experienced by the individual, it 
is subsequently labeled as consistent with 
the context of the situation. Hence, the in- 
dividual must scan the external environment 
and attribute arousal to a salient and viable 
source that is present. Schachter and Singer 
(1962) conducted a critical investigation of 
this process. In their initial study, they 
aroused subjects by injecting them with the 
drug epinephrine. Subjects were then in- 
formed that the drug would cause them to 
feel aroused (informed condition), would 
have effects other than arousal (misinformed 
condition), or would have no noticeable ef- 
fects (ignorant condition). Schachter and 
Singer found that subjects were more likely 
to report emotional states that were con- 
sistent with external environmental cues 
(euphoria or anger) in the misinformed and 
ignorant conditions than in the informed 
and placebo (no drug) conditions. They sug- 
gested that in the conditions in which sub- 
jects did not anticipate arousal, it was 
necessary for subjects to locate a source in 
the environment that could be used to cog- 
nitively label the internal state. In the in- 
formed and placebo conditions, labels were 
either provided by the experimenter or were 
unneeded when no physiological arousal was 
experienced. 

While the Schachter and Singer results 
deal with the cognitive labeling of drug-in- 
duced arousal, similar results have been 
found for situationally induced arousal. 
These latter studies are more important for 
the present discussion of crowding, since 
arousal associated with density and crowding 
is presumably induced by events occurring in 
the external environment. The studies that 
best illustrate the attribution of situationally 
induced arousal are best described as mis- 
attribution studies. In these investigations, 
arousal is induced by making subjects expect 
the impending occurrence of an environmental 
stressor (e.g., electric shock). Subjects are 
led to believe that the real cause of arousal 
is something else in the experimental set- 
ting. Behavioral or self-report measures are 
then taken to indicate how arousal is labeled. 
Valins and Ray (1967) found that they 
could reduce phobias about snakes by con- 
vincing subjects that their arousal was due 
to impending electrical shocks rather than 
to snakes. Storms and Nisbett (1970) found 
that they could reduce insomnia by making 
subjects believe that their arousal was due to 
a pill (placebo) administered to them rather 
than to their personal problems. Finally, 
Ross, Rodin, and Zimbardo (1969) found 
that subjects were more willing to work for 
monetary rewards than to escape from im- 
pending electrical shocks when they were 
convinced that their arousal was due to the 
presence of a loud noise rather than to the 
shocks. 

The idea that the evaluation of crowding 
may be an attributional response to arousal 
has been a relatively recent development in 
the literature. A number of studies have 
offered support for this notion. Freedman 
(1975) provided evidence suggesting that 
density acts to intensify positive or negative 
feelings according to the context of the 
situation. The environment provides infor- 
mation concerning the interpretation of 
arousal, and density effectively heightens the 
level of positive or negative affect. Gochman 
(1977; see also Keating, 1978; Kalb & Keat- 
ing, Note 2) conducted a series of studies 
that directly investigated this labeling pro- 
cess. She found that unfulfilled goals or dis- 
confirmed expectations were related to an 
evaluation of the setting as crowded when 
density was salient. She suggested that dis- 
confirmed expectations and unmet goals cre- 
ate a state of arousal (cf. Lazarus, 1966; 
Mandler, 1964), which is subsequently at- 
tributed or misattributed to the most salient 
environmental condition. 

The attributional explanation posits that 
arousal created by a loss of control may lead 
to an evaluation of crowding if social factors 
create some sort of behavioral or perceptual 
interference and situational density is a sali- 
ent environmental factor. An interesting test 
of this approach was provided by Worchel 
and Teddlie (1976). In their study, room 
size (spatial density; Loo, 1973), close or 
far interaction distance, and the presence or 
absence of picture distractors on the walls 
were manipulated as independent variables. 
They found that the perception of crowding 
was greatest when density was high, when 
interaction distance was close, and when pic- 
ture distractors were not present. When in- 
teraction distance was close and distractions 
were present, the room was rated as rela- 
tively less crowded. Worchel and Teddlie 
suggested that close interaction distances 
caused spatial violations that were associated 
with increased levels of arousal (Felipe & 
Sommer, 1966; Hall, 1966; Sommer, 1969). 
When the subject searched the setting for a 
cause of the arousal, attention was drawn to 
either density or to the visual distractors. 
Hence, when attention was focused away 
from room density by the picture distractors, 
arousal was not attributed to density and 
an evaluation of crowding was less likely. 
However, when density was salient, the set- 
ting was evaluated as significantly more 
crowded. Langer and Saegert (1977) found 
that subjects performing an experimental 
task in a dense supermarket reported feeling 
less crowded and more comfortable when 
they were told that they might become 
aroused and anxious than when they were 
uninformed. Hence, information gain con- 
cerning potential internal states may par- 
tially alleviate negative affect attributable 
to dense settings. 

Research investigating the attribution ap- 
proach to crowding has suggested that the 
evaluation of crowding may be either a 
veridical or a nonveridical response to the 
environment; that is, density may create 
conditions that lead to increased arousal and 
a subsequent evaluation of crowding, or 
arousal may be misattributed to density 
when it is merely a salient, although non- 
causative, factor in the setting (Keating, 
1978). In the present discussion, we are 
most interested in the former case (when 
crowding is a veridical response based on 
the impact of density), since this has been 
the primary theoretical and empirical focus 
in the past. 


Control-Attribution Model of 
Human Crowding 


Figure 1 is an illustration of the suggested 
process. Previous research and theory have 
described crowding in terms that imply a 
loss of control over the environment. Al- 
though absolute density has been shown 
to have an inconsistent relationship to the 
cognitive evaluation of crowding, functional 
density in its impact on the individual's 
activities and behaviors appears to be im- 
portant (Rapoport, 1975). Ample evidence 
has been presented that indicates that loss 
of control is associated with physiological 
and self-reported arousal, especially when 
the situation creates a decrement in behav- 
ioral control or the appraisal that a situation 
is uncontrollable. Similarly, goal blocking, 
behavioral interference, and stimulus over- 
load have been viewed as sources of arousal 
or stress in previous articles. This arousal 
state that is produced by a lack of or loss 
of control over the environment is a neces- 
sary antecedent to the subsequent attribu- 
tional process. 

Figure 1. A control-attribution model of human crowding. 

The attribution segment of the diagram 
details the cognitive events that are thought 
to take place when arousal is experienced 
and a cognitive label is needed. The in- 
dividual scans the environment in an attempt 
to find a viable source of the internal physio- 
logical state. However, increasing levels of 
arousal lead to a restriction of the number 
of environmental cues that are considered 
and of the amount of information incorpo- 
rated in the individual's decisions and judg- 
ments (Easterbrook, 1959; Schroder, Driver, 
& Streufert, 1967). Hence, the salience of 
various environmental cues becomes impor- 
tant. Specifically, cues that are salient and 
are perceived as viable causes of the arousal 
state are selected because of the limited at- 
tentional focus maintained by the individual 
(cf. Worchel & Teddlie, 1976). These cues 
may then provide the cognitive label for 
the arousal that is experienced. In the present 
case, crowding is viewed as an attributional 
label when density is salient and has acted 
to functionally limit an individual's per- 
sonal control over a setting. This model 
clearly allows for either veridical or non- 
veridical labeling of arousal generated by 
density-related conditions, although the 
former condition is our primary interest in 
this article. 

The present model makes some distinc- 
tions that are relevant to follow-up research 
in the area of human crowding. Figure 1 
suggests three distinct stages in the evalua- 
tion of crowding: the presence of environ- 
mental stimuli and objective environmental 
conditions, cognitive processing requiring an 
assessment of personal control and the sub- 
sequent attributional phase, and the final 
evaluation of crowding. Although it may 
seem tenuous to classify behavioral con- 
straint or cognitive overload as components 
of environmental stimuli, this is precisely 
the way these concepts have been operation- 
alized in the experimental research; they are 
objectively manipulated conditions. In this 
sense, previous views of crowding have in- 
volved a simple stimulus-response explana- 
tion for crowding. Objective environmental 
events or stimuli lead to the evaluation of 
crowding. The control-attribution approach 
includes these previous explanations along 
with indications of interceding cognitive pro- 
cesses. It is clear that an investigation of 
the nature of these processes, especially of 
the components of personal control, is a 
necessary step for a full understanding of 
crowding. It seems important to map condi- 
tions such as behavioral constraint and cog- 
nitive overload onto subjective control and 
to investigate individual-difference variables 
(e.g., demographic variables, stable person- 
ality dispositions, etc.) as mediators of this 
relationship. With this type of research it 
should be possible to make more accurate 
predictions concerning the evaluation of 
crowding than would be allowed by a simple 
stimulus-response model and to tie together 
both physical environmental stimuli and sub- 
sequent cognitive evaluations. 

Social psychological theory and the ex- 
perimental research on personal control sug- 
gest that the literature on human crowding is 
explicable by using established psychological 
principles. The application of these more 
general phenomena offers both a consolida- 
tion of diverse explanations of crowding and 
a logical thread tied to larger empirical bases. 
However, a clear need for more detailed 
study of the impact of physical and social 
conditions on behaviors and on subsequent 
evaluations such as crowding has been es- 
tablished. It is these additional studies that 
will help to map the relationships among ob- 
jective environmental conditions and the 
psychological effects they produce. 


This article reviews research involving the use of videotape and film modeling 
in clinical and analogue settings. Most of the research has been on phobias, test 
anxiety, dental and medical stress, and interpersonal skills. Critical methodologi- 
cal issues and the relevance of this research to modeling theory are also dis- 
cussed. The need for improved methodology and the application of videotape 
and film modeling in more clinical populations is emphasized. 


Symbolic modeling in the form of films or 
videotapes was the logical outgrowth of the 
documented effects of live modeling in therapy 
(Bandura, 1969; Geer & Turtletaub, 1967; 
Sarason, 1968). It is likely that the cognitive 
processes identified as important in the effects 
of live modeling—such as verbal and imaginal 
coding and rehearsal (Bandura, 1977; Ban- 
dura, Blanchard, & Ritter, 1969)—can as well 
occur as a function of observing a filmed 
model. There is little reason to believe that 
the symbolic processes are particularly dif- 
ferent as a function of observing a live versus 
a filmed model (Bandura & Barab, 1973). 

Film and videotape modeling has several 
advantages over live modeling. The avail- 
ability of filmed and videotaped media pro- 
vides the opportunity to capture naturalistic 
modeling sequences that would be difficult to 
create in clinic settings. And of course the 
therapist has greater control over the com- 
position of the modeling scene because the 
film or videotape can be reconstructed until 
the most desirable scene is produced. These 
media also permit the convenient use of 
multiple models, repeated observations of the 
same model, reuse of the films or videotapes 
with other persons, and self-administered 
treatment sessions. In addition to these ad- 
vantages, there is the benefit of efficiency. 
More clients can be treated, and there is less 
demand on the time the professional spends 
with each client. 

This article reviews research on the efficacy 
of videotape and film modeling, herein re- 
ferred to as symbolic modeling (SM), in the 
treatment of clinical and clinical-like (i.e., 
analogue) problems. One purpose of this 
article is to review studies that test the 
efficacy of SM by comparing either SM alone, 
or SM combined with other treatment com- 
ponents, with control conditions. Studies that 
compare SM with other treatment techniques 
or components also are reviewed. Another 
purpose is to identify procedural and con- 
ceptual issues relevant to SM research. Finally, 
attention is given to methodological considera- 
tions that are critical to eventually discerning 
the effects of SM in a variety of therapeutic 
contexts and with various clinical groups. 


Effects of Therapeutic Symbolic Modeling 


Table 1 summarizes the studies that have 
investigated therapeutic SM. Note that the 
studies are grouped according to the behavior 
under investigation, beginning with phobias 
and followed by test anxiety, dental and 
medical stress, interpersonal skills, and other 
clinical problems. The second column in 
Table 1 specifies whether the subjects were 
adults or children and the target behavior 
under investigation. In the third column, 
results are reported that involved a no-treat- 
ment, attention, or placebo control. The 
treatment was either SM alone or SM plus 
one or more treatment components (e.g., in- 
structions or behavioral rehearsal). Column 4 
lists results involving a comparison treatment 
that was composed of one or more com- 
ponents. As with column 3, the SM condition 
involved either SM alone or SM plus one or 
more additional components. The last two 
columns of Table 1 contain information per- 
tinent to generalization and maintenance— 
whether or not these factors were assessed 
and, if so, whether the results were affirmative 
or negative. It was also deemed useful to 
specify the time interval between the treat- 
ment intervention and the assessment of 
maintenance. 

The results, as they are summarized in 
Table 1, pertain to between-groups differences, 
usually based on posttest differences or dif- 
ferences in change scores between groups. 
Also, it should be noted that in all studies 
the symbol greater than (>) is meant to 
indicate greater improvement in the clinical 
behavior. For example, the first study listed 
is by Bandura and Barab (1973). The table 
shows that SM was greater than an irrelevant 
film. This should be interpreted to mean that 
the SM group improved more than the control 
group, that is, the SM group showed greater 
fear reduction. An equal sign (=) indicates 
no differences between groups. 

It should be noted that in this portion of 
the article no attempt is made to qualify 
the results because of methodological con- 
siderations. Methodology issues are discussed 
in a later section. 


Phobias 


As can be seen in Table 1, research on 
phobias has been done with both adults and 
children; snake fear is the most common 
problem studied. The third column in the 
table reveals that SM, both with or without 
added components, had a positive treatment 
effect in comparison with the no-treatment 
and irrelevant-film control conditions. The 
fourth column reveals that SM compares 
favorably with other treatment components, 
especially if SM is combined with relaxation 
or guided participation. It also is apparent 
from the last two columns of Table 1 that, 
when assessed, generalization and mainte- 
nance effects were obtained. 


Test Anxiety 


Four studies have been reported that ap- 
praised the effects of SM, often involving 
vicarious systematic desensitization, on test- 
anxious adults. Table 1 reveals that generally 
positive effects have been obtained for SM 
when compared with control conditions; how- 
ever, no research has compared SM with 
other treatment components. None of these 
studies reported an assessment of generaliza- 
tion, and only one study (Mann, 1972) as- 
sessed the maintenance of the treatment 
effects. Mann found that changes were main- 
tained 6 weeks after treatment. 


Dental-Medical Stress 


Symbolic modeling has been used to alle- 
viate stress and disruptiveness in dental and 
medical patients, typically among people 
who had no prior dental work or surgery. 
Prior to the actual dental work or surgery, 
the modeling film or videotape is shown to 
each treatment subject. Looking at the column 
of Table 1 that compares SM with control 
groups, it can be seen that all but one of 
the studies (Shaw & Thoresen, 1974) used 
SM alone, and the studies typically did not 
involve other components. It appears that 
SM often is effective in comparison with 
controls according to behaviorally based mea- 
sures, but the results are less clear for mea- 
sures obtained from staff ratings, peer ratings, 
and self-report. 

It is apparent from column 4 that only three 
studies (Machen & Johnson, 1974; Shaw & 
Thoresen, 1974; Wroblewski, Jacob, & Rehm, 
1977) have investigated the effect of SM 
relative to other treatment components. Given 
the mixed findings of these few studies, a de- 
finitive statement is not possible. 

Generalization as such has not been as- 
sessed by researchers in this area; however, 
the dependent measure was often taken in 
a natural setting, so that additional measures 
to assess generalization may not have been 
necessary. Four studies (Melamed & Siegel, 
1975; Shaw & Thoresen, 1974; Vernon, 1973; 
Wroblewski et al., 1977) appraised mainte- 
nance of treatment effects for periods ranging 
from 3 weeks to 3 months, and all of these 
studies found at least some indication of 
maintenance. 


Interpersonal Skills 


Symbolic modeling has been used to fa- 
cilitate assertiveness, social interaction, and 
the development of other appropriate social 
skills. This area of research contains the 
largest number of studies involving SM. 
Studies have been done with children, ado- 
lescents, college students, and adult psy- 
chiatric patients. A summary of the results 
depicted in the third column of Table 1 
shows that SM alone was generally more 
effective than the control conditions, although 
two studies found no differences between SM 
and control conditions on social interaction 
in socially isolated preschoolers (Gottman, 
1977; Jakibchuk & Smeriglio, 1976). With 
five exceptions, SM plus other treatment com- 
ponents, often three or four in number (see 
Footnote b in Table 1), typically was greater 
in effect than the control condition. In three 
of these five studies, the failure to obtain 
significant differences between a treatment 
condition including SM and a control condi- 
tion was based on self-report measures 
(Curran, 1975; Hersen, Eisler, Miller, John- 
son, & Pinkston, 1973; Burrs & Kapche, 
Note 1). 
Of the studies that involved SM plus com- 
ponents compared with a control group, nearly 
all (13 of 15 studies) contained rehearsal. 
Five studies contained an instruction com- 
ponent, 5 a feedback component, 4 a dis- 
cussion component, 3 a homework component, 
2 a reinforcement component, and 1 a com- 
munication training component. Of the 13 
studies that contained SM plus rehearsal, all 
but 5 also contained other components. Of 
these 5 studies, 3 found SM plus rehearsal 
to be more effective than a control condition 
according to behavioral measures (Eisler, 
Hersen, & Miller, 1973; Hersen et al., 1973; 
Jaffe & Carlson, 1976). McFall and Twenty- 
man (1973) reported that SM plus rehearsal 
resulted in greater improvement than did the 
control condition, based on a self-report mea- 
sure, but Hersen et al. found no differences 
in improvement with a self-report measure. 
Jaffe and Carlson and Thelen, Fry, Dollinger, 
and Paul (1976) found SM plus rehearsal to 
be more effective than control conditions ac- 
cording to staff ratings, but the latter study 
reported no differences on teacher ratings. 
Since a number of studies found SM (without 
additional components) to have significant 
effects, the results in the above studies may 
have stemmed from SM alone. No research 
has independently manipulated SM and re- 
hearsal to appraise their separate and joint 
effects. Definitive statements about the other 
components are even more difficult to make 
because there are few studies and these nearly 
always involved a multicomponent treatment 
package. 

With the exception of the study by Gold- 
stein et al. (1973), the studies that compared 
SM with other treatments involved SM plus 
one or more additional components. The two 
components most frequently combined with 
SM were rehearsal (eight studies) and in- 
structions (seven studies). Of the 17 group 
comparisons involving SM plus one or more 
additional components in which a behavioral 
measure was used, the condition that included 
SM was greater in effect than the comparison 
treatment in 11 instances and equal to the 
comparison treatment in 6 instances. How- 
ever, the condition that contained SM was 
greater in effect than the comparison treat- 
ment in only 1 (Curran, Gilbert, & Little, 
1976) of the 10 instances that involved a 
self-report measure. It appears that SM plus 
components compares favorably with other 
treatments, based on behavioral measures, 
but differences on self-report measures seldom 
have been found. One explanation for this 
consistent finding (differences on behavioral 
but not on self-report measures) is that the 
behavioral measures typically are obtained 
during role-playing situations that are similar 
if not the same as those used in the treatment 
session. On the other hand, the self-report 
measures are generally more global measures 
of social skills. Hersen and Bellack (1976) 
proposed that there is an “attitudinal lag” 
between the rapid behavior changes in social 
skills and subsequent self-assessments. 


Also shown in the fourth column of Table 1 
are five studies in which the comparison 
treatment included SM. These studies indi- 
cate that SM plus praise is more effective 
than SM alone in increasing social inter- 
action among socially isolated children (Evers 
& Schwartz, 1973). However, O'Connor (1972) 
found that shaping did not make a similar 
additive contribution to SM alone. Instruc- 
tions, when included in a treatment package 
including SM plus rehearsal, facilitated the 
development of assertive behaviors more than 
did just SM plus rehearsal (Hersen et al., 
1973). However, SM plus instructions was 
equivalent to SM alone in increasing inde- 
pendent behavior (Goldstein et al., 1973). 
Also, Jakibchuk and Smeriglio (1976) reported 
that SM that included first-person, self- 
guiding comments was superior to SM that 
included narrative, third-person comments in 
the sound track. This study is discussed 
further in a later section. Because of insuffi- 
cient research on any one component and 
differences in the dependent measures and 
populations employed, definitive statements 
are not possible. 

In this area of research generalization 
usually has not been assessed. Of those studies 
that assessed generalization, 3 found that SM, 
at least as one component, facilitated gen- 
eralization (Hersen, Eisler, & Miller, 1974; 
Jaffe & Carlson, 1976; McFall & Twentyman, 
1973), whereas 2 studies failed to find such 
effects (Gutride, Goldstein, & Hunter, 1973; 
Burrs & Kapche, Note 1). However, as with 
the dental-medical research, studies involving 
social isolation in preschoolers have assessed 
the subjects’ behavior in natural settings, 
that is, during children’s interactions at nurs- 
ery school. Generalization to home or other 
settings has not been assessed. The results 
concerning maintenance are also very mixed. 
Of the 13 studies that assessed maintenance, 
8 found some evidence for maintenance, and 
5 obtained negative results. 


Other Clinical Problems 


The section of Table 1 labeled Other 
Clinical Problems contains five studies that 
focused on problems not encompassed by the 
previous sections. Since no problem area was 
investigated more than once, it is impossible 
to make any definitive observation with respect 
to the influence of SM. These studies are 
mainly included for the sake of comprehen- 
siveness and to suggest possible directions for 
future research. 


Conceptual and Process Considerations 


Attention is given in this section to vari- 
ables and processes that have been studied 
within the context of SM research and that 
have relevance to theoretical conceptualiza- 
tions of modeling. Attending to process con- 
siderations may facilitate theorizing regarding 
modeling and further refinement of the ap- 
plications of SM. 


Model-Observer Similarity 


The importance of model-observer simi- 
larity has drawn the attention of many re- 
searchers. Two variables in particular, model 
age in relation to observer age and the pres- 
entation of a coping versus a mastery model, 
have received considerable attention. 

Bandura and Barab (1973) found that 
adults showed as much snake-fear reduction 
after observing a child model as subjects 
who observed a peer model. Fear measures 
included behavioral, self-report, and physio- 
logical measures. Similarly, Weissbrod and 
Bryan (1973) exposed children to a same-age 
or a younger model. They found no difference 
between the two groups on a behavioral ap- 
proach to snakes measure. Neither of the 
above two studies contained a model who 
was older than the subjects. Kornhaber and 
Schroeder (1975) addressed this question by 
exposing snake-fearing children to either a 
child or an adult model. Those subjects who 
observed a child model demonstrated greater 
fear reduction on a behavioral avoidance test 
than those who observed an adult model, but 
there were no differences on an attitude 
measure. Furthermore, Kornhaber and Schroe- 
der found that children who observed an 
adult model did not show greater fear reduc- 
tion than did a no-treatment control group. 
Although these studies are limited in number, 
they are reasonably consistent. With adult 
subjects, it appears that observing a same-age 
or child model facilitates fear reduction. 
Similarly, children experience the same degree 
of fear reduction when exposed to peer-age 
or younger models. However, the observation 
of an adult model has minimal effects on 
children. Children may see older models as 
superior and more capable and therefore may 
not imitate their approach behavior in a 
fearful situation. On the other hand, when 
adults observe children or when children 
observe younger children, they may be mo- 
tivated to perform the approach response 
(Bandura & Barab, 1973). Bandura and Barab 
reported that adult subjects who observed 
an adult model showed a positive correlation 
between fear extinction (based on autonomic 
responses) and behavioral improvement, which 
they interpreted as vicarious extinction. In 
contrast, the adult subjects who observed 
child models demonstrated little relationship 
between fear extinction and behavioral im- 
provement, which suggests that a motivational 
factor served to influence approach behavior. 
In other words, “If that kid can do it so 
can I.” No related research has been done 
on the effects of model-subject age differences 
when both model and subject are adults. 

The question of the effects of a coping 
model versus a mastery model is a very 
complicated one that touches on many of the 
studies on SM. Many of the studies reviewed 
used a coping model, but did not manipulate 
the coping-mastery variable. Additionally, 
there were variations of the coping-mastery 
variable. For example, Curran (1975) pre- 
sented an inappropriate model followed by 
a competent model.

In a study with snake-avoiding college 
females, Meichenbaum (1971) suggested that 
the coping-mastery variable might relate to 
the broader question of model similarity, 
which has been shown to enhance imitation 
(Rosekrans, 1967). In the Meichenbaum 
study, the coping models displayed initially 
hesitant behavior, but later fearlessly inter-
acted with the target snake. The mastery
models were fearless throughout the modeling
sequences, with each model confidently han-
dling the snake. Meichenbaum found that
the subjects who observed the coping models 
displayed significantly greater improvement 
than subjects in the mastery condition. 

A similar procedure was used to alleviate
interview anxiety in psychiatric patients
(Bruch, 1975). Bruch's results were not as 
consistent as Meichenbaum's, but he did 
report improvement of subjective anxiety 
according to self-ratings among the coping 
subjects as compared with the mastery sub- 
jects. Working in the area of anxiety asso- 
ciated with surgery, Vernon (1974) reported 
that subjects who observed a model who 
showed some degree of pain were rated as 
experiencing less pain subsequently than were 
subjects who observed a model who did not 
demonstrate pain. Perhaps a study that fails 
to demonstrate effects is as important as 
affirmative studies. Lira, Nay, McCullough, 
and Etkin (1975) reported that adult female 
subjects who observed a fearless "expert" 
model were uninfluenced by that model as 
compared with a control group, based on 
behavioral and self-report measures. Meichen- 
baum (1971) acknowledged, however, that the 
greater effectiveness of the coping model in 
his study could have stemmed from the 
modeling of coping techniques that were an 
inherent feature of the coping conditions, 
rather than from the perceived similarity 
between observer and model per se. In con- 
sideration of this point, Kornhaber and 
Schroeder (1975) presented models who did 
not demonstrate deep breathing, which was 
central to Meichenbaum’s coping model. With 
this confound removed, Kornhaber and Schroe- 
der found greater change in evaluative at- 
titudes toward snakes among those who 
observed a fearful model as compared with 
those who observed a fearless model, but 
they obtained no differences on a behavioral 
measure. Also, Jaffe and Carlson (1972) found 
no differences on behavioral and self-report 
measures of test anxiety between adults who 
observed a calm model and adults who ob- 
served an anxious model. These findings are 
generally consistent with Kornhaber and 
Schroeder’s (1975) suggestion that with model 
coping skills removed, the greatest effect of 
a coping model may be on attitudes rather 
than on overt behavior. 

A number of other researchers have adopted 
the Meichenbaum procedures in their ap- 
plications of SM to clinical problems. The 
more focal problems of phobias and stress 
associated with dental and medical procedures 
have been responsive to the coping manipula-
tions. Lewis (1974) found water-phobic sub-
jects to be significantly improved at the
posttreatment assessment after they observed 
a coping model. Hill, Liebert, and Mott 
(1968) and Spiegler, Liebert, McMains, and 
Fernandez (1969) combined the mastery and 
coping procedures by depicting an "expert"
model who demonstrated competent dog or
snake handling to an initially apprehensive
(i.e., coping) model. The second model then
imitated the first model. Both research groups 
found moderate support for the combined 
modeling approaches, based on behavioral 
measures. In her studies of dental anxiety 
(Melamed, Hawes, Heiby, & Glick, 1975; 
Melamed, Weinstein, Hawes, & Katin-Bor- 
land, 1975) and anxiety associated with 
surgery (Melamed & Siegel, 1975), Melamed 
used coping models to successfully reduce the 
disruptive behavior of children undergoing 
these procedures. Except for Gottman (Gott- 
man, 1977; Gottman, Gonso, & Schuler, 
1976), who obtained weak effects, most studies 
have reported positive effects in using a coping 
model with interpersonal problems (Evers & 
Schwartz, 1973; Evers-Pasquale & Sherman, 
1975; Jakibchuk & Smeriglio, 1976; O'Connor, 
1969, 1972; Thelen et al., 1976). 


Narration 


Bandura (1977) has emphasized the im- 
portance of attentional and imaginal-verbal 
cognitive processes in modeling. In general, 
four subprocesses of modeling are emphasized 
by Bandura. In order to imitate, the observer 
must first attend to the critical model be- 
havior. Second, the cognitive processes of 
imagery labeling and verbal labeling that are 
sustained over time by rehearsal facilitate 
later imitation of a model when external 
modeling cues are not present. Third, in order 
to overtly imitate a model, a person must 
have the motor reproduction abilities that 
are particularly relevant to more complex 
motor behavior. Last, although modeled be- 
havior can be acquired by observation alone, 
reinforcement is critical if observed behavior 
is to be performed and this performance is 
to be continued over time. 

Some aspects of the SM research reviewed 
in this article are relevant to these concep-
tualizations. One variable that has received
considerable attention is narration, or a de-
scription of the model behavior concurrent
with model performance. Based on current 
theorizing, narration should facilitate atten- 
tion to the model and verbal labeling of the 
critical model behavior and thereby should 
increase the effectiveness of modeling inter- 
ventions. Jakibchuk and Smeriglio (1976) 
reported that SM plus narration failed to 
significantly affect the social behavior of
isolated preschool children in comparison with 
two control groups. However, researchers have 
reported that SM plus narration diminishes 
snake fears in adults (Spiegler et al., 1969), 
social isolation among preschoolers (Evers- 
Pasquale & Sherman, 1975; Keller & Carlson, 
1974; O'Connor, 1969, 1972), and anesthesia 
fear among children (Vernon & Bailey, 1974).
Since SM alone often has a favorable effect,
it is of questionable value to demonstrate
that SM plus narration has a favorable effect. 
Perhaps the greater question is whether nar- 
ration has an incremental effect on SM alone. 
On this question, Morris, Spiegler, and Liebert
(1974) found that SM plus narration was 
equal to SM alone, based on a self-report 
measure of snake fear among adolescents. 
No other studies directly compared SM plus 
narration with SM alone.

Although the effect of adding narration to
SM is uncertain, Jakibchuk and Smeriglio 
(1976) reported a timely study in which they 
compared narration with the influence of 
self-guiding model comments on the social 
isolation behavior of preschool children. They 
found that SM plus self-guiding comments 
by the model had a significant effect in com-
parison with two control conditions and in
comparison with an SM plus narration con-
dition. These findings are consistent with the
earlier work of Meichenbaum (1971), who 
reported that a model’s self-verbalization 
facilitated reduction of snake fear among 
female undergraduates. It may be that self-
verbalization facilitates the conversion of the
model's behavior into perceptual-cognitive 
images, as described by Bandura (1977). 
Perhaps model self-verbalizations are espe-
cially facilitative when the model behavior
is relatively complex; however, there is little
systematic research relevant to this point.


Context and Complexity of Model Behavior


The nature of the model context may be
critical to attentional processes. It is sug-
gested that the model context needs to be
sufficiently simple to insure attention to the 
critical model behaviors, but it also should 
contain enough contextual cues to facilitate 
generalization to other settings. Furthermore, 
as the complexity of the target behavior 
increases, there may be greater need to both 
simplify and amplify the target behaviors. 
An example of such simplification of complex 
target behaviors is Hersen et al.'s (1973) 
reliance on eight response modalities charac- 
teristic of assertive responses. These included 
a lengthy reply, a request for behavior change, 
consistent eye contact, a refusal to comply 
with unreasonable requests, a fully audible 
speaking voice, an assertive affect, a quick 
latency of response, and overall assertiveness. 
Elaborative or focusing narration, self-guiding 
comments, or instructions may be used in 
conjunction with simplified target behavior 
to further increase the subject's attentiveness 
to relevant materials. By these means, the 
subject's attention is directed to the subtleties 
of the interpersonal interactions, which may 
otherwise go unnoticed in daily naturalistic
modeling.

In the course of developing modeling films
for interpersonal skills, it is important to 
avoid overwhelming the observer. Even with 
simplification and elaboration it is possible 
to overload the observer with too many sig- 
nificant points of focus. Burrs and Kapche 
(Note 1) failed to achieve successful results 
using SM with psychotic inpatients. In his 
criticism of their study, Goldstein (1973) cited 
their research as an attempt to cover too 
much ground in short videotapes. However, 
when combined with role playing, SM may 
be effective with more complex skills, for 
example, requesting new behavior, whereas 
instructions may be adequate for less com-
plex skills, for example, eye contact (Eisler
et al., 1973; Hersen et al., 1974; Hersen et al.,
1973). 


Uncertainty, Arousal, and Model Warmth 


Other considerations are also very likely
to be relevant to the matter of attention to
and imitation of the model. For example,
Yussen and Levy (1975) found that subjects
who observed a warm live model attended 
to the model and recalled the model's behavior 
more than did subjects who observed a neu- 
tral live model. Others have suggested that 
uncertainty in the observer may increase at- 
tention to and subsequent imitation of a 
model (Marlatt, 1972; Thelen, Paul, Dollinger, 
& Roberts, 1978). If this is the case, it is 
likely that individuals will profit most from 
SM at those times when they actively seek 
information regarding appropriate behavior, 
for example, shortly before a stressful medical 
procedure. On the other hand, if an observer 
experiences extreme anxiety in anticipation 
of a stressful event or if the model’s behavior 
creates a great deal of anxiety in the observer, 
the observer may avoid attending to the model. 


Retention 


As reflected in the methodologies of much 
of the SM research, especially in the failure 
to routinely collect data on maintenance, 
there is relatively little material relevant to 
the question of retention. The behavioral 
rehearsal and practice components that were 
so frequently used in the interpersonal skills 
research might be expected to facilitate not 
only immediate imitation of the model but 
also the retention of the modeled behavior 
(Bandura, 1977). However, there is no re- 
search that documents the influence of re- 
hearsal and practice following observation of 
a symbolic model on the retention of behavior. 
Research by Thelen, Fryrear, and Rennie 
(1971) with a nonclinical target behavior 
(i.e., standards of self-reward) suggests that
the influence of observing a model can have 
long-term effects. 


Model Consequences 


If shown to be useful, model consequences 
can easily be incorporated as part of SM. 
Also, model consequences have been exten- 
sively researched (Thelen & Rennie, 1972) 
and are important to the theorizing of Bandura 
(1977), who ascribed both informational and 
incentive functions to this variable. Many of 
the studies that focused on anxiety-related 
problems did not contain model consequences, 
perhaps because of the assumption that the
absence of negative consequences following 
the demonstration of the feared act in itself 
serves as positive model reinforcement. Model 
consequences were included in some of the 
studies on social isolation in preschoolers 
(Evers & Schwartz, 1973; Evers-Pasquale & 
Sherman, 1975; O'Connor, 1969, 1972), on 
assertion (McFall & Twentyman, 1973), and 
on dental and medical stress (Melamed, 
Hawes, Heiby, & Glick, 1975; Melamed, 
Weinstein, Hawes, & Katin-Borland, 1975). 
Although all of these studies showed some 
positive effects for SM, none of them ma- 
nipulated the model-consequences variable. 
Therefore, this research does not establish 
that model consequences have an additive 
effect on the influence of SM alone.


Acquisition Versus Disinhibition


Yet another aspect of Bandura’s (1977) 
theory of imitation is his distinction between 
two effects of modeling: disinhibition and the 
acquisition of behavior. A behavior that has 
previously been acquired may not be per- 
formed because of fear or anxiety. Observa- 
tion of a model engaging in the behavior 
without negative consequences or with posi- 
tive consequences may disinhibit the behavior 
or increase its frequency. New patterns or 
sequences of behavioral components also are 
exhibited by observers following observation 
of a model. In this manner novel behaviors 
are acquired. 

It is potentially important to both ap- 
plication and theory to determine if SM 
primarily facilitates anxiety reduction and dis- 
inhibition or skills acquisition. Of course, it 
is possible that SM has an important influence 
in both areas. The treatment components that 
researchers have combined or compared with 
SM may give some clues as to their thinking 
regarding this matter. Research on the use 
of SM with phobias, test anxieties, and 
dental-medical stress used primarily relaxa- 
tion and desensitization as additional com- 
ponents and as comparison treatments. One 
might infer that researchers see these prob- 
lems as primarily involving anxiety and in- 
hibition. In contrast, the components (e.g., 
rehearsal) used by most researchers of inter- 
personal problems reflect an assumption of a 
skills deficit. However, two recent studies
raise the question of the extent to which skills
deficits are the cause of some of the inter-
personal problems studied. Schwartz and
Gottman (1976) found that low-assertive 
college students were as able as moderate- and 
high-assertive college students to write out 
what they thought a good assertive response 
might be. Even more significantly, when 
asked to role play an assertive response for
a hypothetical unassertive friend, the low-
assertive students were as assertive as the
moderate- and high-assertive students. Con- 
sistent with the above, Nietzel and Bernstein 
(1976) found that high-demand instructions 
made unassertive college students more as- 
sertive than did low-demand instructions. 
Both of the above studies suggest that as- 
sertive responses were in the behavioral rep- 
ertoires of the low-assertive college students. 
However, since both studies were with college 
students, similar conclusions regarding other 
clinical populations are not possible. For 
example, it is likely that unassertive psy- 
chiatric inpatients have a more severe be- 
havioral deficit than do unassertive college 
students (Hersen & Bellack, 1976). 

Morris et al. (1974) have suggested that 
there are two components in phobia problems, 
one cognitive and the other emotional. The 
cognitive component involves information 
about alternative responses and expected 
outcomes that might ensue. The emotional 
aspect involves autonomic processes. Morris 
et al. suggested that the cognitive aspect 
should dominate in the anxiety of relatively 
normal subjects with snake fears because of 
their lack of experience and their faulty
beliefs. They found that high school children
with snake fears who observed a symbolic
model showed more fear reduction than a
control group, based on a cognitive (worry)
measure but not on an emotionality (auto-
nomic reactions) measure. Based on these
findings, one might infer that SM is better
suited to changing cognitive functioning and
perhaps not so effective in changing auto-
nomic processes. These findings are generally
consistent with a study by Bandura and
Menlove (1968), in which they showed that
high-emotion-prone subjects in a multiple-
model condition showed less reduction in dog
fear than subjects who were not high-emotion-
prone. Morris et al. worked with relatively
normal subjects, who may be more similar 
to the moderate-emotion-prone subjects than 
to the high-emotion-prone subjects in the 
Bandura and Menlove study. One implication 
of these findings may be that SM is not as 
effective with more severe clinical problems 
that involve a strong anxiety component. 
Research is needed that addresses this question. 


Multiple Models


Another facet of an SM treatment that is 
easily incorporated is the use of multiple 
models instead of the presentation of a single 
model. The purpose of using multiple models 
is to increase the odds that the observer will 
select at least one model to imitate, to provide 
multiple exposures to the target behavior, to
increase the treatment stability, and to fa-
cilitate generalization of the behavior. Many
of the studies reviewed in Table 1 employed 
multiple models, and this was especially likely 
in those studies that assessed generalization. 
However, only Bandura and Menlove (1968) 
actually manipulated the number of models. 
In their study the usual posttreatment, 
follow-up, and generalization comparisons
failed to differentiate the dog-fearing children
who observed multiple models from those who 
observed a single model. However, more 
children in the former group completed the 
behavioral approach test than did children 
in the single model group. Since the use of 
multiple models is readily incorporated in an 
SM treatment, and since multiple models
logically seem to facilitate generalization, this
question is clearly in need of research.


Length of Model Presentation 


The length of the model presentation is 
another consideration that is potentially criti-
cal to the effectiveness of SM and on which
the studies vary considerably. However, none
of these studies systematically manipulated 
the length of the model presentation, and 
therefore little can be gleaned in this regard. 
A study by McGuire, Thelen, and Amolsch 
(1975) of self-disclosure showed that instruc-
tions were as effective as modeling when the
audio model presentation was brief, but when
the model presentation was longer, the audio
model was more effective than were instruc- 
tions. If SM is to become a treatment in 
general use, it is important to establish the 
optimal length of model presentation for vari- 
ous clinical problems and clinical populations. 


Methodological Considerations 


One could write an entire volume on 
methodological concerns in this area of re-
search; only some of the more critical matters
are discussed here. These points generally 
pertain to a number of studies rather than 
to one or two isolated studies. The following 
discussion concerns (a) subject selection, (b) 
defining the treatment or the independent 
variable, (c) control groups, (d) criterion 
measures of change, and (e) generalization 
and maintenance. 


Subject Selection 


No study can be better than the accuracy 
of the clinical population sampled. In the 
case of interpersonal skills problems, some 
researchers obtained subjects through psy- 
chology classes, which is probably not as 
desirable as obtaining subjects through news- 
paper advertisements (Little, Curran, & Gil- 
bert, 1977). Psychology students are a rela- 
tively homogeneous population, and they are 
seldom completely naive concerning the ex- 
perimental hypotheses. Inferences from studies 
that used college students to more diverse 
populations are highly risky. A strength in 
the research on dental-medical fears is that 
the subjects were drawn from the clinical 
setting in which the problem was demonstrated. 

Another related problem is the tendency in 
SM research to use people who have rela- 
tively minor clinical problems. This is most 
apparent in the research on snake phobias. 
As Morris et al. (1974) suggested, the efficacy 
of SM with relatively normal snake phobics 
may not obtain when SM is applied to a 
more clinical population. There is a dire need 
for research on the effects of SM with more 
truly clinical and more disturbed populations. 


Definition of the Independent Variable 


The definition of the independent variable 
or the composition of the treatment is another 
critical area. Although some of the studies
compared SM alone with one or more control 
conditions, the prominent tendency, especially 
in the studies on interpersonal skills, has been 
to put SM with a basket of components. 
The net result is that one knows relatively 
little about the individual influence of SM 
as a separate component and relatively little 
about what SM might contribute to an 
intervention that contains one or more other 
components. Only a limited number of studies 
were designed to identify the influence of the 
specific components (cf. Goldstein et al., 
1973; O’Connor, 1972). 


Comparisons With Control Groups 


If one wishes to attribute assessed changes 
to treatment variables, one must rule out the 
various effects that may be attributed to 
assessment, attention, placebo, or other de- 
mand characteristics. In general, the research 
on dental or medical stress and interpersonal 
skills controlled for the influence of assess- 
ment, attention, and placebo. But the re- 
search on phobias and test anxiety all too 
often contained only a no-treatment control, 
which does not allow one to rule out the 
influence of nontreatment variables. However, 
when the primary purpose of a study is to 
compare SM with other treatment techniques, 
an instruction/demand control may not be
critical. 

Many of the studies on interpersonal skills 
that used role playing (rehearsal) as part of 
the treatment package also used role playing 
to assess possible changes in interpersonal 
behavior, but did not control for practice in 
role playing. It is possible that any assessed 
changes in these studies stemmed from the 
increased competence of the treatment sub- 
jects at role playing, whether it pertained to 
assertiveness or to other behaviors. Therefore, 
if role playing is a part of the treatment and 
assessment, it is important to control for 
role playing as a general skill that can be 
learned. 


Assessment 


The matter of assessment in research on 
SM is both critical and complicated. A number 
of different methods have been used to assess 
the effect of the intervention, including self-
report, physiological, and a variety of be-
havioral measures. Many researchers assessed
behavioral changes in role-playing situations,
whereas others assessed behavior by creating
circumstances relevant to the behavior under
investigation and unobtrusively observing the
subjects’ responses. Other behavioral mea-
sures that have been employed include direct
observation in natural settings by trained
observers and general ratings from informed
acquaintances of the subjects.

Much of the research on SM contained a
number of measures to assess the effects of
the intervention, and most researchers in
this area generally support the idea of mul-
tiple dependent measures. However, many
researchers have pointed out the poor rela-
tionship between some of these dependent
measures, as for example between behavioral
and self-report measures of assertion (Rich
& Schroeder, 1976) and of anxiety (Paul &
Bernstein, 1973). Perusal of Table 1 reveals
discrepant findings for relationships among
the various dependent measures within a
number of studies; that is, significant dif-
ferences between groups were obtained on
one type of measure (e.g., self-report) but
not on another type (e.g., behavioral). Ad-
ditionally, the large number of instances of
mixed results on a given type of measure
(as indicated by asterisks in Table 1) illus-
trates the frequent use of multiple behavioral
and self-report measures. When a large num-
ber of measures are obtained, some significant
differences are likely to be a function of
chance.
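
The concern about chance findings can be illustrated with a short calculation. The following is a minimal sketch and is not drawn from any of the studies reviewed: it assumes that each dependent measure is tested independently at a nominal alpha of .05 and that no true treatment effect exists. The function name `familywise_error_rate` is hypothetical. Because behavioral and self-report measures are usually correlated, the resulting figures should be read as illustrative only.

```python
def familywise_error_rate(n_measures: int, alpha: float = 0.05) -> float:
    """Probability of at least one spuriously significant result among
    n_measures tests, ASSUMING the tests are independent and that the
    null hypothesis is true for every measure."""
    if n_measures < 0:
        raise ValueError("n_measures must be non-negative")
    # P(no false positive on any test) = (1 - alpha) ** n_measures
    return 1.0 - (1.0 - alpha) ** n_measures


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:2d} measures: {familywise_error_rate(k):.3f}")
```

Under these assumptions, a study that reports 10 dependent measures has roughly a 40% chance of obtaining at least one significant difference even when the treatment has no effect, which is why isolated significant results among many measures warrant caution.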

A number of researchers have questioned
the reliability and validity of the assessment
methods used in SM research. For example,
evidence has been presented indicating that
the behavioral avoidance tests used to assess
target anxiety in phobics are subject to a
variety of uncontrolled situational and pro-
cedural influences (Bernstein & Nietzel, 1973).
Bernstein and Nietzel argued that such in-
fluences as demand characteristics and in-
structional differences potentially obscure the
relationship between treatment variables and
anxiety. This is especially true when such
influences result in the inclusion of less phobic
subjects, who may respond more readily than
“true” phobics to posttreatment demands for
approach behavior (Bernstein & Paul, 1971). 
Role-playing procedures used to assess as- 
sertiveness and heterosexual anxiety may be 
similarly influenced by instructionally me- 
diated demand (Nietzel & Bernstein, 1976), 
thus limiting the validity of such procedures. 
The reliability and validity of most self-report 
measures of assertiveness have also been 
criticized (Rich & Schroeder, 1976), and such 
tests may be insensitive to treatment effects 
(see Table 1). Finally, Gottman (1977) criti- 
cized the naturalistic-observation assessment 
procedures employed in previous studies of 
SM with socially isolated preschoolers. These 
studies were criticized for the lack of detail 
in the coding systems employed, the failure 
to adequately control observer bias, the use 
of error-prone time sampling procedures, and 
the failure to control for interobserver reli- 
ability “decay” and observer “drift.” 

In summary, it seems to be important to 
incorporate multiple measures into an assess- 
ment of the effects of SM. However, there is 
a need for more serious consideration of 
assessment problems in future SM research. 
One last point deserves mention. Only three 
of the studies reviewed in Table 1 used a 
physiological measure to assess change. Even 
though there are problems with physiological
measures, in that correlations between physio-
logical measures are low and the measures are
sometimes inconvenient to obtain, greater use of such
measures seems to be indicated. This would 
be especially true when treating anxiety- 
related problems and might eventually help 
to clarify the controversy concerning the 
relative roles of anxiety and skills deficit in 
the various areas studied. 


Generalization and Maintenance 


Since nearly all of this research, except 
that on dental and medical stress, was not 
carried out in a naturalistic setting, it is 
important to address the question of gen- 
eralization. The second to last column in 
Table 1 demonstrates the relatively small 
percentage of studies that assessed general- 
ization. And even some of the studies that 
assessed generalization did so in a way that 
leaves much to be desired. Recognizing that 
generalization is on a continuum, many of 
the researchers seem to have appraised gen-
eralization within a context that more closely
approximated the treatment situation than 
the natural environment. For example, some 
of the phobia research appraised generaliza- 
tion by introducing a snake different from 
the one used for treatment. What does this 
tell us about the subject's response to a snake 
that they happen upon in their backyard on 
a Sunday afternoon? Similarly, a frequent 
method of assessing generalization in assertion 
research has been to introduce role-playing 
scenes that are different from those used for 
treatment. Surely this measure of generaliza- 
tion more closely approximates the treatment 
situation than the natural environment. The 
time has come to use generalization measures 
that appraise the subject's behavior outside 
the treatment setting and in settings more 
like the natural environment. The dental and 
medical research represents a clear exception 
to this problem. Since treatment and data 
collection typically occur within the natural 
environment, the question of generalization 
becomes less of a problem. 

The question of maintenance has received 
about as little attention as generalization (see 
the last column of Table 1). When measures 
of the maintenance of effects were taken, 
they often were obtained within a few days 
after treatment. Even more critical is that
these researchers typically did not assess the 
duration of the changes in the natural en- 
vironment. In short, duration without gen- 
eralization is of little value. 


Conclusions 


Based on the research reviewed in this 
article, it appears that the use of symbolic 
modeling as a treatment device, perhaps 
combined with other components, may have 
a promising future. If nothing else, the re- 
search to date suggests a great potential in 
this area. At this point, the need is for studies 
with more clinical or disturbed populations, 
for studies that systematically vary the treat- 
ment components, that carefully use multiple 
measures to assess change, and that appraise 
both generalization and maintenance in natu- 
ral settings or in settings that closely ap- 
proximate the natural environment.


This study examined 866 black-white employment test validity pairs from 39
studies for evidence of differential validity beyond that which would be ex-
pected on the basis of chance plus various statistical artifacts. The data in this
study, unlike those in previous studies of differential validity, were free of
Type I bias induced by data preselection. The results support the hypothesis
that findings of apparent differential validity in samples are produced by the 
operation of chance and a number of statistical artifacts and indicate that true 
differential validity probably does not exist. 


The question of whether traditional employ- 
ment tests are appropriate for blacks and 
other minorities is an important one today. 
Researchers have approached this question 
from two different directions: from the point 
of view of subgroup validity differences and 
from the point of view of selection fairness. 
Although these two phenomena are related, 
they are by no means identical. For example,
a test with equal subgroup validities can
nevertheless be unfair, under certain circum-
stances, when used in selection. In earlier
articles, we examined the properties of various 
models of test fairness (Hunter & Schmidt, 
1976; Hunter, Schmidt, & Rauschenberger, 
1977; Schmidt & Hunter, 1974). For other 
treatments of selection fairness, see Gross and 
Su (1975), Novick and Petersen (1976), and 
Cronbach (1976). 


Differential and Single-Group Validity: 
Definitions and Research 


Research in the area of subgroup validity 
differences has focused on two hypotheses:
the single-group validity hypothesis and the
differential validity hypothesis. The single- 
group validity hypothesis states that, for at 
least some tests, validity in the applicant 
population is zero for one group (e.g., blacks) 
but not for the other (e.g., whites), that is,
$\rho_1 = 0 < \rho_2$. The research evidence against
this hypothesis is now overwhelming (Hunter 
& Schmidt, 1978). Four different studies 
based on cumulative available research re- 
sults (Boehm, 1977; Katzell & Dyer, 1977; 
O'Connor, Wexley, & Alexander, 1975; 
Schmidt, Berner, & Hunter, 1973) all found 
that the frequency of single-group validity 
does not exceed that which would occur by 
chance alone given equal population validities. 

The differential validity hypothesis, on 
the other hand, is more general. It states 
merely that the validities in the two applicant 
populations are unequal, that is, $\rho_1 \neq \rho_2$.
The status of this hypothesis is less certain 
than that of the single-group validity hy- 
pothesis. Because of small sample sizes and 
other problems, individual studies have limited 
statistical power to detect differential validity. 
Studies based on cumulative research results 
but focusing on the single-group validity 
hypothesis likewise have limited statistical 
power to detect differential validity and thus
provide only weak tests of the differential
validity hypothesis (Hunter & Schmidt, 1978).

Figure 1. The level-based differential validity hypothesis ($r_B < r_W$) and the null hypothesis ($r_B = r_W$). B = blacks; W = whites.

Studies based on cumulative research results 
and focusing specifically on the differential 
validity hypothesis can have high statistical 
power for detecting differential validity. To 
date three such studies have examined the 
evidence for differential validity (Boehm, 
1972; Boehm, 1977; Katzell & Dyer, 1977). 
However, these researchers preselected the 
data to be examined in such a way as to 
induce a massive Type I error bias in their 
results. Specifically, to be included in their 
analyses, at least one of the sample validities 
in a black-white validity pair had to be 
statistically significant. Hunter and Schmidt 
(1978) have shown that such preselection of 
data leads to the appearance of differential 
validity beyond chance levels even when there 
is in fact no differential validity in the parent 
populations. Thus the finding in these three 
studies that sample differential validity occur- 
red in excess of chance frequency is difficult 
to interpret. A major purpose of this study is 
to examine the differential validity hypothesis 
using data that are not preselected. 

It has been suggested (Boehm, 1972; Bray
& Moses, 1972) that validity differences by
race may be associated with the use of subjec- 
tive criteria, such as ratings and rankings, 
and that such differences are infrequent when 
more objective criteria are employed. This 
hypothesis has found no support in connection 
with single-group validity. Schmidt et al. 
(1973) found that the frequency of single-group 
validity was at chance levels for both kinds of 
criteria. (The proportion of validity pairs 
showing single-group validity was significantly 
higher for subjective than for objective 
criterion measures [.37 vs. .20, p < .001], but 
this difference was predicted and fully ex- 
plained by differences between the two data 
sets in individual sample sizes, differences 
between black and white sample sizes, and 
general level of test validity.) The second 
purpose of the present research is to test this 
hypothesis in connection with differential 
validity.

A third purpose of this research is to
examine the differential validity hypothesis 
separately for different levels of validity. It 
may be that tests of generally low validity show 
little or no differential validity, while tests of 
higher validity show substantial differential
validity. This hypothesis is depicted graphically
in Figure 1. This figure shows the predicted
graph of test validity for blacks as a function 
of validity for whites. If a test is invalid for 
whites, it is predicted to be invalid for blacks 
also, and hence the function begins at the 
point (0, 0). The validity for blacks is predicted 
to increase as the validity for whites increases, 
but at a slower rate. Hence the function shown 
in Figure 1 is always less than the broken 
line, $r_B = r_W$ (B = black; W = white).

Although Figure 1 does illustrate the 
hypothesis that degree of differential validity 
is a function of level of test validity, the 
figure can be criticized in two respects. First, 
the comparison of the validity for whites and 
for blacks is made in the form of a comparison 
of the observed curve with the hypothetical 
broken line, $r_B = r_W$, rather than in the form
of a direct comparison of curves for each group 
separately. Second, and more crucial, a plot 
of actual data (as opposed to population 
values) in the form of Figure 1 would show 
a definite bias produced by the sampling 
error in the correlations for whites. The 
sampling error in the white validity coefficients 
would function as an error of measurement in 
the independent variable in that it would
result in a false reduction of the slope of the
regression line; that is, a plot of real data 
would be biased in the direction of the differen- 
tial validity hypothesis even if that hypothesis 
were false. This follows from the fact that 
measurement error (unreliability) in the 
independent variable—but not measurement 
error in the dependent variable—acts to 
attenuate the slope (cf. Hunter & Schmidt, 
1976, for further development of this point). 
An alternative manner of plotting the same 
hypothesis is shown in Figure 2. In Figure 2 
the basic validity of the test is not defined as 
the validity for whites but as the average of 
the validity for whites and the validity for 
blacks. The separate validities for each group 
are then plotted as a function of the within-pair 
average, producing separate functions for 
whites and blacks. These two functions can 
then be compared in the usual way. First, if 
validity is zero for either group, then it is 
expected to be zero for the other group, that 
is, both functions start out at (0, 0). As the 
average validity for the test increases, the 
validity for each group increases. However, 
the validity for whites is expected to increase 
more rapidly than the validity for blacks, and
hence the hypothesized curve for whites always
lies above the hypothesized curve for blacks.

Figure 2. An alternate expression of the level-based differential validity hypothesis. (B = blacks; W = whites.)


Data Analysis: Methods and Problems 


A special form of chi-square test was 
developed for use in testing the differential 
validity hypothesis: The coefficients in each 
validity pair ($r_B$ and $r_W$) are first transformed
to Fisher's $z$s, that is, to $z_B$ and $z_W$, respectively.
For each validity pair, the chi-square
is then as follows:

$$\chi^2 = \frac{(z_W - z_B)^2}{\dfrac{1}{N_W - 3} + \dfrac{1}{N_B - 3}},$$

where $N_W$ is the sample size for whites and $N_B$
is the sample size for blacks. Note that the
right side of this equation is merely a squared 
Z score, as required by the definition of chi- 
square. The Z scores result, of course, from 
standardization of the validity differences after 
conversion of validities to Fisher's z form. 
For a single validity pair, this chi-square has 
only one degree of freedom, but (unlike a 
significance test between correlations) it can 
be cumulated over all pairs in a given range 
of average validity or over all pairs across 
all levels of validity to provide a test with 
high statistical power. However, because of the 
nature of the data that must be used in any 
study of differential validity findings from the 
literature, this chi-square test (and any other 
test) is characterized by a number of sources 
of Type I error bias. 
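
The test described above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function names are hypothetical, and the sampling variance of each Fisher z is assumed to be the conventional 1/(N − 3).

```python
import math


def pair_chi_square(r_white, n_white, r_black, n_black):
    """Chi-square (1 df) for one black-white validity pair.

    Assumes the conventional sampling variance 1 / (N - 3) for Fisher's z.
    """
    z_diff = math.atanh(r_white) - math.atanh(r_black)  # Fisher z transform
    variance = 1.0 / (n_white - 3) + 1.0 / (n_black - 3)
    return z_diff ** 2 / variance


def cumulative_chi_square(pairs):
    """Sum the 1-df chi-squares over pairs; df equals the number of pairs.

    Each pair is a tuple (r_white, n_white, r_black, n_black). The sum is a
    chi-square only if the pairs are statistically independent.
    """
    values = [pair_chi_square(*pair) for pair in pairs]
    return sum(values), len(values)
```

Cumulating over pairs is what gives the test its power: a difference far too small to register in any single pair can still accumulate across several hundred pairs.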

Effects of nonnormality. The first factor 
that creates such bias is that most validity 
studies are done on selected populations. If 
such participants are selected by using the
test in question, their test score distribution
will be nonnormal. Suppose, for example, that
the distribution for the applicant population is
bivariate normal. Then if, for example, only
the top 50% on the test were selected, the test 
would be highly skewed in a positive direction, 
and the criterion would be similarly skewed to 
a lesser extent, depending on the size of the 
correlation. (A high validity would mean a 
high degree of concomitant skew in the 
criterion, whereas a low validity would mean 
that the criterion would be approximately
normal.) We found no literature examining
the impact of these nonnormal distributions
on the standard error of the correlation 
coefficient. However, we were able to derive 
a fairly good approximation. Our calculations 
show, for example, that for a 50% selection 
ratio and a validity of .5 in the applicant 
population (reasonable assumptions; see 
Schmidt & Hunter, 1977; Schmidt, Hunter, 
& Urry, 1976), the standard error of the 
selected sample validity coefficient is 8% 
larger than the full population standard error 
derived from the assumption of normality. 
The probability of finding a false reading of 
differential validity in such a situation is not 
the usual 5%, but rather 7%. (These calculations
are available from the senior author.¹)
If such studies were added into a cumulative 
chi-square test, then each such study would 
add an expected increment of 1.16 to the 
chi-square, instead of the 1.00 that would be 
added if the full applicant population were 
sampled. This results, of course, in a Type I 
bias. In this study, we deal with a total of 
866 black-white validity pairs. In the absence 
of Type I bias, the expected value of the 
cumulative chi-square over these validity 
pairs would be its degrees of freedom, 866. 
In the presence of this Type I bias, the expected 
value of the chi-square under the null hypothesis
(that is, given the complete absence of
any true differential validity) would be 1.16 
(866), or 1,004.6 in our example, which would 
be deemed significant beyond the .0001 level. 
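
The arithmetic in this paragraph can be checked with a short sketch. It assumes a normal approximation for the test statistic and a two-tailed nominal critical value of 1.96; the function names are hypothetical. The authors' own derivation of the 8% figure is not reproduced here.

```python
import math


def two_tailed_p(z):
    """Two-tailed tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2))


def actual_alpha(se_inflation, nominal_critical=1.96):
    """Operational alpha when the true standard error exceeds the assumed one.

    se_inflation = 1.08 means the true standard error is 8% larger.
    """
    return two_tailed_p(nominal_critical / se_inflation)


def expected_cumulative_chi_square(n_pairs, increment_per_pair):
    """Expected value of the summed chi-square under the null hypothesis."""
    return n_pairs * increment_per_pair


alpha = actual_alpha(1.08)                               # roughly .07
inflated = expected_cumulative_chi_square(866, 1.16)     # about 1,004.6
```

Squaring the 8% inflation in the standard error (1.08 squared is about 1.17) gives roughly the per-study increment of 1.16 quoted in the text.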

¹ These calculations are quite lengthy and are
therefore not included here. They will, however, be
presented in a future article.

Violations of independence. The second
factor creating Type I error bias is the fact
that not all validity pairs are statistically
independent. The chi-square test assumes
that all validity pairs are computed on
independent samples. If several pairs are
actually calculated on the same sample of
persons (as is the case in nearly every study
considered below), then those pairs are to
some extent redundant. For example, suppose
that several predictors were correlated with
the same criterion; then the resulting correlations
would be completely independent of
one another only if the predictors were
perfectly uncorrelated. Similarly, if one predictor
were correlated with several criterion
measures, then the resulting correlations would 
be independent of one another only if the 
criterion measures were uncorrelated with each 
other. The effect of this partial redundancy 
produced by nonzero correlations between 
multiple predictors or between multiple criteria 
is difficult to calculate, and no attempt is made 
to correct for it below. Instead we simply state 
an example by way of warning. Suppose we 
had run a chi-square statistic over 100 validity 
pairs and had obtained an observed value of 
128. Then, had those pairs been independent, 
we would have used the known facts about the 
chi-square distribution to conclude that the 
chance expected value of the chi-square
statistic would be 100 and the standard
deviation would be the square root of twice
the degrees of freedom, that is, $\sqrt{200}$ =
14.14, and hence an observed value of 128
would be significant at the 5% level. (Recall
that a chi-square distribution with 100 degrees 
of freedom is essentially normal.) However, 
suppose we were told that an error had been 
made and that one of the numbers had been 
inadvertently written down 10 times. Then 
the expected value of the “chi-square” 
statistic would still be the nominal degrees of 
freedom, 100. To obtain the variance, we
would note that this statistic was the sum of
91 independent entries, 90 of which had the
usual chi-square variance of 2, whereas the
last entry was repeated 10 times and hence
had a variance of 10 × 10 × 2, or 200.
Thus, the true standard deviation would be
$\sqrt{180 + 200}$ = 19.49, and an observed value
of 128 would no longer be statistically significant.
Obviously, use of the smaller, erroneous
standard deviation leads to a Type I error 
bias in the statistical test. 
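
The duplicated-entry example can be reproduced numerically. The sketch below treats the statistic as a weighted sum of independent 1-df chi-squares (each with variance 2) and uses the normal approximation mentioned in the text; the helper names are hypothetical.

```python
import math


def weighted_sum_sd(weights):
    """SD of sum(w_i * X_i) for independent 1-df chi-squares X_i (variance 2)."""
    return math.sqrt(sum(2 * w ** 2 for w in weights))


def z_for_observed(observed, weights):
    """Normal-approximation z for an observed value of the weighted sum."""
    expected = sum(weights)  # each 1-df chi-square has expected value 1
    return (observed - expected) / weighted_sum_sd(weights)


independent = [1] * 100          # 100 independent entries
duplicated = [1] * 90 + [10]     # 90 entries plus one entry counted 10 times

z_independent = z_for_observed(128, independent)  # about 1.98
z_duplicated = z_for_observed(128, duplicated)    # about 1.44
```

Both cases have the same expected value (100), so the redundancy shows up only in the standard deviation, which is why it is easy to overlook.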

The typical correlation between alternate 
predictors and alternate criteria in the studies 
considered here is far less than 1.00. Thus the 
Type I error bias is probably not as great as 
in this hypothetical example. However, there
is a Type I error bias. If in fact there is no
true differential validity, the probability of 
rejecting this true null hypothesis is greater 
than .05; that is, when independence assump- 
tions are not fully met, the nominal .05 
alpha level is not the operational alpha level.
The true alpha level is numerically larger,
which biases the statistical test in favor of the
alternate hypothesis that there is true differen- 
tial validity. 

When lack of independence is incomplete— 
which is usually the case—it is typically not 
possible to compute the correct standard 
deviation or degrees of freedom. If it were 
possible, the resultant chi-square statistic 
would of course not have a Type I error bias. 
However, this chi-square would have some- 
what less statistical power than would be 
present if there were no violations of indepen- 
dence to begin with and the usual chi-square 
formulas were used to compute the standard 
deviation and the degrees of freedom. 

Differential range restriction. The above two 
factors tend to produce the false appearance of 
differential validity by violating assumptions 
basic to the chi-square test. One other factor— 
differential range restriction by race—also tends 
to produce artifactual differential validity, 
although it does not violate strictly statistical 
assumptions. Such differential validity is 
artifactual in that it is produced by purely 
statistical factors that have nothing to do 
with substantive black-white cultural differ- 
ences; that is, such apparent differential 
validity will manifest itself even when the 
validities of interest—the applicant population 
validities—are identical for blacks and whites. 

The effect of range restriction on test validity 
was examined in detail by Schmidt et al. 
(1976). Suppose that the correlation in the 
applicant population is .50 but that only the 
top 40% of the applicants were selected and 
that the validity study was done on this group. 
The expected validity on the selected appli- 
cants would be only .31. 

The situation is fundamentally more com- 
plicated when the validities of two groups are 
to be compared, because one must then 
consider the possibility that selection does 
not act equally on the two groups. Suppose, 
for example, that the validity in the applicant 
population is .50 for both blacks and whites; 
that is, suppose that there is no true differential 
validity. Suppose further that the cutoff score 
is set such that 40% of the whites are selected. 
The validity coefficient for the selected whites 
would be lowered to .31 by this artifactual 
restriction in range. Would the same happen 
to blacks? That depends on the mean black
test score. If the mean for blacks were the same
as the mean for whites, then the effect of
restriction in range would be the same. But this
is typically not the case. Rather, there is
usually a difference of about 1 standard
deviation between whites and blacks on the 
test. If blacks have a test mean that is 1 
standard deviation below the mean for whites, 
then the cutoff score is not .25 standard 
deviations above the black mean, as it is for 
whites; rather, it is 1.25 standard deviations 
above the black mean. Thus the selection 
ratio for blacks is not 40%, but 11%. Since 
blacks are much more severely selected, their 
test scores are much more severely restricted 
in range. In fact, the expected black validity is 
not .31, but .23; that is, selection at the same 
cutoff score for whites and blacks results in 
differential restriction in range and hence in 
an artifactual difference in the observed 
validities in the selected group.² The information
necessary to correct validities for differential
range restriction is rarely presented in
studies.
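
The figures in this example (.31 for whites, .23 for blacks, and an 11% selection ratio) can be approximated with the standard formula for direct range restriction. The sketch assumes a bivariate normal applicant population and strict top-down selection on the test at a common cutoff; the function names are hypothetical and this is not necessarily the computation the authors used.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def selection_ratio(cutoff):
    """Proportion of a standard normal population scoring above the cutoff."""
    return 0.5 * math.erfc(cutoff / math.sqrt(2))


def restricted_validity(rho, cutoff):
    """Expected validity among those selected above `cutoff` (in SD units).

    rho is the applicant-population validity; cutoff is measured from the
    group's own mean on the test.
    """
    lam = normal_pdf(cutoff) / selection_ratio(cutoff)
    u_sq = 1 + cutoff * lam - lam ** 2   # selected / applicant test variance
    return rho * math.sqrt(u_sq) / math.sqrt(1 - rho ** 2 + rho ** 2 * u_sq)


white_r = restricted_validity(0.50, 0.25)   # cutoff .25 SD above white mean
black_r = restricted_validity(0.50, 1.25)   # same cutoff, mean 1 SD lower
```

The same cutoff yields different selection ratios, and therefore different degrees of range restriction, whenever the group means differ, even though the applicant-population validities are identical.
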
Would an artifactual difference on the order 
of .30 versus .23 be significant in existing 
differential validity data? In the data reviewed 
for this study, the average Ns per validity pair 
were 140 for whites and 75 for blacks. If one 
considers a single study with these Ns, then 
this difference adds .231 to the chi-square 
(with 1 degree of freedom). This addition is 
nowhere near significant. Thus, this sort of 
artifact is not immediately detectable in 
single studies. But suppose that 866 such 
studies were run, as in this study. Then the 
chi-square pooled over the 866 validity pairs 
would be inflated beyond sampling error by 
866 (.231) = 200; that is, the expected value 
of the chi-square would be 866 + 200 = 1,066. 
This would raise the probability of obtaining 
a significant cumulative chi-square from the 
5% chance level prevailing when studies are 
done with applicant populations to the 99% level
prevailing when calculations are based on the 
selected applicants. [This calculation takes 
into account the fact that a noncentral chi- 
square has a larger standard deviation. For 
the central chi-square that represents the null 
hypothesis, the expected value would be its 
degrees of freedom, 866, and the standard 
deviation would be $[2(866)]^{1/2}$, or 41.64. For
the noncentral chi-square, which represents
the true state of nature here, the expected
value would be 866 + .231(866) = 1,066, and
the standard deviation would be
$([2 + 4(.231)]866)^{1/2} = 50.32$.] Had 866 separate chi-square
tests been done, this same fact would have
taken a different form. If the studies had
been conducted on the applicant populations,
then about 5% of the studies would have
produced a false reading of significant differen- 
tial validity. But in the selected applicant 
population, the probability of falsely detecting 
differential validity (i.e., the probability of a
significant chi-square) is 7.5%. Over a large
number of studies, this would lead one to
falsely conclude that differential validity
would be found in 2.5% (i.e., 7.5% minus
5.0%) of the corresponding job-test combinations.
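
The cumulative calculation in this paragraph can be sketched as follows. The sketch uses a normal approximation to both the central and the noncentral chi-square and assumes a one-tailed .05 critical value (z = 1.645); both are assumptions of this illustration, and the function names are hypothetical.

```python
import math


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))


def cumulative_rejection_probability(n_pairs, increment, z_crit=1.645):
    """Probability that the pooled chi-square exceeds its nominal critical value.

    `increment` is the expected artifactual addition per pair (the
    per-pair noncentrality), e.g. .231 in the text's example.
    """
    null_mean = n_pairs
    null_sd = math.sqrt(2 * n_pairs)
    alt_mean = n_pairs * (1 + increment)
    alt_sd = math.sqrt((2 + 4 * increment) * n_pairs)
    critical = null_mean + z_crit * null_sd
    return alt_mean, alt_sd, normal_cdf((alt_mean - critical) / alt_sd)


mean_alt, sd_alt, prob = cumulative_rejection_probability(866, 0.231)
```

Under these assumptions the rejection probability comes out above .99, in line with the 99% figure given in the text.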

Statistical power. In contrast to the situation
prevailing with respect to Type I error
biases, Type II errors are very well controlled
in these data. Because the data set contains a
large number of validity pairs (866) based on 
a much larger number of data points (see 
Discussion section), statistical power is very 
high. The probability of failing to detect 
true differential validity is quite low. Recall 
from our discussion of differential range 
restriction that the power to detect the small
7-point correlation difference between .23
and .30 is .99. Even for the trivial difference
of .05 (Schmidt & Hunter, 1978), statistical
power is .77.

In summary, the chi-square test used in 
this study was affected by three sources of
Type I error. Obviously, the usual .05 level
of significance was not appropriate for use
with these data. In the case of the individual 
chi-square tests, for example, the expected 
percentage of tests significant under the null 
hypothesis (no true differential validity), 
using a = .05, was not 5% but some substan- 
tially larger number. Based on the discussion 
above, it is our judgment that a conservative 
criterion would hold that findings of 10% or
less disconfirm the differential validity hypothesis
and that proportions in the 5%-7% range
constitute compelling evidence against the
differential validity hypothesis.

² Even in unselected groups, test scores of blacks
have sometimes been found to be less variable (e.g.,
see Shuey, 1958). However, these findings have been
based on school populations and may not apply to
applicant populations.

A second test. In addition to the chi-square 
test, differences in mean validities between 
blacks and whites, within validity intervals 
and across all levels of validity, were examined 
using a critical ratio (CR) test. In this test, 
the error variance terms used were computed 
directly rather than being based on an assumed 
theoretical error distribution, as in the case 
of our chi-square test. This test can be rep- 
resented as follows: 


$$\mathrm{CR} = \frac{E(r_W - r_B)}{SD_d / \sqrt{n - 1}},$$


where the numerator is the average difference
between the black and white validities, $SD_d$
is the standard deviation of these differences,
and $n$ is the number of validity pairs. The
advantage of this test is that departures from 
normality in test score or criterion distributions 
do not create Type I biases. The test is,
however, affected by departures from indepen- 
dence and by differential range restriction. 
These effects operate to produce Type I 
biases of the same kind produced in our chi- 
square test, that is, the true alpha level will be 
numerically larger than the nominal alpha
level.
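
A minimal sketch of the critical ratio follows. It assumes that the standard deviation of the differences is the descriptive (divide-by-n) form, so that dividing by the square root of n − 1 yields the usual standard error of the mean; the function name is hypothetical.

```python
import math


def critical_ratio(white_validities, black_validities):
    """Mean white-minus-black validity difference over its standard error.

    Assumes the SD of the differences is computed with n in the denominator.
    """
    diffs = [w - b for w, b in zip(white_validities, black_validities)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / n)
    return mean_diff / (sd_diff / math.sqrt(n - 1))


# Toy data for illustration only (not taken from the studies reviewed).
cr = critical_ratio([0.30, 0.20, 0.40, 0.30], [0.20, 0.20, 0.20, 0.20])
```

Because the error term comes from the observed spread of the differences rather than from a theoretical sampling distribution, the ratio does not depend on normality of the test or criterion scores.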


Method 


A careful review of published studies produced 39 
studies reporting employment-test validities separately 
by race. These studies contained a total of 866 pairs of
validity coefficients for which sample sizes were 
reported. A total of 532, or 61%, of the validity pairs 
were based on subjective criterion measures; 334, or 
39%, were computed using objective criterion measures. 
These figures are shown by individual study in
Table 1. All ratings, rankings, and so on, and grades 
in training (when not based on performance measures)
were considered subjective criteria; performance
measures such as quality and quantity of output,
job sample measures of proficiency, errors,
attendance, tenure, written job knowledge tests, 
and the like were considered objective criteria. 

All coefficients were converted to Fisher's z, and
values for the chi-square test described earlier were 
computed for each validity pair. The percentage of 
chi-square values reaching significance was calculated 
for (a) the overall group of 866 pairs, (b) each validity 
interval examined, and (c) the objective and subjective 
criterion measures. These chi-square values were also 
cumulated to produce overall values within each of
these data categories. Critical ratios for level differences
between blacks and whites were also run within each 
of these categories. 


Results 


The proportion of chi-square values signif- 
icant at the .05 level in the total sample of 866 
validity pairs was .09, which is less than our 
stipulated criterion of .10. The cumulative 
chi-square for the 866 pairs was 1,123.02, 
which is not significantly different from the 
value that might be produced artifactually 
given our earlier example of differential range 
restriction (1,066). Nonnormality of test and 
criterion distributions, induced by selection on 
the test, and violations of the independence 
assumption operated, of course, along with 
differential range restriction to bias the 
obtained chi-square of 1,123 in a Type I 
direction. Because of the large sample sizes, 
both these figures are nominally significant 
beyond the .01 level, but, as shown earlier, 
they are well within the range of results to be 
expected from Type I biases alone in the 
absence of any true differential validity. The 
overall mean racial difference in validity 
was .02 (whites higher). This difference, 
trivial from a practical point of view, is 
nominally statistically significant. 

Figure 3 shows the results relevant to the 
hypothesis, which is illustrated in Figure 2, 
that degree of differential validity increases 
with level of validity. As the reader may 
recall from the discussion of Figure 2, the 
black and white validities (y-axis) are plotted 
separately as a function of the average of the 
two validities (x-axis). For positive validity, 
the curve is quite straightforward: There is 
no apparent evidence for differential validity. 
The only noticeable difference in the two 
curves is found in one point (the highest point) 
based on only 11 validity pairs, and for this 
point the mean black validity is higher than 
the mean white validity. On the other hand, 
the data for negative validities are strange 
and unexpected. All variables were either 
scored such that expected validity would be 
positive or were reflected to achieve the same 
effect. True negative validities were thus 
highly unlikely. Therefore, the negative corre- 
lations for both groups should represent 
sampling error, that is, negative deviations
from a population in which the correlation is
close to zero. If this were true, then we would
expect no differences between the curves for
the negative validity samples. Thus, the
observed difference (which is significant)
suggests that something other than routine
sampling error occurred in the negative
validity samples.

Table 1
Distribution of Validity Pairs Across Studies

                                                                Criteria
Study                                                     Subjective  Objective  Total
Campbell, Crooks, Mahoney, & Rock (Note 1)                        92         46    138
Campbell, Pike, Flaugher, & Mahoney (Note 2)                      18          0     18
Campbell, Pike, & Flaugher (Note 3)                                0          8      8
Campion & Freihoff (Note 4)                                        0         10     10
Farr (Note 5)
  Study 1                                                          5          0      5
  Study 2                                                          0         90     90
Farr, O'Leary, & Bartlett (1971)
  Study 1                                                          2         19     21
  Study 2                                                         52          8     60
Farr, O'Leary, Pfeiffer, Goldstein, & Bartlett (Note 6)           46          0     46
Flaugher, Campbell, & Pike (Note 7)                               36          0     36
Fox & Lefkowitz (1974)                                            18          9     27
Gael & Grant (1972)                                                0         35     35
Gael, Grant, & Ritchie (1975a)                                     0         11     11
Gael, Grant, & Ritchie (1975b)                                     0         11     11
Grant & Bray (1970)                                                0          8      8
Kirkpatrick, Ewen, Barrett, & Katzell (1968)
  Study 1                                                          4         20     24
  Study 2                                                         28          0     28
  Study 3ᵃ                                                         0          6      6
  Study 5                                                          8          8     16
Lefkowitz (1972)                                                   0          8      8
Lopez (1966)                                                       4         12     16
Mitchell, Albright, & McMurry (1968)                               1          1      2
O'Leary, Farr, & Bartlett (Note 8)ᵇ                              121          0    121
Ruda & Albright (1968)                                             0          2      2
Tenopyr (Note 9)                                                  36          0     36
Toole, Gavin, Murdy, & Sells (1972)                               30          0     30
U.S. Department of Labor (Note 10)ᶜ                                9          0      9
U.S. Department of Labor (Note 11)                                 1          0      1
U.S. Department of Labor (Note 12)                                 1          0      1
U.S. Department of Labor (Note 13)                                 1          0      1
U.S. Department of Labor (Note 14)                                 1          0      1
U.S. Department of Labor (Note 15)                                 1          0      1
U.S. Department of Labor (Note 16)                                 1          0      1
U.S. Department of Labor (Note 17)                                 1          0      1
U.S. Department of Labor (Note 18)                                 1          0      1
U.S. Department of Labor (Note 19)                                 1          0      1
U.S. Department of Labor (Note 20)                                 1          0      1
Wollowick, Greenwood, & McNamara (1969)                           12         12     24
Wood (Note 21)                                                     0         10     10
Total                                                            532        334    866

ᵃ Included data on Spanish Americans, which were excluded for purposes of this analysis.
ᵇ Includes all data not published in Farr, O'Leary, & Bartlett (1971).
ᶜ Includes one American Indian.

The basic difference in the results for positive 
and negative validity samples was brought 
out by a second analysis: a count of the 
number of validity pairs that were significantly
different, that is, showed significant chi-square
values. In the positive validity class, 45 out of
712 pairs were significantly different, that is, 
6%, which is near even the nominal chance 
level. However, for the negative validity 
class, 32 out of 154 pairs were significantly 
different, that is 21%, which is considerably 
beyond the nominal chance level and perhaps 
even beyond the true chance level for these 
data. 

A search of the data showed that the 
unexpected negative validities were concentrated
in three studies: Farr, O'Leary,
Pfeiffer, Goldstein, and Bartlett (Note 6), 
O'Leary, Farr, and Bartlett (Note 8), and 
Toole, Gavin, Murdy, and Sells (1972). 
The Farr et al. study contained 46 pairs of 
correlations; 23 of these were based on a 
sample of 31 black and 95 white clerical 
workers. In the group of 95 whites, only 5 of 
the validity coefficients were negative, and 
these were all near 0, ranging from −.05 to
−.10. In the group of 31 blacks, all 23
coefficients were negative, and the average
value was −.24. In this same study, another
23 validity pairs were based on a sample of
51 black and 158 white insurance workers.
In the group of 158 whites, none of the validity
coefficients was negative. In the group of
51 blacks, all but one coefficient was negative,
and its value was .01. The average of the
coefficients was −.15. In all cases, these
validity coefficients were for types of tests
that are valid in a positive direction in other
black samples.

In one substudy of the O'Leary et al. study,
48 pairs of correlations were presented; 24
of these were based on a sample of 31 black
and 60 white clerical machine operators.
In the group of 60 whites, only 4 coefficients
were negative, and all of these were near 0,
ranging from −.02 to −.07. In the group of
31 blacks, 18 of the 24 coefficients were
negative, and the average value was −.21.
In this same study, another 24 validity pairs
were based on a sample of 24 black and 106
white "miscellaneous" clerical workers. In
the group of 106 whites, only 1 coefficient was
negative, and it was −.01. In the black group
of 24, only 2 coefficients were negative, and
these were −.01 and −.07.

Figure 3. Graphic test of the level-based differential validity hypothesis based on all 866 validity pairs. (B = blacks; W = whites.)

Figure 4. Graphic test of the level-based differential validity hypothesis based on the reduced data set of 781 validity pairs. (B = blacks; W = whites.)

Toole et al. presented two sets of data: 15 
validity pairs for younger workers and 15 
pairs for older workers. In the sample of 
younger workers, the white group numbered 
288 and the black group numbered 75. Only 
4 of the 15 coefficients in the group of blacks 
were negative, and these were very small in 
magnitude, ranging from .00 to −.08. The
sample of older workers contained 121 whites
and 36 blacks. In the group of blacks, 13 of
the 15 coefficients were negative. The average
of the 15 validities was −.17.

Our interpretation of these results is that 
these negative correlations were probably the 
product of outliers. Consider the following 
parable. Sam is an outstanding worker by 
every criterion. Indeed, he is such a good 
worker that he comes to work even when he 
is sick and should be in bed. It was on such 
a day that the tests were given for the validity
study. Sam did his best on the tests, but he
was so nauseated that he could hardly think. 
Thus all across the board, the man with the 
highest criterion scores had the lowest test 
scores. This one outlier tended to cancel out
the positive correlation apparent in the data
for the other subjects. To make the example
more specific, let us assume that had Sam not
been sick, the observed validity would have
been .20 for the 30 workers. Sam's job per-
formance score is 2.5 standard deviations
above the mean, and had he not been sick,
his test score would also have been 2.5 standard
deviations above the mean. However, because
of illness, his obtained score is 2.5 standard
deviations below the mean. The observed
correlation then becomes -.22 instead of .20.
The four samples in which the average validity
for blacks was negative were all small enough
to be seriously affected by single outliers and
were therefore deleted. This step resulted
in the elimination of all 46 pairs from Farr
et al., 24 from O'Leary et al., and 15 pairs
from Toole et al. In total, 85 validity pairs
were dropped.
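
The arithmetic behind the Sam parable can be checked with a short sketch. It assumes that the .20 validity is computed over all 30 workers including Sam, that both variables are standardized (mean 0, standard deviation 1, using the population formula) before his test score changes, and that only his test score flips from +2.5 to -2.5. The function name and its arguments are illustrative, not part of the original analysis.

```python
import math


def validity_after_outlier_flip(n: int, r: float, z: float) -> float:
    """Correlation after one worker's test score flips from +z to -z.

    Assumes both variables are standardized over all n workers (mean 0,
    population SD 1) before the flip, and that the worker's criterion
    score stays at +z.
    """
    sum_xy = n * r                        # sum of cross-products before the flip
    sum_xy_flipped = sum_xy - 2 * z * z   # his product goes from +z*z to -z*z
    mean_x = -2 * z / n                   # test mean drops because his score drops by 2z
    var_x = 1.0 - mean_x ** 2             # sum of squares is unchanged, since (-z)^2 == z^2
    cov_xy = sum_xy_flipped / n           # criterion mean is still 0
    return cov_xy / math.sqrt(var_x)      # criterion SD is still 1


sam_validity = validity_after_outlier_flip(n=30, r=0.20, z=2.5)
print(round(sam_validity, 2))
```

Under these assumptions the result rounds to -.22, which agrees with the figure given in the text and shows how a single extreme case can reverse the sign of a validity coefficient in a sample of 30.
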

Figure 4 shows the comparison of black and
white validity coefficients for the 781 pairs of
validity coefficients remaining after the suspect 
data were deleted. The only point at which 
the curves differ noticeably is the highest 
point, which is based on only 11 pairs, and at 
this point the mean validity is higher for 
blacks than for whites. This figure clearly 
disconfirms the hypothesis that differential 
validity increases with validity level. More- 
over, the two curves are now essentially equal 
for the negative validities resulting from 
sampling error, just as they are equal for the 
studies with positive validity. 

The proportion of chi-square values signif- 
icant at the .05 level was .067 in the reduced 
data set of 781 pairs. With 781 cases, the 
difference between 6.7% and 5% is nominally 
significant (p < .05, binomial test). But this
small deviation from chance is in fact less
than would be expected from the combined
effects of the three known sources of Type I
bias. The cumulative chi-square value was
874.4, which is nominally statistically signif-
icant (p < .01). But, again, the number of
cases (781) is very large, and the result is
actually minuscule when compared with that
predicted from selection-induced skew in
score distribution, violations of independence,
and differential range restriction. In fact, the
expected value of this chi-square in our earlier
example of the range restriction artifact was
1.23(781) = 961. The other two sources
of Type I bias discussed earlier, of course,
also operated here. The overall difference
between the mean validities for whites and
blacks was zero (to two digits), and the
corresponding critical ratio was nowhere near
significance (CR = -.25, p < .60). Thus these
tests, taken together, suggest the absence of
true differential validity in these data.
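
The two nominal tests in this paragraph can be reproduced approximately with a short sketch. The count of significant pairs is not reported directly; the value used here (52) is an assumption obtained by rounding .067 × 781. The function name is illustrative, and the exact binomial tail is only one reasonable reading of the "binomial test" mentioned in the text.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed from the exact lower tail."""
    lower = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower


n_pairs = 781
observed = round(0.067 * n_pairs)      # assumed count of significant pairs
p_value = binomial_upper_tail(observed, n_pairs, 0.05)

# Expected cumulative chi-square under the range restriction example:
# 1.23 per validity pair, summed over 781 pairs.
expected_chi_square = 1.23 * n_pairs

print(observed, p_value < 0.05, round(expected_chi_square))
```

The tail probability falls below .05, as the text reports, while the observed cumulative chi-square of 874.4 remains well under the value of about 961 that range restriction alone would be expected to produce.
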

Table 2 shows the results of the three 
statistical tests separately for each level of 
average test validity. For example, in the 
validity category .51-.60, the mean validity 
for blacks is .58 and the mean validity for
whites is .49. This mean difference is significant
(CR = 3.33, p < .002) and leads to a signifi-
cant chi-square, χ²(11) = 23.80. However, a
check of the number of cases shows that for
this validity interval, 2 of the 11 pairs are
significantly different; a number this small could
be due to chance. Evidence in this direction
is found in the adjacent column for the validity
interval .41-.50; here, despite containing
nearly three times as many cases, the critical
ratio is not significant.

Table 3
Comparison of Differential Validity Findings for Objective and Subjective Criterion Measures

                                                      Subjective criterion
                                      Objective     -------------------------
Item                                  criterionᵃ    All data    Reduced setᵇ
No. pairs                                334           532          447
Proportion of χ²s significant            .08*          .10          .06
Cumulative χ²                            399.3*        723.7*       475.2
M racial difference in validityᶜ         -.01          -.04*        [illegible]

ᵃ All data; none of the validity pairs dropped involved objective criteria.
ᵇ Data deleted as described in text.
ᶜ Negative values indicate higher white validities.
* p < .05 in absence of Type I biases; see text for discussion of these biases.

The close comparability of mean black and 
white validities across the average validity 
intervals is particularly striking.³ This fact
is reflected in the values of the critical ratio
assessing level differences in each of the
validity intervals (last row of Table 2). Of the
nine critical ratios, only two were statistically
significant. One was the previously discussed
validity interval .51-.60, which was based on
only 11 validity pairs. In the other interval
showing significance (.21-.30), the mean
validity difference between blacks and whites
(.26 - .23 = .03) was trivial in magnitude
and was not replicated in the intervals on 
either side. 

The proportion of validity pairs in which 
the difference was significant varied across the 
intervals from .00 to 48; the unweighted 
average was .087. Four of the nine cumulative 
chi-square statistics were nominally significant 
(p < .05). 

In interpreting all the statistics in Table 2,
one must again bear in mind the Type I 
biases that operated. In light of these biases, 
the statistics for validity intervals, like 
the whole-sample statistics discussed above, 
strongly suggest the absence of any true 
(population) differential validity. 

Table 3 summarizes differential validity 
findings separately for objective and subjective 
criterion measures. These findings provide no 
evidence to support the hypothesis that 
differential validity is associated with subjec-
tive criterion measures. The overall proportion
of significant chi-squares for subjective criteria
was .10 in the complete data set and .06 in
the reduced set. For objective criteria, the
proportion was .08 in both data sets. Given the
known sources of Type I bias in these data,
all proportions must be considered in the
chance range. Interestingly, the only cumula-
tive chi-square that did not reach significance
pertained to subjective criteria (reduced data
set). The racial difference in mean validities
appears to be negligible for both criterion
types. To conserve space, the graphs of mean
black and white validity values against average
validity intervals are not shown separately
for subjective and objective criteria. Both
graphs for objective criteria are essentially
identical to Figure 4. In the case of subjective
criteria, the graph based on all data is very
similar to Figure 3, and the reduced data set
produces a graph essentially identical to
Figure 4. Thus, the evidence appears to
indicate that our overall conclusion about the
nonviability of the differential validity hypoth-
esis can safely be generalized across criterion
types.

Discussion 


The complete set of 866 validity pairs in
this study reflects 185,487 data points (120,294
for whites and 65,193 for blacks). The reduced
set of 781 validity pairs is based on 173,190
data points (111,333 for whites and 61,857
for blacks). Thus, statistical power in these
analyses is almost infinite in comparison
with usual research standards; that is, although
Type I error biases are a problem in these data,
there are no problems of Type II error biases.
The probability of a Type II error, given even
trivial validity differences, is close to zero.
Yet no differences were found beyond those
expected on the basis of Type I biases plus
chance. In fact, the differences found were
smaller than one might expect to find on the
basis of the combined effect of these two
factors. These results appear to clearly discon-
firm the differential validity hypothesis. The
conclusion must be that tests which rank
order whites successfully with respect to some
given criterion also rank order blacks equally
successfully.

³ The table computed on the unreduced set of 866
validity pairs is available from the authors.

An argument occasionally heard against 
the earlier findings that single-group validity 
does not exceed chance frequencies may be 
offered in connection with the findings of this 
study. This argument can be summarized as 
follows. The frequency of differential validity 
does not exceed chance levels in the pooled 
data as a whole, and this finding does indicate 
that true differential validity is quite un- 
common. But it may nevertheless be that 
specific instances of statistically significant 
validity differences observed in samples may 
reflect true differential validity in particular 
applicant populations. This argument must 
be addressed in terms of the overwhelming 
evidence. Recall that statistical power in this 
and similar studies is extremely high. If true 
differential validity existed in more than a 
tiny fraction of the applicant populations 
included in the study, it would be detected 
in the analysis of the data as a whole. Yet no 
such effect was observed in the data. The only 
possible conclusion, then, is that the probabil- 
ity is vanishingly small that any specific 
validity difference observed in samples reflects 
a true applicant population difference. 

Some may object to the analyses performed 
on the reduced data set. In our judgment, the 
elimination of the 85 small-sample validity 
pairs (9.8% of all pairs) from the negative 
validity data set has been amply justified, 
but those who may disagree must still account 
for the fact that in the unreduced data set, 
the frequency of differential validity was 
clearly in the chance range (6%), even 
ignoring Type I biases, in the entire range of
positive validities. Recall that all predictors 
were either scored such that expected validity 
was positive or were reflected to achieve the 
same effect. True negative validities (e.g., a 
negative population correlation between a 
cognitive ability and job performance), if they 
exist, must be regarded as anomalies. Clearly, 
most psychologists are essentially interested 
only in the positive validity range, and here 
the results for the complete and reduced data 
sets agree in strongly indicating the absence 
of any true differential validity. 

What are the implications of these findings 
for the question of the appropriateness of 
personnel tests for blacks? As indicated earlier, 
lack of differential validity does not assure 
test fairness. For example, under the regression 
definition of selection fairness, regression line 
intercepts can be unequal even given equal 
validities and equal test and criterion var-
iances. According to the regression definition
of test fairness, such unequal intercepts denote 
unfairness to the group with the higher 
intercept when test scores are used in selection 
in the same regression equation for both groups. 
On the other hand, equal validities imply the 
impossibility of all of the forms of test unfair- 
ness specifically stemming from validity 
differences by race. For example, one potential 
cause of differences in regression slopes is 
differences in validity. 
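
A small numerical sketch may make the point about intercepts concrete. The group means, standard deviations, and validity below are hypothetical values chosen only for illustration; they are not taken from any study reviewed here. The sketch uses the standard least squares relations, in which the slope equals the validity times the ratio of criterion to test standard deviations and the intercept equals the criterion mean minus the slope times the test mean.

```python
def regression_line(r, mean_x, sd_x, mean_y, sd_y):
    """Least squares line for predicting criterion y from test x.

    slope = r * sd_y / sd_x; intercept = mean_y - slope * mean_x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical groups: same validity (.30) and same standard deviations,
# same criterion mean, but different test means.
slope_a, intercept_a = regression_line(r=0.30, mean_x=50.0, sd_x=10.0,
                                       mean_y=50.0, sd_y=10.0)
slope_b, intercept_b = regression_line(r=0.30, mean_x=45.0, sd_x=10.0,
                                       mean_y=50.0, sd_y=10.0)

print(slope_a, intercept_a)
print(slope_b, intercept_b)
```

In this illustration the slopes are identical, yet the second group has the higher intercept (36.5 versus 35.0). Equal validities therefore do not by themselves guarantee a common regression line.
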

A more important point from a theoretical 
point of view, however, is that most substan- 
tive theories or hypotheses of test bias make 
predictions of differential validity (in addition 
to predictions of, for example, lower test 
means for blacks); that is, hypotheses of test 
bias have been based on the assumption that 
the actual content of tests, having been based 
on the content of white middle-class culture, 
does not mean the same thing psychologically 
to blacks as it does to whites. If this is in fact 
true, one result must be differential validity. 
Consider the following example. Suppose a 
certain perceptual speed test requires the 
examinee to follow complicated written direc- 
tions. If there are differences in English dialect 
so strong that blacks must employ an internal 
translation process, two effects will result. 
First, blacks at a given level of ability in 
perceptual speed will have a more difficult 
task than whites at the same ability levels
and will make more errors, leading to lower 
test scores. Second, and more important, the 
factor composition of the test will not be the 
same for the two races. Whereas the test would 
assess only individual differences in perceptual 
speed for whites, it would, in addition, measure 
individual differences in internal translation 
abilities for blacks. The psychological meaning
of the test scores will not be the same for 
blacks and whites, and thus the pattern of 
validities the test shows for various external 
criteria will differ by race. Thus, our failure to 
find any support for the phenomenon of 
differential validity in this study constitutes 
strong evidence against substantive hypotheses 
of test bias based on the assumption that the 
meaning of test content differs by race. 


Unfair Discrimination in the Employment Interview:
Legal and Psychological Aspects 


Richard D. Arvey 
University of Houston 


The psychological and legal literature concerning evidence of bias or unfairness 
in the employment interview with regard to blacks, females, handicapped per- 
sons, and the elderly is reviewed. This review indicates that (a) the interview is 
highly vulnerable to legal attack and one can expect more future litigation in 
this area; (b) the mechanisms and processes that contribute to bias in the 
interview are not well specified by researchers; (c) findings based predominantly 
on resume research show that females tend to receive lower evaluations than 
males, but this varies as a function of job and other situational characteristics; 
(d) little evidence exists to confirm the notion that blacks are evaluated un- 
fairly in interview contexts; (e) a relative dearth of research exists investigating 
interview bias against the elderly and handicapped individuals; and (f) evidence 
concerning the differential validity of the interview for these minority and 
nonminority groups is virtually nonexistent. A number of research needs and
directions are specified.


Despite research that indicates that the 
employment interview has limited reliability 
and validity (Mayfield, 1964; Ulrich & 
Trumbo, 1965; Wright, 1969), organiza- 
tional use of the interview in helping to make 
selection and promotion decisions persists. 
The statement of Dunnette and Bass (1963) 
that the personnel interview is the most 
widely used method of selecting employees 
still holds true today. In fact, there is some 
speculation that the employment interview 
may be gaining in popularity because of 
increased court and legal pressures brought 
to bear on employers’ pencil-and-paper test- 
ing practices. In view of the increased like- 
lihood of their employment tests’ being sub- 
jected to legal scrutiny, employers are drop-
ping the use of tests and placing even more
reliance on the interview as the major de-
cision-making tool.

This article is a greatly expanded version of a
chapter in Arvey (1979).

An important but unsettling question re-
mains, however, for those who abandon tests
and adopt the interview for use in selecting
and promoting employees. Does use of the
interview result in unfair discrimination
against protected minority group members?
This question becomes even more important
when one realizes that the interview is con-
sidered a test under the Equal Employment
Opportunity Commission (1970) testing
guidelines for selecting employees. Thus, the
employment interview is (and has been)
subjected to judicial review. Moreover, some
recent evidence indicates that the interview
process tends to yield judgments and evalua-
tions about minority group members that
make it less probable that they will be hired
or promoted than that nonminority group
members will be hired, even though the mem-
bers of the two groups have substantially
equal qualifications.

I intend to accomplish several objectives
in the present article: First, I discuss various
definitions of unfair discrimination in gen-
eral and how components of these definitions
apply to the employment interview; second,
I review the legal literature and case law 
regarding unfair discrimination in the inter- 
view; third, I review the basic processes and 
mechanisms by which minority group mem- 
bers may come to receive lower evaluations 
based on the interview procedure; fourth, I
review and summarize the research literature 
available concerning the evaluations given 
to minorities as a result of the interview 
process and whether these judgments ex- 
hibit differential validity in predicting job 
success; finally, I summarize trends and offer 
suggestions for future research efforts. 

I selectively focus on the legal and re- 
search literature that concerns four basic 
minority groups: blacks, females, the elderly,¹
and the handicapped.² I have chosen to deal
primarily with these particular groups be- 
cause (a) the existing data base deals almost 
exclusively with these particular subgroups 
and (b) because the principles exemplified 
within the article generalize reasonably well 
to other subgroups. 


Unfair Discrimination: 
Background and Definitions 


As many of us are undoubtedly aware, 
Congress passed the Civil Rights Act in 
1964, and Title VII of this act forbids dis-
crimination against individuals in employ-
ment settings based on factors of race, na- 
tional origin, religion, or sex. The agency 
charged with the interpretation and enforce- 
ment of this act was the Equal Employment 
Opportunity Commission (EEOC). In 1970 
and again in 1978, this commission issued 
interpretative guidelines concerning the use 
and validation of employment tests in or- 
ganization settings to insure the fairness of 
selection devices. As mentioned, the employ- 
ment interview was explicitly included under 
the rubric of tests and was subject to these 
same guidelines. Judicial definition of unfair 
discrimination gained greater precision in 
the now famous Griggs v. Duke Power Com-
pany⁴ Supreme Court decision. In this de-
cision, the Supreme Court ruled that the use 
of a general intelligence test and the use of 
a mechanical aptitude test were illegal be- 
cause their use resulted in more whites than 
blacks being hired (adverse impact) and the
test was not shown to be valid. The Court
clearly enunciated a shifting-burden-of-proof 
principle, which applies to almost all litiga- 
tion in this area today. First, an individual 
filing suit must demonstrate an adverse ef- 
fect due to the use of a particular selection 
device.⁵ For example, the finding that sig-
nificantly more minority members than non-
minority members were rejected for a posi- 
tion on the basis of an interview might be 
sufficient evidence to establish adverse im- 
pact. Subsequently, the burden of proof 
shifts to the employer, who must then estab- 
lish the validity of the selection device. 
Since the interview is considered a form of 
test, the same two-pronged procedure is 
applicable. For an employment interview 
to be judged unfair, first, evidence would 
need to be presented that minority group 
members were evaluated more poorly than 
nonminority members on the basis of the 
interview and that these evaluations resulted 
in an adverse impact, and second, the em- 
ployer would have to fail in establishing the 
validity of the interview. 

It should be noted that another channel by 
which minority group members might chal- 
lenge employment selection practices and the 
interview is by filing suits under the equal 
protection clauses of the Fifth and Four- 
teenth Amendments of the U.S. Constitution. 
Prior to 1975, suits filed under these authori-
ties were reviewed under the same standards
as Title VII suits. However, in 1975, the
Supreme Court ruled (in Washington v.
Davis⁶) that when a suit is filed under
these amendments, plaintiffs must demon-
strate that the employer had an intent to
discriminate (which is typically quite diffi-
cult to prove).

¹ Individuals within the ages of 40 and 70 were
extended protective coverage in the Age Discrimi-
nation in Employment Act of 1967 and recent
amendments.

² Handicapped individuals were extended cover-
age by the Vocational Rehabilitation Act of 1973.

³ In the past, other agencies (e.g., the Office of
Federal Contract Compliance, Justice Department,
etc.) issued separate testing guidelines. More re-
cently, the various agencies jointly issued uniform
guidelines on employee selection procedures (Equal
Employment Opportunity Commission et al., 1978).

⁴ Griggs v. Duke Power Company, 3 FEP 175
(1971).

⁵ It should be noted that the more recent uni-
form guidelines stress that evidence for adverse
impact should be assessed on the basis of the
total selection process (the "bottom line" approach)
in contrast with reviewing a single component of
the process (i.e., the interview). It is not yet clear
which reviewal process will be accepted by the
courts.

The 1970 EEOC guidelines call for the 
examination of whether a testing device (i.e., 
interview) exhibits the same pattern of 
validity for both majority and minority 
group candidates. Thus, an employer must 
determine that the interview is equally valid 
for both segments of the applicant pool. 
Employers are asked to determine whether 
their employment devices are differentially 
valid (Boehm, 1972). However, the more 
recent uniform guidelines (Equal Employ- 
ment Opportunity Commission et al., 1978) 
indicate that instead of simply having re- 
searchers determine whether differential va- 
lidity exists, studies of "fairness" should be 
carried out when feasible. One more sophis- 
ticated psychometric model of fairness calls 
for the investigation of regression line dif- 
ferences between majority and minority can- 
didates (Cleary, 1968); that is, for the inter- 
view to be fair to minority and nonminority 
group members, the regression lines com- 
puted should be equal. 

The more complex psychometric models 
of test fairness need not be presented fully 
here for the purposes of this article for two 
reasons: First, little agreement exists today 
concerning which of the more complex 
models of fairness is the most adequate 
model. Several relatively complex models
have been advanced (Cole, 1973; Petersen 
& Novick, 1976; Thorndike, 1971) and are 
currently being evaluated and compared 
(Hunter & Schmidt, 1976). Second, research 
on interview fairness has simply failed to 
examine the issue from the perspective of
these more sophisticated psychometric mod- 
els, although it would be appropriate to 
do so. 

To summarize, the fairness of the inter-
view from a legal perspective can be evalu-
ated from basically two perspectives: from
the degree of adverse impact shown by in-
terview judgments and from the degree of
validity or job relatedness of the interview
for majority and minority candidates.


Legal Aspects of the Interview 


How has the employment interview fared
when challenged in court by individuals
claiming that they were discriminated against
unfairly as a result of an interview? In al-
most all cases, the shifting-burden-of-proof
standard is used. Cases in which the use of 
the interview in employment decisions has 
been challenged are shown in Table 1. 

Perhaps the first and most often cited case
in this area is Rowe v. General Motors,⁷
decided by the Fifth Circuit Court of Ap-
peals. The decision actually had to do with
the performance appraisal used by the com-
pany in determining promotions, but because
the case is used frequently as a controlling
case, it is important to review it here. The
decision dealt with the subjective nature of
employment decisions, an obvious component
of the interview. In this situation, foremen's
subjective evaluations of hourly employees'
ability, merit, and capacity were used in
making promotion decisions. After determin-
ing that adverse impact had occurred regard-


several ways: First, foremen were given no
written instructions pertaining to the qualifi- 
cations necessary for promotion; second, thé 
standards that were determined to be in con- 
trol were vague and subjective. In summariz- 
ing the decision, the court added that 


all we do today is recognize that promotion/trans-
fer decisions which depend almost entirely upon
the subjective evaluation and favorable recommen-


much of which can be covertly concealed. (Rowe 
v. General Motors, p. 450) 


Thus, this decision casts some doubt on an
employer's use of a subjective decision-mak-
ing process if such a process resulted in ad-
verse impact.


⁶ Washington v. Davis, 12 FEP 1415 (1976).
⁷ Rowe v. General Motors, 4 FEP 445 (1972).

In a 1971 decision,⁸ the EEOC ruled that
an employer's decision not to hire a black 
woman because of her “poor attitude" dur- 
ing the interview was in violation of Title 
VII. In this case, the EEOC cited a number 
of court decisions to establish that if dis- 
criminatory impact of a hiring system is 
shown, "it is essential that the system be 
objective in nature and be such as to permit 
review" (EEOC Decision No. 72-0703, p. 
437). 

The employment interview was also given 
court scrutiny in Hester v. Southern Railway 
Company.⁹ A black female applicant was
denied a clerical job partly as a result of 
the interview process. Although the district 
court ruled that an adverse impact had been 
demonstrated and that the interviewing pro- 
cedure was faulty because of its subjective 
nature and because it was based on “no 
formal guidelines, standards and instruc- 
tions," the court of appeals overturned the 
decision because there was no clear proof 
that these selection procedures (tests, inter- 
views, etc.) had resulted in adverse impact.

A more recent case involved a court de- 
cision that struck down the use of the inter- 
view used in hiring teachers. In United States 
v. Hazelwood School District,¹⁰ the court
noted that the subjective interview process
used in making selection decisions was simi-
lar to the vague and subjective criteria used
in Rowe v. General Motors. It is instructive
to read what the court said in this instance: 


Principals are free to give whatever weight they 
desire to subjective factors in making their hiring 
decisions. Indeed, one principal testified that inter-
viewing an applicant was "like dating a girl, some 
of them impress you, some of them don't" . . . 
No evidence was presented which would indicate 
that any two principals apply the same criteria— 
objective and subjective—to evaluate applicants.
(United States v. Hazelwood School District, p.


An interesting 1976 case (Weiner v. 
County of Oakland¹¹) dealt specifically with
the kinds of questions asked in the interview 
and their possible bias. Mrs. Weiner applied 
for the position of intermediate planner and 
was given an oral interview that apparently 
was scored in some systematic way. Although 
Mrs. Weiner was ranked third on the list 
of eligible applicants, four men were hired
to fill the available positions. As grounds for
not hiring Mrs. Weiner, the county was only
able to suggest some doubt about the flexi-
bility of her approach to planning.

The court ruled that Mrs. Weiner had dem-
onstrated an adverse impact. At this point,
the burden of proof shifted to the county,
which had to prove that it had valid business
requirements justifying its conduct. The 
county attempted to defend the use of the 
interview by asserting that the decision 
reached was based on subjective evaluations 
made during the interview that were in no 
way the product of sex discrimination. 
The court, however, reviewed the kinds of 
questions that were asked of Mrs. Weiner 
and found that they were suggestive of bias 
against women. Questions such as whether 
her husband approved of her working, 
whether her family would suffer if she were 
not home to prepare dinner, and whether she 
was able to work compatibly with young, ag- 
gressive men were asked. The court ruled 
that these kinds of questions during the in- 
terview, along with other facts, were suf- 
ficient to substantiate the charge of discrimi- 
nation and awarded back pay and attorney’s 
fees.

⁸ Equal Employment Opportunity Commission
Decision No. 72-0703, 4 FEP 435 (1971).

⁹ Hester v. Southern Railway Company,
FEP 646 (1974).

¹⁰ United States v. Hazelwood School District, 11
EPD 10854 (1976).

¹¹ Weiner v. County of Oakland, 14 FEP 380
(1976).

¹² Harless v. Duck, 14 FEP 1616 (1977).

A recent case provided sufficient evidence
to the court regarding the interview to sur-
vive challenge. Harless v. Duck¹² involved
a situation in which a woman brought a class
action suit against a midwestern police de-
partment charged with discrimination in hir-
ing because of sex. The department used,
along with several other tests, a structured
oral interview that consisted of approxi-
mately 30 questions posed to each candidate
by a team of interviewers. The questions
were designed to determine an "applicant's
communication skills, decision-making and
problem-solving skills, and reactions to stress
situations." It was determined that 43% of
the females failed the oral interview, com-
pared with 15% of the males. After some
discussion of proper sample sizes for detect- 
ing significant differences, the court deter- 
mined that the discrepancy in pass rates 
was significant and that the interview did 
indeed have a discriminatory or adverse 
effect. In defending the validity of the inter- 
view, the organization relied on two sources 
of evidence: 

1. The oral interview had construct and 
content validity. The expert witness for the 
organization testified that the structured oral 
interview portions of the exam, which con- 
sisted of hypothetical questions simulating 
situations likely to be encountered by pa- 
trolmen, measured several dimensions iden- 
tified through job analysis that differentiate 
among persons who would be better patrol 
officers if put in a position to perform patrol
functions.

2. A significant relationship between per- 
formance in the interview and performance 
at the police academy was shown. A previous 
Supreme Court decision (Washington v. 
Davis; see Footnote 6) had affirmed the use 
of measures of training success as legitimate
criteria against which to validate a selection
instrument. The court found this evidence 
sufficient to demonstrate the validity of the 
interview. 
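
The size of the pass-rate gap reported in Harless v. Duck can be expressed as a ratio. The sketch below is only an illustration: the court relied on a test of statistical significance, whereas the four-fifths (80%) threshold used here is the rule of thumb given in the 1978 uniform guidelines, and the sample sizes needed for a significance test are not reported in the text. The function name is hypothetical.

```python
def impact_ratio(pass_rate_focal: float, pass_rate_reference: float) -> float:
    """Ratio of the focal group's pass rate to the reference group's pass rate."""
    return pass_rate_focal / pass_rate_reference


# Failure rates reported in the text: 43% of females, 15% of males.
female_pass = 1.0 - 0.43
male_pass = 1.0 - 0.15

ratio = impact_ratio(female_pass, male_pass)
below_four_fifths = ratio < 0.8   # four-fifths rule of thumb (assumed standard)

print(round(ratio, 2), below_four_fifths)
```

With these rates the female pass rate is about two thirds of the male pass rate, which falls below the four-fifths threshold and is consistent with the court's finding of adverse effect.
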

Another recent case is King v. New Hamp-
shire Department of Resources and Economic
Development,¹³ in which a court of appeals
found that the questions asked of a female
who applied for the job of a state meter
patrol officer helped to establish that dis-
criminatory intent had occurred. In this in-
stance, a female applicant was asked
“whether she could wield a sledge hammer, 
whether she had any construction industry 
experience, and whether she could ‘run 
somebody in’” (p. 670), none of which were 
related to the job in question. The court 
indicated that the employer's discriminatory 
intent was proved largely by its own words 
and actions.

Finally, in Bannerman v. Department of
Youth Authority,14 the use of a panel
interview was challenged. Candidates applying
for a parole agent job were interviewed by a
panel of three interviewers who were asked
to judge each candidate in relation to stated 
“critical class requirements” (e.g., demonstrated
ability to relate to youths and to
gain their respect and confidence). However, 
the plaintiffs were not able to demonstrate 
to the satisfaction of the court any discrimi- 
natory bias against women; that is, there 
was no statistically significant difference be- 
tween the pass rates of the males and females 
interviewed by the panel. 
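
The pass-rate comparisons weighed in these cases can be illustrated with a two-proportion z test. The sketch below is an illustration only and is not the procedure used by any court or by the studies cited. The function name `two_proportion_z` and the group sizes are hypothetical: the text reports failure rates of 43% and 15% for the oral interview but not the number of candidates in each group.

```python
import math


def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """Return (z, two-sided p) for the difference between two pass rates.

    Uses the pooled-proportion normal approximation, which is only
    reasonable when both groups are fairly large.
    """
    rate_a = pass_a / n_a
    rate_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Everyone passed or everyone failed, so the rates are identical.
        return 0.0, 1.0
    z = (rate_a - rate_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical groups of 100 each: 85 of 100 males pass (15% fail),
# 57 of 100 females pass (43% fail).
z, p = two_proportion_z(85, 100, 57, 100)
print(f"z = {z:.2f}, p = {p:.5f}")
```

With much smaller hypothetical groups the same percentages may fail to reach significance, which is one reason the adequacy of sample sizes can become an issue in such litigation.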

It is somewhat surprising that more cases 
dealing with interviews have not been liti- 
gated. It seems apparent that one direction 
in which the courts are moving is toward 
the exploration of the nature of and kinds 
of questions asked and the information elic- 
ited in the interview in more depth; that is, 
the content of the interview is being more 
fully examined. For example, inquiries dur- 
ing the interview that might convey to the 
applicant the impression that persons in a 
protected class will be discriminated against 
will now be viewed as discriminatory. In 
one case (cited in Babcock, Freedman, Nor- 
ton, & Ross, 1975), the EEOC and the New 
York Human Rights Commission concluded 
that a New York law firm had violated Title 
VII when interviewers emphasized to female 
applicants that the firm had only one female 
lawyer and that she was assigned to an area 
of work traditionally performed by women. 
The conclusion was that “the interviews are 
conducted in such a manner as to express a 
preference for men and to discourage women 
from pursuing employment with respondent 
firm" (Babcock et al., 1975, p. 380). 

Along these same lines, the specific kinds
of information elicited on application forms 
and during interviews are being litigated. 
Managers are currently confused about what 
they may or may not ask during an inter- 
view. Although I do not provide a review 
of these more specific interview inquiries, it 
should be noted that litigation revolves 
around two basic themes: 


13 King v. New Hampshire Department of Re- 
sources and Economic Development, 15 FEP 669 
(1977). 

14 Bannerman v. Department of Youth Authority,
17 FEP 820 (1977). 
1. Do particular kinds of questions convey
an impression of an underlying discrimina- 
tory attitude or intent? That is, references 
to “girls” and inquiries into non-job-related 
areas such as marital status, parenthood, 
child care, and so on, when these same 
questions are not presented to male candi- 
dates, may be sufficient to convince a court 
that discriminatory animus or intent is 
operating. 

2. Does the inquiry operate in a way that 
demonstrates a differential impact or ad- 
verse effect on protected groups? If so, is 
the particular information valid or job re- 
lated? Thus, organizations should avoid in- 
terview questions that operate in such a way 
as to differentially affect minority groups, 
unless such questions are job related. 

Guidelines concerning preemployment
inquiries have been set forth by a variety of
state human rights commissions as well as
by the EEOC.

For example, the Washington State Human 
Rights Commission (1979, pp. 2923-2926) 
stated the following to be unfair and illegal 
preemployment inquiries when they cannot 
be shown to be job related: (a) all inquiries 
related to arrests; (b) any inquiry concern- 
ing citizenship; (c) specific inquiries con- 
cerning spouse, spouse’s employment or sal- 
ary, children, child care arrangements, or de- 
pendents; (d) overgeneral inquiries (e.g., 
"Do you have any handicaps?"—which 
would tend to divulge handicaps or health 
conditions that do not relate to fitness to 
perform the job); (e) whether the applicant 
is married, single, divorced, engaged, wid- 
owed, or any other inquiry as to marital 
status; (f) type or condition of military dis- 
charge; (g) any questions related to preg- 
nancy; and (h) whether applicant owns or 
rents a home.

To summarize, although there have not 
been an overwhelming number of lawsuits 
involving the discriminatory nature of the 
employment interview, the litigation that has 
evolved clearly indicates that the interview 
is vulnerable to such suits. Interviews will 
indeed be treated like tests and reviewed by 
these same standards. I predict a greater 
number of suits in this area during future 
years. Organizations may find themselves 
even more ill equipped to defend the
interview because of the little attention paid to
quantifying interview judgments or conducting
research to determine the reliability,
validity, or adverse effects of the interview 
process. 


Processes Inherent in Interview That 
Potentially Contribute to Differential 
Evaluation 


The interview process is highly subjective. 
Schmidt (1976) has summarized a myriad 
of factors that contribute to interviewers’ 
judgments and evaluations. Because of its 
basic subjective nature, the interview process 
is vulnerable to the personal biases, preju- 
dices, and stereotypes of interviewers, thus 
making it open to challenge from civil rights 
litigants. Interviewers may form poorer eval- 
uations of minority group members than of 
nonminority candidates, even when the can- 
didates are substantially similar with regard 
to their job qualifications. Just how or why 
these differential evaluations are made is not 
well known. There do appear to be two
possible mechanisms that contribute to
differential evaluations: (a) stereotyping and (b)
differential behavior emitted during the
interview.


Stereotyping 


Decisions that are made on the basis of 
the interview are generally subjective in na- 
ture and thus susceptible, according to some 
individuals, to the influence of stereotypes.
However, Brigham (1971) indicated that
there is a good deal of confusion concerning
the precise definition of stereotypes and
their correlates. Most definitions (e.g., Lipp- 
man, 1922) revolve around the notion that 
stereotyping involves making judgments 
about people on the basis of their membership
in a particular group. Once an individual’s
membership in a particular class or
category is established (e.g., race, sex, age,
etc.), a number of trait characteristics are
ascribed to the individual based on the traits
associated with the larger class of which he
or she is a member. Thus, stereotyping
involves basically two processes: (a) the
formation of impressions and trait descriptions
of particular classes and categories of
individuals and (b) the assignment of these
traits to a particular individual once his or
her membership in that class or category is
known.

For example, Rosen and Jerdee (1976a) 
indicated that common stereotypes ascribed 
to males include such trait descriptions as 
adventurous, competitive, objective,
dominant, decisive, and rough, whereas females
are commonly described as compassionate,
dependent, submissive, emotional, and so
forth.

While many researchers have character- 
ized stereotypes as the product of a rigid 
and faulty reasoning process that operates to 
help individuals rationalize and justify their 
hostility and prejudices, Brigham (1971) 
and Hamilton (1976) have suggested that 
stereotypes are not unlike other normal
cognitive functioning processes. Hamilton
indicated that stereotypes are “unfounded
overgeneralizations,” in that the perceiver does
not use the available information in an opti- 
mal manner and bases his or her conclusions 
about a particular social group on poor
evidence. In a similar vein, Katz (1960) has
delineated several useful functions that
stereotypes fulfill for different people
(instrumental, ego-defense, value-expressive, and
order functions).

It is interesting to note that although 
the notion of stereotyping is frequently in- 
voked to explain why differential evaluations 
occur during interviews, the precise nature
of how stereotypes operate and produce these
different evaluations is not well specified.
There appear to be at least three lines of 
speculation concerning this process. First, 
the stereotypes or trait characterizations may 
be essentially negative in nature; that is, 
the traits ascribed to a minority group and 
individuals from that group may consist of 
basically negative tones (e.g., blacks are
dirty, uneducated, unintelligent, etc.). Thus,
essentially negative attitudes and opinions
concerning particular minority groups may
have their basis in these stereotypic
characterizations. Individuals holding essentially
negative opinions and attitudes toward 
minority group members might be expected 
to give lower evaluations in an interview
because of these attitudes. For example, Ter- 
borg and Ilgen (1975) found that attitudes 
toward women correlated significantly with a 
subject's decision to hire a female engineer. 
Individuals with essentially negative atti- 
tudes about women were less likely to give 
favorable evaluations than were those with 
more positive attitudes. 

Similarly, Britton and Thomas (1973) 
gathered opinion data from 56 employment 
interviewers and found that they ranked 
older individuals as more difficult to place 
and train and as more slow in maintaining 
production. In addition, females were seen 
as more likely to be absent from work and 
as less skilled than men. Finally, older women 
were viewed as having fewer saleable skills 
than older men, but younger men and women 
(aged 18 years) were seen as having basi- 
cally the same saleable skills. These negative 
opinions may lead to lower evaluations of 
older individuals and women in interview 
situations. 

A second and more indirect manner by 
which stereotypes may affect interview evalu- 
ations is based on the process of matching 
the stereotypic traits with the character- 
istics thought to be necessary to perform the 
job. If a relatively poor match results, an 
interviewer may reject the candidate out- 
right or give lower evaluations. Note that 
this process does not necessarily imply that 
the stereotypic traits are negative—indeed, 
they may even be positive in nature. Evalu- 
ations are potentially inaccurate (a) because 
of the inaccuracy of the stereotypes attrib- 
uted to the individual, (b) because of the 
inaccuracy of the characteristics deemed nec- 
essary to perform the job, or (c) because the 
matching process itself is a poor decision 
strategy. 

The research conducted by Schein (1973, 
1975) is particularly representative of this 
second viewpoint. Schein (1973) asked 300 
male managers to indicate which of 92 ad- 
jectives best described (a) women in gen- 
eral, (b) men in general, or (c) successful 
middle managers (each manager described 
only one of these three subgroups). She 
found that the relationship between the 
average descriptions of the middle manager 
and the average description of the males was
much higher (r = .62) than the relationship
between the descriptions of the managers
and of the females (r = .06). In fact,
on 60 of the 92 items, the descriptions of
the managers were more similar to the
descriptions given to men than to the
descriptions given to women. These findings suggest
that even before an interview or any formal 
selection process has begun, the perceived 
similarity between the characteristics of suc- 
cessful managers and of men in general in- 
creases the probability of a male's rather 
than a female's being given a higher evalua- 
tion. These results were also replicated with 
a group of 167 female managers who also 
described the various subgroups (Schein, 
1975). In this study, 167 female middle 
managers rated women in general, men in 
general, or successful middle managers on 
the same 92 adjectives. Like the male man- 
agers in the previous study, the female man- 
agers provided descriptions of successful 
middle managers that were far more similar 
to descriptions of men than to descriptions 
of women. The results suggest that female 
managers are as likely as male managers to 
accept stereotypical male characteristics as 
the basis for success in management. 
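
The profile comparison described above can be sketched as a correlation between two sets of mean adjective ratings. This is a minimal illustration under stated assumptions: it uses a plain Pearson correlation, which may not be the exact index Schein computed, the helper name `pearson_r` is hypothetical, and the five ratings per profile are made up for demonstration (the studies used 92 adjectives).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Made-up mean ratings on five adjectives, one list per described group.
managers = [4.2, 3.9, 2.1, 4.5, 1.8]
men = [4.0, 3.5, 2.4, 4.1, 2.2]
women = [2.9, 3.1, 3.0, 2.8, 3.3]

print(round(pearson_r(managers, men), 2))
print(round(pearson_r(managers, women), 2))
```

A higher correlation for the manager and men profiles than for the manager and women profiles is the pattern the text reports; the numbers printed here reflect only the invented ratings.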

A third manner by which stereotypes pos- 
sibly operate is to shape the kinds of ex- 
pectations and standards that interviewers 
have of job candidates during the interview. 
For example, it may be that an interviewer, 
after learning that the person to be inter- 
viewed next is female, evaluates the candi- 
date on a different set of criteria—for ex- 
ample, beauty, typing skills, poise, and so 
on—than used for evaluating a male candi- 
date. A study by Cecil, Paul, and Olins 
(1973) attempted to identify the qualities 
perceived to be important for male and fe- 
male applicants for the same job. Over 100 
subjects indicated the importance of each of 
50 variables (e.g., pleasant voice, expres- 
sive self-will, etc.) to interviewers evaluating 
either a male or a female job candidate. 

The results indicated that the kinds of 
standards and criteria used to evaluate the 
candidates depend on whether an applicant is 
male or female. The standards used to evalu- 
ate females are more clerical and cosmetic 
in nature, whereas the standards for males
are along aggressive and persuasive dimen- 
sions. 

Stereotypes obviously are also associated
with race. Despite the Civil Rights Act of
1964, stereotyping of blacks persists today.
For example, Karlins, Coffman, and Walters
(1969) compared the racial stereotypes of
undergraduate students in 1967 with
stereotypes reported in 1933 and 1951.
Subjects in all three studies indicated those traits
thought to be typical of blacks. Although
the data indicated a consistent trend over
the 25-year period toward giving more
favorable characterizations of blacks, a
considerable amount of stereotyping still exists. The
students in the study described blacks as
lazy, superstitious, musical, ostentatious, and
pleasure loving. Considerably more
information concerning the nature of racial and
ethnic stereotypes can be found in Jones (1972)
and Brigham (1971).

The nature of age stereotypes and how
they relate to the job have recently received
some research attention. Rosen and Jerdee
(1976c) asked over 100 business students
and realtors to imagine that they were going
to meet two individuals for the first
time and that the only information they had
about the two people was that one was 30
years old and the other 60 years old. The
participants in the study then considered 65
characteristics and indicated the degree to
which each characteristic described the
average 60-year-old male and the average
30-year-old male. Significant differences were
found on several characteristics and are
summarized as follows: The 30-year-old male
was described as significantly more
productive, efficient, and motivated, more capable
of working under pressure, more ambitious,
eager, and future oriented, more receptive to
new ideas, more adaptable, and more
versatile. The 60-year-old male was described as
significantly more accident prone and more
rigid and dogmatic.

Clearly the traits and characteristics
ascribed to the older group are not
particularly favorable. Rosen and Jerdee (1976c)
noted that these stereotypes are also not
consistent with objective research findings and
changes associated with aging (Schaie,
1974).

To summarize, the specific nature of
stereotypes that interviewers hold concerning
different minority groups may well influence
their evaluations of these candidates during
the interview process. To the extent that the
stereotypes are basically negative, deviate
from the perception of what is needed for
the job, or translate into different
expectations and standards of evaluation for
minority group members, stereotypes may well
have the effect of lowering the evaluations of
interviewers, even when the candidates are
equally qualified for the job.

Terborg (1977) has sounded a warning 
against the too quick acceptance of the no- 
tion that the pervasive effects of stereotypes 
explain or account for all differences in treat- 
ment by men and women managers. He sug- 
gested alternative explanations that may ac- 
count for apparent discrimination. One al- 
ternative he posed is that many women are 
truly not yet qualified for managerial posi- 
tions: "This is not meant to suggest that 
women lack potential for the job, but that 
the cumulative effects of past discrimination 
have prevented women from gaining the
necessary skills and experience" (Terborg, 1977,
p. 649).


Differential Behavior of Interviewers 


One additional explanation for different 
evaluations of minority group candidates as 
a result of interviews is that minority appli- 
cants may behave in a manner that seems 
different and unfamiliar to interviewers. Hall 
(1966) specifically argued that subcultural 
differences in nonverbal behavior have
resulted in whites’ misreading of black
applicants and therefore in blacks’ failure to get
jobs; that is, it is possible that blacks emit
verbal and nonverbal behaviors (e.g., jive
talk) that are acceptable and even desirable
in their subculture, but that these same
behaviors are misinterpreted or confused by a
white interviewer. An interesting study was
conducted in this regard by Fugita, Wexley,
and Hillery (1974). White and black (20
of each) female undergraduate students were
asked to participate in an employment
interview. Acting as the interviewers were 2 black
and 2 white males; eight questions were 
asked of the interviewees. Results of the 
study showed that the black interviewees 
maintained significantly less eye contact with 
both white and black interviewers. The least 
amount of visual interaction occurred when 
a black interviewee was interviewed by a 
black. Black interviewers were also given 
shorter glances than were white interviewers 
by both black and white interviewees. In 
sum, the race of the interviewee and of the 
interviewer appeared to be significant fac- 
tors in the kind and amount of behavior that 
occurred during the interview. What would 
have also been of great interest is whether 
these differences in behavior also produced 
differences in evaluations by the interviewers. 

It is possible that women, older workers, 
and handicapped individuals also differ in 
their reactions during interviews, which could 
contribute to lower evaluations. For ex- 
ample, older job candidates may be more 
thoughtful and cautious in their interview 
responses than are younger candidates. Ster- 
rett (1978) conducted a study in which 100 
male and 60 female managers evaluated 
videotapes of a male applicant in which the 
intensity of nonverbal body cues was manip- 
ulated. Results indicated that the male and 
female interviewers reacted significantly
differently to the different kinds of body
language.


Research Findings Concerning Different 
Evaluations in the Employment Interview 


In the previous section, it was learned that 
one component of unfair discrimination from 
a legal perspective is whether the evaluations 
stemming from the interview result in an 
adverse impact on protected groups. In the 
present section, the various research studies 
that have investigated whether equally quali- 
fied minority group members receive lower 
evaluations on the basis of interviews are re- 
viewed. These studies have generally em- 
ployed one of three kinds of research 
strategies. 
Resume Studies


In this kind of study, subjects (students, 
managers, recruiters, etc.) are asked to re- 
view a series of job resumes and to deter- 
mine the suitability of each of the candi- 
dates for employment and/or the starting 
wage that might be offered. Each resume 
usually contains information about educa- 
tional background and past experience, with 
a glossy photograph attached. In the typical 
study, the race, sex, age, or handicapped 
status of the job candidates can be manipu- 
lated through the photograph and the name, 
which is also printed on each resume. Thus, 
half the interviewers might be asked to make 
evaluations concerning, for example, five
males and five females. The other half of
the interviewers might be asked to evaluate 
five males and five females, with the only 
change being that the names and photo- 
graphs are switched so that no change what- 
soever occurs with regard to the qualifica- 
tions of the candidates; the only changes 
made are in regard to sex, race, and so on. 
Obviously, the interviewers are unaware that 
the resumes they are evaluating may differ 
from those being evaluated by other inter- 
viewers. In addition, some studies have added 
several other variables such as applicant 
attractiveness, type of job, and so forth to 
determine if these characteristics may inter- 
act somehow with minority status and thus 
influence the evaluations given. 
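
The name-and-photograph swap described above amounts to a counterbalanced design with two resume forms. The following sketch shows one way such forms could be generated; the function name `counterbalanced_forms` and the resume identifiers are hypothetical, and actual studies may have assigned labels differently (for example, at random within each form).

```python
def counterbalanced_forms(resume_ids):
    """Build two forms in which each resume's sex label is swapped.

    Qualifications stay attached to the resume id; only the label
    (standing in for the name and photograph) changes between forms.
    """
    if len(resume_ids) % 2 != 0:
        raise ValueError("need an even number of resumes")
    half = len(resume_ids) // 2
    form_a = {rid: ("male" if i < half else "female")
              for i, rid in enumerate(resume_ids)}
    form_b = {rid: ("female" if label == "male" else "male")
              for rid, label in form_a.items()}
    return form_a, form_b


# Ten hypothetical resumes, five labeled male and five female per form.
form_a, form_b = counterbalanced_forms([f"resume_{n}" for n in range(1, 11)])
for rid in form_a:
    print(rid, form_a[rid], form_b[rid])
```

Because every resume appears once with each label across the two forms, any difference in mean ratings between male-labeled and female-labeled resumes cannot be attributed to differences in the listed qualifications.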

Another strategy researchers use is to have 
subjects evaluate and rate several resumes 
that vary by sex, race, and so on. Care is 
taken to equate the resumes with relevant 
characteristics other than the variables
studied. One potential problem, then, with this
within-subject design is the degree to which 
potentially confounding variables are effec- 
tively controlled. 

Because many of the studies in this area 
use this resume approach, they obviously do 
not involve face-to-face interviews with real 
people. Instead, subjects are asked to evalu- 
ate “pencil-and-paper” people for jobs in 
organizations that are often not well de- 
scribed. One must infer that any discrimina- 
tory effects found in these studies generalize 
to actual interview circumstances.
“In-Basket” Study


In this design, participants in the study
(students, managers, etc.) are asked to
assume the role of a personnel director or
manager who works through an “in-basket” and
must take action and react to a number of
items in memorandum or letter form.
Typically, each in-basket provides information
about the various departments in the
organization and information about the
members of the organization (e.g., performance
appraisal data, attendance information, etc.).
Also contained in each participant's in-basket
are a series of different types of personnel
problems. The participants are asked to
make some kind of decision based on the
information given. One of the decisions
presented is the hiring or promotion of a
particular individual. The problems are written
in two or more versions so as to change
the sex, race, age, and so on of one or more
of the characters in the problem. For
example, one participant may receive an
in-basket set in which the problems involve
whether a male should be hired or promoted,
whereas another participant may receive the
exact same information, except that the sex
of the person being considered for hire or
promotion is female.

Other kinds of problems are presented in
addition to hiring and promotion. For
example, training and development
opportunities, leaves of absences, job assignments, and
so forth are investigated with regard to
whether participants make different kinds
of decisions based on the sex, race, and so
on of the individuals depicted in the
in-basket materials.


Videotape Studies or Field Experiments 


Less frequently, studies employ designs in
which actual minority and nonminority
interviewees are observed by interviewers either
face to face or in videotape presentations.
Typically, interviewers (subjects, managers,
etc.) interview or observe only one job
candidate, who is either a minority or a
nonminority member, and make evaluations
about the suitability of the candidate for
hire. Efforts are made to control the content
of the interview to insure that the same
questions are asked and that similar
responses are delivered by the interviewees.

Table 2 summarizes the results of 23 stud- 
ies that have investigated bias in the inter- 
view. As can be seen, most studies employed 
the resume approach, the second most fre- 
quent strategy was the in-basket strategy, 
and finally, the videotape or in vivo type 
of study was used in only three instances. 
Although there is considerable variability 
in the sample sizes used across the various
studies, the majority of designs (19) in- 
cluded more than 75 subjects. Thus, the 
studies were in general reasonably powerful 
with regard to detecting significance (Hayes, 
1973). 


Applicant Sex 


The studies in Table 2 show in a reasonably
consistent fashion that females are
generally given lower evaluations than males
where these candidates have similar or iden- 
tical qualifications. 

Dipboye, Fromkin, and Wiback (1975) 
gave 30 male professional interviewers and 
30 male undergraduates 12 resumes to evalu- 
ate and rate for the position of head of a 
furniture department. Applicants' sex,
physical attractiveness, and scholastic
performance were manipulated. Among other
findings, a main effect (F = 15.84, p < .01) for
applicant sex was found in which male ap- 
plicants were evaluated higher than were 
female applicants. However, this effect ac- 
counted for only 1% of the rating variance. 
One potential confound in this study was 
that the description of the furniture
department job included several references to
“him,” which perhaps connoted to subjects 
that male applicants were preferred. 

Dipboye, Arvey, and Terpstra (1977) pro- 
vided a partial replication and extension of 
the aforementioned study. College students 
(N = 110) evaluated 12 resumes for a
management-trainee sales position. Besides the
qualifications of the ratees, subjects’ sex and 
attractiveness were investigated. In addition 
to other findings (summarized below), ap- 
plicant sex demonstrated a significant main 
effect on an employability rating. Raters ex- 
pressed significantly more willingness to hire 
a male than to hire a female applicant (F =
20.44, p < .001). Again, however, this effect 
accounted for a minute proportion of the 
variance in ratings (.006). 
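
The contrast between a highly significant F and a very small share of explained variance can be illustrated with a simple effect-size calculation. The sketch below uses partial eta squared, which is an assumption on my part and may differ from the proportion-of-variance index the original studies reported; the function name and the error degrees of freedom are hypothetical, since the text does not give them.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Share of (effect + error) variance attributed to the effect."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# The F value comes from the text; the df_error values are hypothetical
# and chosen only to show how the estimate shrinks as they grow.
for df_error in (50, 500, 3000):
    print(df_error, round(partial_eta_squared(20.44, 1, df_error), 4))
```

For a fixed F, the estimated share of variance falls as the error degrees of freedom rise, which is how a statistically reliable sex effect can still be of little practical size when many ratings are analyzed.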

Haefner (1977) asked 286 managers to 
rate 16 resume profiles in which applicant 
characteristics varied by sex, age, race, and 
competence. All applicants were described 
as being disadvantaged. In addition to sev- 
eral interactions (discussed below), sex ex- 
hibited a significant main effect (F = 78.53, 
p < .01) in which males were given higher
recommendations for hire than were females. 
A factor that might have influenced the 
results of the study is that the presentation 
of the stimulus profiles and subsequent eval- 
uations took place over the telephone. The 
effects, if any, of using this data collection 
procedure are not known. 

Rosen and Jerdee (1974a) used 235 male 
undergraduate subjects to evaluate male or 
female candidates for jobs with demanding 
versus routine job requirements. Females 
were rated significantly lower than were 
males on an overall rating (F = 14.02,
p < .01).

Zikmund, Hitt, and Pickens (1978) varied 
sex and scholastic performance of applicants 
in a resume study in which 100 personnel 
directors were asked to reply to letters from 
the candidates that expressed interest in an 
accounting job. The dependent variable was 
the number of replies received. Females
received a significantly lower number of
replies (χ² = 4.5, p < .05). In addition, the
replies that were received by the females 
were significantly less positive in nature than 
were those for other candidates. 

Several studies, however, failed to demon- 
strate main effects for applicant sex. Fidell 
(1970) asked 147 psychology department 
chairpersons to evaluate the chances of 10 
candidates’ getting an offer for a full-time 
position. Sex of the candidates was varied.
Although the mean ratings of male and fe- 
male candidates did not differ, women were 
offered positions at significantly lower levels 
than the positions offered males. 

Using an in-basket methodology, Terborg
and Ilgen (1975) asked 36 male
undergraduates to evaluate male and female job
candidates and found that while there was 
no difference in hire rating on the basis of 
applicant sex, females were assigned a sig- 
nificantly lower starting salary than were 
identical male applicants (F = 4.51,
p < .05). This study is somewhat limited,
however, by the small sample size and the use
of college undergraduates as subjects. Dip- 
boye and Wiley (1977) showed videotapes 
of aggressive and passive male and female 
applicants to 66 college recruiters and found 
that while aggressive candidates were evalu- 
ated significantly higher than passive candi- 
dates on an employability scale, there were 
no differences between the male and female 
interviewees. However, the recruiters per- 
ceived the females' overall qualifications and 
their experience and training as superior to 
those of the males. One possible problem 
associated with this study is that only one 
particular stimulus male and stimulus female 
were presented in the videotape conditions.
Thus, the observed effects may have been
unique to the specific individuals presented.

Renwick and Tosi (1978) asked 80 male 
and female graduate students to evaluate 
resumes for two managerial jobs; all appli- 
cants were portrayed as exceptionally well 
qualified. The applicant variables manipulated
were sex, marital status, undergraduate
major, and graduate degree. Each subject 
reviewed and evaluated 10 profiles. No evi- 
dence of differential evaluations as a func- 
tion of applicant sex was found, except as 
a component of a four-way interaction.

A study by Muchinsky and Harris (1977) 
yielded results somewhat opposite to the
previous findings. College students (N =
100) evaluated resumes that varied by
applicant sex and qualifications for three
different jobs (mechanical engineer, day-care
person, and copy editor). Although numerous
two- and three-way interactions were
observed, a main effect (F = 6.77, p < .05)
for applicant sex occurred in which females
were given higher ratings than males.

A similar finding was observed in a
related study by Kryger and Shikiar (1978).
Personnel managers (N = 75) evaluated
letters of recommendation of male and female
candidates for a management-trainee job.
Sex of the author of the letter of recommen- 
dation was manipulated, as was the favor- 
ableness of the letter. Subjects rated appli- 
cants in terms of whether the applicant 
should be interviewed. Results indicated that 
subjects were far more likely to proceed with 
an interview with female applicants than 
with male applicants (F = 5.3, p < .05).
Kryger and Shikiar interpreted their results
as being a function of affirmative action
considerations by personnel managers. It should
be noted that this study has more to do 
with whether an applicant is granted an 
interview than with decisions resulting from 
the interview itself. Although more females 
may be granted interviews, the decisions re- 
sulting from actual interviews may be un- 
favorable for women, as noted above. 

In addition, several studies have revealed 
that both male and female raters give lower 
evaluations to female applicants. Studies by 
Dipboye et al. (1975) and Dipboye et al. 
(1977) showed that the observed effects are 
not confined only to male interviewers. How- 
ever, when sex of the rater was investigated 
in the Rose and Andiappan (1978) and 
Muchinsky and Harris (1977) studies, sig- 
nificant main effects were observed in which 
female raters gave significantly higher rat- 
ings to applicants of both sexes than did 
male raters. Thus, there is some indication 
that female raters are more lenient than 
male raters, which corresponds to the find- 
ings of London and Poplawski (1976), in 
which females were observed to give more
lenient evaluations of stimulus objects in a
study involving the formation of stereotypes.
However, Renwick and Tosi (1978) found
no evidence of a sex-of-interviewer effect. 

In addition to simply searching for main 
effects, research has focused on several variables
that are predicted to interact with applicant
sex in influencing the evaluations
given. The variable given the most attention 
is the type of job for which the candidates 
are considered. Typically, a prediction is 
made that females will be given lower eval- 
uations compared with males when being 
considered for jobs assumed to be masculine 
in nature, that is, jobs that are either predominantly
held by males (Muchinsky &
Harris, 1977) or jobs that reflect demanding
or challenging activities (Rosen & Jerdee,
1974b). What is proposed is a sex-congruency
model whereby a situation or job is
sex typed as being more appropriate for a
male or a female and thereby influences the
evaluation given.

To date the research provides fairly consistent
evidence confirming the sex-congruency
notion. Studies by Shaw (1972), Cohen
and Bunker (1975), Cash, Gillen, and Burns
(1977), and Rosen and Jerdee (1974b)
show significant interactions between sex and
type of job in their influence on evaluations.
However, the study by Muchinsky and Harris
(1977), in which individuals evaluated
male and female resumes for jobs in mechanical
engineering (traditionally masculine), a
child-care center (traditionally feminine),
and journalism (a neuter job), indicated only
mixed support for the hypothesis of differential
evaluations as a function of job type
and sex. In different contexts, Feather (1975)
and Feather and Simon (1975) found evidence
for the differential evaluation of males
and females as a function of the appropriateness
of the occupation for one or the
other sex.

An additional job variable that has been
investigated is the predominant sex of the 
subordinates in the job. Rose and Andiap- 
pan (1978) hypothesized that greater differ- 
ential evaluations of male and female job 
candidates would occur when interviewers 
evaluated such candidates for managerial 
jobs that involved either a predominantly 
male or a predominantly female work force. 
Male (n = 55) and female (n = 20) college
students evaluated resumes for a managerial
position. Sex of subject, sex of applicant,
and predominant sex of subordinates were 
the variables investigated. The results of 
their resume study indicated that female 
candidates were evaluated more favorably 
when the predominant sex of the subordinate
employees was female. Likewise, male candidates
were given higher evaluations when
their potential subordinate work force was
predominantly male.

A further variable that has been investigated
in several studies is applicant qualifications
or competence. Two basic questions
were behind the investigations of the effects 
of applicant qualifications. First, researchers 
were interested in whether significant sex 
effects emerged simply because sex was the 
only salient cue in the stimulus set and 
whether when sex was considered jointly with 
qualifications the sex effects would diminish 
or even disappear because of the powerful 
effect of the qualification variable. A second 
question concerned the possible interaction 
between applicant sex and qualifications and 
whether interviewers evaluate highly com- 
petent females more poorly than highly com- 
petent males in comparison with the differ- 
ences in evaluations that occur when the 
candidates are not so well qualified (Spence 
& Helmreich, 1972). These predictions were 
based on the notion that highly competent 
or qualified women are particularly threaten- 
ing to male interviewers. Several studies bear 
directly on these issues. Dipboye et al. 
(1975) found a large main effect for com- 
petence: Highly qualified job candidates 
were preferred over less qualified candidates. 
Although the sex of the applicant was still 
a significant main effect, it accounted for less 
than 1% of the variance in the ratings 
compared with 38% accounted for by the
qualification variable. On the other hand, 
when subjects were asked to choose only one 
candidate from the resumes, 72% of them 
chose a male. Finally, there was no indica- 
tion of an interaction between applicant 
qualifications and applicant sex. 

Similar results were obtained by Dipboye 
et al. (1977). Although applicant qualifica- 
tions accounted for a large portion of the 
evaluation variance compared with that ac- 
counted for by applicant sex (50% vs. 
.006%), when subjects were asked to choose 
only one candidate for hire, highly qualified 
males were selected by 54% of the subjects, 
whereas 28% chose the highly qualified 
females. This difference was significant at 
the .05 level. 

The study reported by Muchinsky and 
Harris (1977) indicated that significant dif- 
ferences were found between males and fe- 
males of average ability (as indicated by 
scholastic standing) but that no differences
occurred between male and female appli- 
cants at high or low ability levels. 

Heneman (1977) reported a study in 
which 144 college students evaluated hypo- 
thetical applicants for the job of life insur- 
ance agent. The qualifications of applicants 
were manipulated indirectly through the use 
of test scores. In this study, highly qualified 
females were evaluated as being much less 
suitable for hire than were highly qualified 
males. 

Finally, the resume study conducted by 
Haefner (1977) found that while highly 
competent individuals were preferred over 
less competent candidates, preferences were 
clearly given to highly competent males over 
highly competent females. Two research 
trends are suggested by these data. First, 
although qualifications of candidates are 
clearly the most powerful factors in influ- 
encing interviewers’ decisions, sex of appli- 
cant remains a significant variable. More- 
over, although the sex of the applicant does 
not appear to account for a great deal of 
rating variability, when only one or two 
hiring choices are given to evaluators, sex 
of the applicant is a highly important vari- 
able. This situation is analogous to the util- 
ity of a test that has low validity but is used 
in a situation in which a large number of 
candidates are considered and only a small 
number of jobs are available. The low selec- 
tion ratio gives a test with low validity a 
high degree of utility (Wiggins, 1973). 
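
The selection-ratio argument can be illustrated with a small simulation. The sketch below is not taken from any of the studies reviewed; the effect size, pool size, and function names are assumptions chosen only for illustration. Ratings are drawn from normal distributions with a small mean advantage for one group, so that group membership explains about 1% of the rating variance, and the single top-rated candidate is then chosen from each pool.

```python
import random

def variance_explained(d):
    """Proportion of rating variance due to group membership for two
    equal-sized groups whose means differ by d within-group SDs."""
    return (d / 2) ** 2 / (1 + (d / 2) ** 2)

def share_of_top_picks(d, pool_size, trials, seed=0):
    """Fraction of trials in which the single top-rated candidate comes
    from the advantaged group (which makes up half of each pool)."""
    rng = random.Random(seed)
    half = pool_size // 2
    wins = 0
    for _ in range(trials):
        best_adv = max(rng.gauss(d, 1.0) for _ in range(half))
        best_other = max(rng.gauss(0.0, 1.0) for _ in range(half))
        if best_adv > best_other:
            wins += 1
    return wins / trials

d = 0.2  # hypothetical mean advantage, in SD units
print(round(variance_explained(d), 4))                    # about 1% of variance
print(share_of_top_picks(d, pool_size=20, trials=20000))  # share of top picks
```

Under these assumptions the advantaged group supplies noticeably more than half of the top picks even though group membership explains very little of the variance in ratings, which is the pattern the forced-choice results above suggest.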

The findings regarding the predicted inter- 
action of applicant qualifications with rater 
sex are mixed. Some studies demonstrate 
support for this notion and others do not, or 
they indicate interactions other than those 
predicted. 

An additional variable that was investi- 
gated in three studies is the attractiveness 
of the job candidates. Dipboye et al. (1975) 
investigated the notion that physical attrac- 
tiveness may be a more important variable 
in influencing interview evaluations for fe- 
males than for males. However, their results 
showed that attractive candidates were pre- 
ferred over unattractive candidates regardless 
of sex. A second study by Dipboye et al.
(1977) showed different results. Attractive
males were rated significantly higher than
were attractive females, and unattractive
males were rated higher than were unattractive
females, but no difference occurred between
male and female candidates who were
moderately attractive.

RICHARD D. ARVEY

The third study investigating this variable
was more complex. Cash et al. (1977) made
several predictions:

1. When candidates are under consideration
for a job that is not
considered traditionally male or female, attractive
applicants will be more favorably
evaluated than unattractive candidates, regardless
of sex.

2. When candidates are considered for a
traditionally masculine job, attractive males
will be more highly evaluated than attractive
females. In contrast, when under consideration
for a traditionally feminine job, attractive
females will be more positively evaluated
than attractive males.

Results of the study indicated that these
predictions were for the most part confirmed.
In summary, attractiveness is an important
variable that may
influence interviewers’ decisions differently for
males and females, again depending on the
type of job under consideration.

Applicant Race 


Somewhat surprisingly, only three studies
have dealt with race of the applicant in interview
evaluations. Even more surprising,
however, is that little evidence was found to
suggest that interviewers give more unfavorable
evaluations to black job candidates compared
with white candidates.

Wexley and Nemeroff (1974) conducted a
study in which resumes of blacks or whites
were presented to 120 students who were
asked to evaluate either black or white candidates
with regard to their suitability for
the position of mechanical-engineer technician.
In addition, each resume contained information
indicating whether the candidate
was relatively similar in background to the
interviewer (e.g., father was an office worker
and mother a school teacher) or dissimilar
in background to the interviewer (e.g., father
was a laborer and mother a domestic worker).
The students were also divided into
relatively high and low racially prejudiced
groups on the basis of their scores on an
attitude measure of prejudice. The results
indicated that the race of the applicant had
no effect on the employability ratings given.
However, similarity of biographical background
did prove to be a major determinant
of the evaluations of job candidates. Moreover,
interviewers who were relatively more
prejudiced tended to give lower ratings to
both white and black applicants than did
interviewers who were relatively less prejudiced.

UNFAIR DISCRIMINATION

Haefner (1977) also used a resume design 
and found that race interacted with both 
sex and age to influence interviewers' evalua- 
tions, but these effects were very small. 

Finally, Rand and Wexley (1975) showed
80 white males and 80 white females videotapes
of employment interviews in which a
black or white applicant was presented. 
Again, race of the applicant did not signifi- 
cantly affect the evaluations of the candi- 
dates. Additional results showed that bio- 
graphical similarity and racial prejudice on 
the part of interviewers influenced the eval- 
uations given, but these factors operated to 
influence the ratings of both black and white
applicants. Thus, not a great deal of research
evidence is available that demonstrates that 
interviewers give lower evaluations to black 
applicants who have qualifications equal to 
those of white applicants. 

A factor that might account for the pau- 
city of significant findings is that the studies 
were conducted within the past 5 years and 
that the interviewers generally consisted of 
students who may have been somewhat lib- 
eral or sensitive to the EEOC and legal 
issues associated with evaluating black ap- 
plicants for jobs. They might have been apt 
to respond somewhat differently if they had 
not been sensitized to these issues. A further 
factor is that all the studies except that 
reported by Haefner (1977) used between- 
subjects designs in which subjects viewed 
only one stimulus condition (either the black 
or white applicant). Within-subject designs 
have two advantages that might be capitalized
on in future studies: First, given the
same total number of subjects, there are
more degrees of freedom, thus making an
experiment relatively more powerful in de- 
tecting differential evaluations if they, in 
fact, occur; second, subjects are allowed to 
view more than one stimulus condition, 
which allows them to establish anchor and 
reference points on the rating scales. The 
between-subjects designs in the studies re- 
ported above may have simply not been 
powerful enough to detect significant effects. 


Applicant Age 


Only two studies have investigated the 
effects of candidates’ age on interviewers’ 
evaluations. Haefner (1977) found that age 
was a significant factor in interviewers’ eval- 
uations and that age also interacted with 
race (as noted above) and with competence. 
Although age played no role in the evalua- 
tion of barely competent candidates, younger 
highly competent individuals were preferred 
over relatively older but highly competent 
individuals. 

Using the in-basket methodology, Rosen 
and Jerdee (1976b) found some interesting 
results. Undergraduates (N = 142) reacted
to six items embedded in an in-basket in 
which age was the variable manipulated. 
Older employees were evaluated as less suit- 
able for a job compared with younger em- 
ployees. In addition, the older employees 
were evaluated as (a) less promotable, (b) 
more resistant to change, (c) having lower 
physical capability, and (d) less likely to 
have organizational support for retraining 
opportunities. Although these two studies 
suggest a rather strong and pervasive effect 
due to applicant age, more research is needed 
to substantiate these findings. 


Applicant Handicap 


There is a relative dearth of studies dem- 
onstrating the effects (or lack of effects) of 
handicapped status on interview evaluations. 
An early study by Shaw (1972), for ex- 
ample, investigated the differential evaluation 
of job candidates when one was depicted as 
an individual with a “withered arm” and
weak vision. Subjects were 132 college recruiters.
Although no significant effects were
observed, the results indicated that the candidate
with the physical disability was perceived
relatively favorably. The author's explanation
was that subjects perceived the
candidate “as a courageous figure who had 
overcome physical adversity, rather than an 
employment risk because of physical handi- 
caps” (Shaw, 1972, p. 337). 

Krefting and Brief (1977) reported the 
results of a study in which 145 college stu- 
dents evaluated a packet of resumes and 
other materials necessary for determining 
whether an applicant should be hired for 
the position of typist. The applicant in the 
resume was depicted as either healthy or 
confined to a wheelchair. Also, the applicant 
was depicted as either experienced or non- 
experienced. 

The individuals reviewing the resume ma- 
terials gave estimates of the applicant’s 
health, motivation, potential for staying, and 
so forth in addition to an overall rating. Dis- 
abled applicants were seen as significantly 
more highly motivated and as more likely to 
become long-term employees than were non- 
disabled applicants. However, a puzzling in- 
teraction also occurred. The inexperienced, 
disabled applicant was evaluated higher 
overall than was the experienced, disabled 
applicant. The authors suggested that the re- 
sults of the study were relatively encouraging 
for qualified, disabled applicants. One of the 
problems with this and similar studies is 
that the subjects may have guessed the purpose
of the study because of the nature of
the applicant variable (i.e., handicapped
status) displayed.

In a second study along these lines, Rose 
and Brief (Note 1) assessed the impact of 
applicant disability (epilepsy) and job type 
on evaluation judgments. Students (N = 145)
evaluated a resume and other data that
portrayed the applicant as either healthy or
having epilepsy (although the seizures were
under control). Applicants were portrayed as
applying for a job that involved either a
great deal of public contact and supervisory
responsibilities or no public contact or supervisory
responsibilities. Subjects gave evaluations
along several dimensions. Results indicated
that epileptic and normal applicants
were not perceived to be significantly different
in terms of their overall performance,
tenure with the firm, work effectiveness, and
amount of salary. There was, however, a
significant interaction between disability and
type of job on several evaluations. Epileptics
seeking non-public-contact jobs were perceived
as more likely to satisfy clients than
were normal applicants.

A third study is perhaps not so encouraging.
Johnson and Heal (1976) reported the
results of a study that again compared a
wheelchair job applicant with a healthy applicant.
In this study, actual employment
agencies (50) were visited by one of the
researchers, who indicated interest in finding
a job. However, in half the interviews, the
same researcher appeared in a wheelchair.
The researcher rated the agency responses on
a number of variables. The analyses of the
data showed that handicapped applicants as
opposed to healthy applicants were offered
[...]. In
addition, the representatives gave the handicapped
applicants a lower probability of
getting the kind of work that was being
sought and offered relative discouragement
to the handicapped applicants about seeking
a position with normal public exposure.
The results of this study may have been
influenced by the fact that the researcher did
the ratings herself. However, the findings
certainly suggest that interviewers may give
less favorable evaluations to individuals displaying
a handicap.


Summary of Evidence Concerning 
Differential Evaluations 


A review of the research concerning differential
evaluations of minority job candidates
based on components of the interview
yields mixed results. First, the evidence
based on the resume research is fairly consistent
in showing that women tend to be
evaluated more poorly than men. Moreover,
the degree of differential evaluation appears
to be related to the type of job for which
women are considered; a more prominent
bias occurs when women are considered for
typically masculine-oriented jobs.

When qualifications of candidates are considered,
research studies tend to demonstrate
the following: (a) Qualifications of job can- 
didates show a powerful main effect that 
accounts for 25%–50% of the variance in
ratings; (b) the predicted interaction be- 
tween applicant qualifications and sex was 
not consistently found and, in particular, did 
not support the notion that highly competent 
women are prone to more negative evalua- 
tions compared with highly competent males 
in relation to differences between less quali- 
fied male and female applicants. 

A somewhat surprising finding is the pau- 
city of evidence for differential evaluation 
of job candidates as a function of race. Al- 
though only a few studies have investigated 
this phenomenon, the data do not support 
typical a priori assumptions about the in- 
terview’s providing a ready mechanism by 
which to discriminate against blacks. 

This review also reveals a relative dearth 
of research investigating the possible bias in 
interview circumstances that operates to neg- 
atively affect elderly applicants as well as 
those with handicaps. Although the existing 
studies suggest a strong and pervasive effect 
due to applicant age, the data dealing with 
handicapped applicants are more complex.
The few studies that have explored this issue
suggest that handicapped applicants are 
viewed as having higher motivational tend- 
encies, possibly because of the perceptions 
of interviewers that handicapped applicants 
must have exerted a great deal of effort, 
worked harder, and so on in order to overcome
their disabilities. Obviously, more research
needs to be performed on the kinds
of evaluations handicapped individuals receive
in employment settings.


Differential Validity of the Interview 


A major thrust of the 1970 EEOC guide- 
lines on testing was the need for employers 
to investigate the possibility that a selection 
device may exhibit differential validity. 
Boehm (1972) and Humphreys (1973) have 
indicated that differential validity has to do 
with whether the correlation between predictor
and criterion for one subgroup of a
population differs significantly from the correlation
between a predictor and criterion
for a different subgroup of a population. 
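
One conventional way to test whether two validity coefficients from independent subgroups differ is Fisher's r-to-z transformation. The sketch below is illustrative only: the coefficients are hypothetical, the function name is an assumption, and the sources cited above are not claimed to prescribe this procedure. The subgroup sizes (94 and 22) are chosen to show how little power a small subgroup affords.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two correlations
    obtained from independent samples of sizes n1 and n2."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-tailed p under N(0, 1)
    return z, p

# Hypothetical validity coefficients for two subgroups.
z, p = fisher_z_difference(0.30, 94, 0.10, 22)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up values the difference between .30 and .10 is far from significant, so a nonsignificant test in samples of this size says little about whether differential validity exists.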

Although numerous studies have examined 
the validity of the interview and interview 
process in general (Mayfield, 1964; Ulrich 
& Trumbo, 1965; Wright, 1969), only three 
studies were located that examined the valid- 
ity of the interview as a function of em- 
ployee subgroup characteristics. 

Lopez (1966) reported the results of a 
study in which black and white female ap- 
plicants for toll collection jobs were given 
a standardized 10-minute biographical data 
interview in which training, experience, and 
personal qualifications were numerically as- 
sessed. Although the study was seriously 
flawed, the data indicated that the interview 
was significantly correlated with toll accu- 
racy and length of time in service for whites. 
For the black sample, the interview was sig- 
nificant when correlated against the criteria 
of attendance, toll accuracy, length of ser- 
vice, and supervisor’s appraisal. However, 
these results are extremely suspect because 
the correlations were corrected for restriction 
of range, sample size values were not re- 
ported for the separate groups, and tests of 
differential validity were not conducted. 

Kirkpatrick, Ewen, Barrett, and Katzell 
(1968) reported validity coefficients sepa- 
rately by race. Interviewers’ ratings based 
on a 5-point scale were correlated against six 
different criterion measures for 94 white and 
22 black female clerical workers. The inter- 
view ratings lacked validity in the total 
sample and in each group separately. Thus, 
the differential validity hypothesis was not 
supported;¹⁵ judgments based on the interview
were invalid for both groups. In addition,
no difference was observed between the
ratings given to the black and white groups 
by interviewers. 

15 These authors tested for differential validity
incorrectly, but since both coefficients did not differ
significantly from zero, one can infer that no differential
validity occurred in this instance.

Freytag (Note 2) reported the results of
a study designed to investigate the differential
validity of the interview in predicting
performance of 64 minority members (blacks,
females, Hispanics, etc.) and 54 white patrol
officers. Interview ratings based on a
three-man oral board were correlated with
performance ratings. Only interview ratings
of the officers’ communication skills correlated
significantly for the total sample;
however, a test of differential validity indicated
that the interview was equally valid
for both groups.

To summarize these three studies, very 
little evidence exists that indicates that the 
interview is differentially valid for minority 
group members and nonminority members. 
On the other hand, little evidence indicates 
that the phenomenon does not exist because 
of the paucity of research investigating this 
issue. 


Concluding Comments and Needed Research 


For a selection device that is so widely 
used, we know very little about its discrimi- 
natory impact on minorities. Given the high 
probability that the interview will be sub- 
ject to greater judicial review, it is some- 
what startling to realize that only a little 
over 20 studies have investigated the issue 
of differential evaluations or validity of the 
interview. 

Throughout this article, a number of re- 
search needs have been identified. Some of 
these are described below. 


Methodological Needs 


More research is needed that uses different 
methodologies in investigating possible differ- 
ential evaluation or bias in the interview. 
Presently, there seems to be an overreliance 
on resume techniques. It seems likely that 
evaluations given to “pencil-and-paper” peo- 
ple differ from those given to more fully de- 
scribed and portrayed stimulus people. More 
efforts should be made to study in vivo inter- 
view simulations or to use methodologies that 
present fuller stimulus fields of the inter- 
view situation. The studies by Wexley and 
Nemeroff (1974), Wexley, Sanders, and 
Yukl (1973), and Dipboye and Wiley 
(1977) are excellent examples of the presentation
of stimulus material through videotape
techniques. In addition, more efforts to
conduct field research using the methodology
suggested by Johnson and Heal (1976)
should be made. Face-to-face interviews also
need to be studied.

Future researchers may also want to explore
some Bayesian methodologies, concepts,
and approaches (Hayes, 1973; Slovic & 
Lichtenstein, 1971) in consideration of the 
differential evaluation issue. Such techniques 
might be used to examine the revision of 
interview judgments or prior probabilities of 
interviewers’ success as more information 
about candidates becomes available. Prior 
probabilities of success in jobs for minorities 
may be more resistant to change than prior 
probabilities for nonminorities. Moreover, the
prior-probability distributions may them- 
selves be different based on the minority/ 
majority status of the interviewee. Osburn 
and Constantin (Note 3) suggested that in- 
terviewers, when lacking information on bona 
fide qualifications, respond as if women have 
a lower probability of success in jobs tradi- 
tionally held by women. Thus, the formation 
of differential evaluations of candidates 
might be viewed from a Bayesian perspec- 
tive. 
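
A minimal sketch of what such a Bayesian view might look like follows. Everything in it is an assumption made for illustration: the priors, the likelihood ratios, and in particular the `weight` exponent, which is a hypothetical device for modeling an interviewer who discounts evidence so that the prior is more resistant to revision. None of it is drawn from the studies cited.

```python
def update(prior, likelihood_ratio, weight=1.0):
    """Posterior probability of job success after one piece of evidence.

    likelihood_ratio = P(evidence | success) / P(evidence | failure).
    weight < 1 models an interviewer who discounts the evidence, that is,
    whose prior resists revision (a hypothetical modeling device).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** weight
    return posterior_odds / (1.0 + posterior_odds)

def revise(prior, evidence, weight=1.0):
    """Apply update() in sequence to a list of likelihood ratios."""
    p = prior
    for lr in evidence:
        p = update(p, lr, weight)
    return p

evidence = [2.0, 2.0, 2.0]  # three equally favorable, hypothetical items
print(revise(0.50, evidence))              # higher prior, full use of evidence
print(revise(0.30, evidence))              # lower prior, same revision rule
print(revise(0.30, evidence, weight=0.5))  # lower prior, discounted evidence
```

The three calls separate the two possibilities raised above: a different prior alone leaves a gap that identical evidence narrows, whereas a prior that is also resistant to change leaves a much larger gap after the same evidence.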

Between-subjects and within-subject designs
need to be contrasted with regard to
the information they yield regarding differential
evaluations. Within-subject designs
may be more powerful in detecting differential
evaluation for two reasons: First, from
a statistical perspective, more degrees of freedom
are available. Second, interviewers
themselves may be more prone to give differential
evaluations when they have comparative
stimulus sets of interviewees.

Researchers also need to tap subject pools 
other than undergraduate and graduate students.
Though some research indicates that
college student reactions are not appreciably
different from those of managers (Bernstein,
Hakel, & Harlan, 1975; Dipboye et al.,
1975), more real-life interviewers need to be
used in research of this sort. 

Finally, researchers should be more aware
of the need to present greater stimulus sampling
in the conditions presented; that is,
subjects should have the opportunity to view
more than one particular female or handicapped
individual in studies of this kind.
Otherwise, any significant effect observed
might be unique to the specific stimulus individual
presented because of other uncontrolled
minor characteristics (hair color,
height, etc.). Pools of stimulus applicants 
from which samples can be drawn to show 
subjects should be developed. 


Research on Race, Age, and the Handicapped 


Research is needed to further establish 
whether interview judgments are biased 
against the elderly, the handicapped, or other 
minorities. Little interview research has been
conducted on these minorities. Given the
relatively high probability of more lawsuits
occurring in these areas, it would be wise
to focus greater research efforts on studying
the interview as it may impact on these particular
groups. For example, an area of research
that demands further exploration is
the notion that subjects overevaluate handicapped
applicants (Krefting & Brief, 1977).
Perhaps in these situations subjects attribute 
higher motivation, ability levels, and so forth 
to handicapped individuals filing applications 
for jobs because of their perceived efforts to 
overcome their disabilities. 

Further research might also focus on differential
evaluations as a function of type of
disability. Some disabilities (e.g., mental
illness) are perceived more negatively by
employers than are other disabilities (Nagi,
McBroom, & Colletts, 1972). Thus, lower
evaluations may be given to applicants with
the most negatively viewed disability or
handicap.

Similarly, research is needed to investigate 
a possible Age × Type of Job interaction. It
seems likely that there are some jobs in
which older individuals are perceived as the
more desirable candidates (e.g., president
of company, chairperson of board) and some
jobs in which old age is perceived as a handicap
(e.g., airplane pilot). The possible interactions
need to be explored.


Process Research 


More research is needed to determine what 
goes on in interviews to influence differential 
evaluations; that is, researchers should begin 
to focus on the underlying process by which 
differential evaluations take place. As noted
earlier, the exact methods by which stereo- 
types affect interviewer judgments are not 
known. Do interviewers base their differen- 
tial judgments on the matching or the stereo- 
typing paradigms described earlier? 

Many of the questions asked earlier about 
the process of interviewer judgments (Web- 
ster, 1964) need to be further examined in 
the context of differential evaluations. For 
example, a number of process issues and 
areas need further investigation. When do 
differential evaluations occur during the in- 
terview process—early or late? Do inter- 
viewers gather different kinds of information 
or pay attention to different information de- 
pending on the majority or minority status 
of the candidate? Policy-capturing methodol- 
ogies along the lines of those employed by 
Zedeck and Kafry (1977) might be useful 
in these contexts.

What kinds of differences occur between 
studies that examine the impact of face-to- 
face contact with applicants and studies that 
assess candidates on the basis of resumes? 
Imada and Hakel (1977) showed that nonverbal
communication (e.g., eye contact,
posture, gesturing) has a strong impact
on interviewer decisions. These behavioral
components of the interview process
need to be examined with regard to their 
impact on differential evaluations. In short, 
more molecular studies of the verbal and 
nonverbal aspects of the interview need to 
be carried out. 

Does the amount of information about 
candidates and jobs affect differential evalu- 
ations? Studies by Longsdale and Weitz 
(1973) indicate that interviewers are more 
prone to agree about job candidates when 
there is a great deal of information concern- 
ing the job than when there is not much 
information about the job. Wiener and 
Schneiderman (1974) conducted two experi- 
ments showing that complete and unambigu- 
ous job information reduces the effect of ir- 
relevant stereotypes on interviewer decisions. 
Does this also result in a minimization of 
differential evaluations? Similarly, does pro- 
viding more information about the applicant 
also serve to decrease differential evalua- 
tions? 


Strength of Effects


Analyses that yield precise estimates of 
the proportion of variance accounted for by 
the minority/majority status of the candi- 
dates in interview situations need to be car-
ried out. The few studies that have calcu-
lated and presented such estimates indicate
that minority status accounts for relatively
small amounts of the variation. Yet, such
effects may have decidedly powerful conse-
quences when only a small number (e.g., one)
of the candidates are chosen as the best or
most likely candidates.


Differential Validity and Regression 


More research is needed to investigate the 
possibilities of differential regression in the 
interview. Is the interview equally valid for 
blacks and whites, women as well as men, 
and so on? It may be that if interviewers 
collect and process different information in 
an interview based on the minority/majority 
status of job candidates, then the interview 
may indeed show different patterns of validity 
with different job criterion measures. Simi- 
larly, more psychometrically sophisticated 
models should be used in reviewing the fair- 
ness of the interview. 


Interviewer Training 


Does interviewer training affect the kinds 
of evaluations given to minority and ma- 
jority job candidates? Studies by Wexley 
et al. (1973) and Latham, Wexley, and Pur- 
cell (1975) have shown that workshop train- 
ing of interviewers geared toward the re- 
duction of rating errors is fairly effective. 
Do efforts to provide interviewers with train- 
ing about how to interview minorities, what 
questions to avoid, and so forth result in 
more accurate evaluations and the reduction 
of differential evaluation? To date no studies 
exist concerning these issues.


Interviewer Variables 


Additional efforts should be made to ex- 
plore possible factors associated with the in- 
terviewer that may affect any differential 
evaluations given. Arvey, Passino, and
Lounsbury (1977) found that sex of the
evaluator showed marginal effects on the
gathering of job analysis data. London and
Poplawski (1976) reported that females tend
to make more favorable evaluations than
males. In addition, Rumenik, Capasso, and
Hendrick (1977) summarized a number of
studies concerning sex of the interviewer and
concluded that this variable is indeed a
potent one.

Moreover, other interviewer characteristics
should be investigated with regard to their
possible influence on differential evaluations.
For example, Rose and Andiappan (1 
suggested that interviewer differences in
androgyny may influence differential
evaluation.


Integration of Findings With Other Research


Research should be conducted that re-
solves the conflict between many of the find-
ings gathered in the context of interview re-
search and those gathered in other contexts.
For example, Hamner, Kim, Baird, and Bigo-
ness (1974), Bigoness (1976), and [...]son
and Effertz (1974), among others, have
found that females are evaluated more favor-
ably when raters evaluate performance. Why
the results from these studies differ substan-
tially from those summarized earlier needs
to be determined.

It is hoped that the present article will
serve to direct and foster future research
efforts. The last word about the interview is
a long way from being said.


Gideon J. Mellenbergh, Henk Kelderman, Jenneke G. Stijlen,
University of Amsterdam, Amsterdam, The Netherlands


Linear models are described for the situation wherein a measurement instru- 
ment is constructed for all elements of the Cartesian product of several facets 
when the elements of each facet are not ordered. The structure of the covar- 
iance matrix of the instrument is derived from the models. By using covariance 
structure analysis, the models can be tested, and estimates of the parameters
can be obtained. Models for 20 tests were formulated and tested and were con- 
structed from a design with a behavioral and a situational facet measuring social 
anxiety in children; for 15 tests a model proved to fit the data. It is concluded 
that covariance structure analysis is useful for the analysis and construction of
measurement instruments.


Facet designs can be used for test item writ- 
ing. A facet is a set consisting of a finite number 
of elements (Foa, 1965). The Cartesian pro- 
duct of a finite number of facets is a design for 
item and test construction. This design for test 
construction corresponds to the factorial design 
for experimentation. Facet designs have been 
used in different areas of psychological and 
educational measurement. For example, Gutt- 
man and Schlesinger (1967) constructed dis- 
tractors for ability and achievement items from 
facet designs. Hamersma, Paige, and Jordan 
(1973) reported facet designs for the construc- 
tion of attitude scales. Butt and Fiske (1968) 
compared dominance scales constructed from 
a facet design with dominance scales developed 
using other strategies. Guilford’s (1967) struc- 
ture-of-intellect model can be conceived as a 
facet design (Fiske, 1971, p. 128). The multi-
trait-multimethod matrix (Campbell & Fiske,
1959) results from a facet design: The cor- 
relation coefficients of this matrix are computed 
between the elements of the Cartesian product
of a facet with traits and a facet with methods.
Scores from tests constructed in a facet 


Gleser, Nanda, & 
Rajaratnam, 1972). For example, Endler and 
Hunt (1966) used a facet design with 11 situa- 
tions and 14 modes of response. For each 
combination of situation and mode of response 
a 3-point scale was constructed. By consider-
ing subjects as elements of a separate facet and
by assuming that the elements of each facet 
are a random sample from a universe, variance 
components were estimated for the situations, 
the modes of response, the subjects, and their 
interactions. Scores from tests constructed in 
a facet design can also be analyzed using three- 
mode factor analysis (Tucker, 1963). For 
example, Levin (1965) analyzed the Endler
and Hunt data with this technique. The results
of this analysis are three matrices: one with
factor loadings of the situations, one with factor
loadings of the responses, and a core. The
core contains the scores for the interaction of
situations and responses for types of subjects.
Another approach is the analysis of measures
of association between the instruments con-
structed in a facet design. For example,
Schlesinger and Guttman (1969) computed 
correlation coefficients between intelligence 
and achievement tests in a facet design;
they used smallest space analysis for the
analysis of the correlation matrix. Bock and
Bargmann (1966) analyzed covariance matrices
of tests constructed in a facet design. They
considered the elements of the facets from
which the tests were constructed as fixed and
the elements of the facet with subjects as
random. Assuming that the observed scores
were linear functions of a set of uncorrelated
latent variables, they analyzed the covariance
matrices and estimated the variances of the
latent variables. Wiley, Schmidt, and Bramble
(1973) generalized the models considered by
Bock and Bargmann.

New models have recently been developed
for the analysis of covariance matrices:
Hypotheses about the structure of the matrix
can be formulated and tested; estimates of
the parameters in the hypothesized model can
be computed (Bentler, 1976; Jöreskog, 1970,
1973; Long, 1976; Mukherjee, 1973). A facet
design is preeminently suited to generating
and formulating hypotheses about covariance
matrices. Covariance structure analysis should
therefore be considered as an important method
for the analysis of covariance and correlation
matrices computed between instruments con-
structed from a facet design; it can further
an "integration of test design and analysis"
(Guttman, 1970).

Linear models are formulated for facet
designs. The models are applied to a correla-
tion matrix of 20 instruments from a facet
design measuring social anxiety in children.
Using covariance structure analysis several
models were tested; parameter estimates are
reported for the model that best fits the data.


Models for Facet Designs


The situation wherein the elements of a
facet are not ordered is considered; instru-
ments are constructed for all elements of the
Cartesian product of the facets. A general
model for this situation is described; from this
model special cases are derived. For the sake
of simplicity, the models are described for two
facets with p and q elements; the models can
easily be generalized to more than two facets.
The two facets can be represented by a
rectangle with p rows and q columns. Each
square of the rectangle represents an element
of the Cartesian product of the two facets (for
an example, see Table 1). It is assumed that
for each element of the Cartesian product, one
measurement instrument is constructed.


Table 1
Cartesian Product of a Situational and Behavioral Facet for Social Anxiety in Children

                                                  Behavioral facet
Situational
facet          Cognitive                          Avoidance                          Physiological                          Affective
Social         Social-cognitive ($y_{11}$)        Social-avoidance ($y_{12}$)        Social-physiological ($y_{13}$)        Social-affective ($y_{14}$)
Intellectual   Intellectual-cognitive ($y_{21}$)  Intellectual-avoidance ($y_{22}$)  Intellectual-physiological ($y_{23}$)  Intellectual-affective ($y_{24}$)
Physical       Physical-cognitive ($y_{31}$)      Physical-avoidance ($y_{32}$)      Physical-physiological ($y_{33}$)      Physical-affective ($y_{34}$)
Exclusion      Exclusion-cognitive ($y_{41}$)     Exclusion-avoidance ($y_{42}$)     Exclusion-physiological ($y_{43}$)     Exclusion-affective ($y_{44}$)
Appearance     Appearance-cognitive ($y_{51}$)    Appearance-avoidance ($y_{52}$)    Appearance-physiological ($y_{53}$)    Appearance-affective ($y_{54}$)


It is also assumed that the score of Subject i
on the instrument from the jth row and the
kth column is a linear function of a set of
latent variables:

$$y_{ijk} = m_{jk} + s_{jk}S_i + r_{jk}R_{ij} + c_{jk}C_{ik} + rc_{jk}RC_{ijk} + E_{ijk}. \tag{1}$$

In this equation $m_{jk}$ is the mean score on the
instrument in the population of subjects; the
parameters $s_{jk}$, $r_{jk}$, $c_{jk}$, and $rc_{jk}$ are loadings
specific to the jth row and the kth column of
the facet design. The other terms in the equa-
tion represent the scores of Subject i on random
latent variables: $S_i$ is the score on a general
latent variable, $R_{ij}$ and $C_{ik}$ are the scores on
latent variables specific to the jth row and
the kth column, and $RC_{ijk}$ is the score on a
latent variable specific to the combination of
the jth row and the kth column; $E_{ijk}$ is the
error score. The terms $RC_{ijk}$ and $E_{ijk}$ are
both scores of Subject i on latent variables
specific to the combination of the jth row
and the kth column and cannot be separated
without replications. Therefore, the term
$rc_{jk}RC_{ijk}$ is absorbed into the residual term:
$E_{ijk}^{*} = rc_{jk}RC_{ijk} + E_{ijk}$; omitting the asterisk
in the residual yields the following model:

$$y_{ijk} = m_{jk} + s_{jk}S_i + r_{jk}R_{ij} + c_{jk}C_{ik} + E_{ijk}. \tag{2}$$


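Restricting parameters in Equation 2 yields
special cases. First, one or two of the param-
eters $s_{jk}$, $r_{jk}$, and $c_{jk}$ are set equal to zero:

$$y_{ijk} = m_{jk} + r_{jk}R_{ij} + c_{jk}C_{ik} + E_{ijk}; \tag{3}$$
$$y_{ijk} = m_{jk} + s_{jk}S_i + c_{jk}C_{ik} + E_{ijk}; \tag{4}$$
$$y_{ijk} = m_{jk} + s_{jk}S_i + r_{jk}R_{ij} + E_{ijk}; \tag{5}$$
$$y_{ijk} = m_{jk} + s_{jk}S_i + E_{ijk}; \tag{6}$$
$$y_{ijk} = m_{jk} + r_{jk}R_{ij} + E_{ijk}; \tag{7}$$
$$y_{ijk} = m_{jk} + c_{jk}C_{ik} + E_{ijk}. \tag{8}$$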


Second, the parameters can be set equal to 
one; Equation 2 reduces to 


$$y_{ijk} = m_{jk} + S_i + R_{ij} + C_{ik} + E_{ijk}. \tag{9}$$


This model can be restricted, thus further 
eliminating one or two of the latent variables. 
Third, in Equations 2-9 parameters can be 
constrained in the sense that they are set equal 
to each other. 


Equation 2 can also be formulated in vector 
notation: 


$$y_{ijk} = m_{jk} + l_{jk}' f_i + E_{ijk}, \tag{10}$$

where $f_i'$ is the vector with the scores of
Subject i on all latent variables
[$f_i' = (S_i, R_{i1}, R_{i2}, \ldots, R_{ip}, C_{i1}, C_{i2}, \ldots, C_{iq})$] and
$l_{jk}'$ is the vector with three parameters in
positions corresponding to the positions of the
elements $S_i$, $R_{ij}$, and $C_{ik}$ in the vector $f_i'$ and
is zero otherwise [$l_{jk}' = (s_{jk}, 0, \ldots, r_{jk}, 0, \ldots, c_{jk}, 0, \ldots)$].
The models displayed in Equa-
tions 3-8 can also be written in the form of
Equation 10 by setting the appropriate ele-
ments of the vector $l_{jk}'$ equal to zero; the model
in Equation 9 is written in the form of Equa-
tion 10 by setting the three parameters of the
vector $l_{jk}'$ equal to one.
In Equation 10 the score of Subject i on
one instrument is considered. However, for
Subject i scores are obtained on all p × q
instruments. The scores of Subject i on the
instruments are collected in one observation
vector: $y_i' = (y_{i11}, \ldots, y_{i1q}, y_{i21}, \ldots, y_{i2q}, \ldots, y_{ip1}, \ldots, y_{ipq})$.
From Equation 10 follows the
model for the vector of observations of Subject i:

$$y_i = m + L f_i + e_i, \tag{11}$$

where the vector $m' = (m_{11}, \ldots, m_{1q}, m_{21}, \ldots, m_{2q}, \ldots, m_{p1}, \ldots, m_{pq})$
contains the popula-
tion means of all instruments; the vector
$e_i' = (E_{i11}, \ldots, E_{i1q}, E_{i21}, \ldots, E_{i2q}, \ldots, E_{ip1}, \ldots, E_{ipq})$,
the residuals of Subject i
on all instruments; and L is the matrix with
rows $l_{jk}'$: $L' = (l_{11}, \ldots, l_{1q}, l_{21}, \ldots, l_{2q}, \ldots, l_{p1}, \ldots, l_{pq})$.
It is assumed without loss of
generality that the latent variables have zero
population means, with covariance matrix $E^2$
for the residuals and C for the other latent
variables. From the assumption that the
population covariances between the residuals
and the other latent variables are zero and from
Equation 11, it follows that the population
covariance matrix S of the observed scores
on the instruments has the following structure
(Jöreskog, 1974):

$$S = LCL' + E^2. \tag{12}$$
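The structure in Equation 12 can be made concrete with a small numerical sketch. The following Python fragment builds the 0/1 matrix L for the case in which all loadings equal one (Equation 9) with p = 5 and q = 4, and computes S = LCL' + E² for diagonal C and E². The function names and all variance values are hypothetical and chosen only for illustration; they are not estimates from the article.

```python
# Sketch only: implied covariance matrix S = L C L' + E^2 (Equation 12)
# for the variance component model (Equation 9).
# All numerical variance values below are hypothetical.

def design_matrix(p, q):
    """Rows are instruments (j, k); columns are S, R_1..R_p, C_1..C_q."""
    rows = []
    for j in range(p):
        for k in range(q):
            row = [0] * (1 + p + q)
            row[0] = 1           # general latent variable S
            row[1 + j] = 1       # latent variable specific to row j
            row[1 + p + k] = 1   # latent variable specific to column k
            rows.append(row)
    return rows

def implied_covariance(L, c_diag, e_diag):
    """Compute L C L' + E^2 when C and E^2 are both diagonal."""
    n = len(L)
    S = [[sum(L[a][t] * c_diag[t] * L[b][t] for t in range(len(c_diag)))
          for b in range(n)] for a in range(n)]
    for a in range(n):
        S[a][a] += e_diag[a]
    return S

p, q = 5, 4
L = design_matrix(p, q)
c_diag = [0.30] + [0.10] * p + [0.05] * q  # var(S), var(R_j), var(C_k): assumed
e_diag = [0.55] * (p * q)                  # residual variances: assumed
S = implied_covariance(L, c_diag, e_diag)
```

Under these assumed values, two instruments from the same row share var(S) + var(R), two from the same column share var(S) + var(C), and instruments that differ in both row and column share only var(S).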


The restrictions on the parameters men-
tioned above give rise to special cases of the
matrix L. Equation 6 implies that only the
first column of L contains parameters; the
other elements are zero. The model is a one-
factor model; if this model fits the data, the
instruments are termed congeneric tests
(Jöreskog, 1971, 1974). The models displayed
in Equations 3, 4, 5, 7, and 8 imply that the
matrix L contains columns with both param-
eters and zeros; the models are restricted factor
models (Lawley & Maxwell, 1971, chap. 7).
Equation 9 implies that L is a matrix with
zeros and ones, which in experimental design
models is termed a design or incidence matrix
(Searle, 1971, p. 166); the model is a variance
component model (Jöreskog, 1974).

Assumptions about the covariances between
the latent variables yield special cases of the
matrices C and $E^2$ in Equation 12. Assuming
that the population covariance of the residuals
of the instruments is zero implies that $E^2$ is a
diagonal matrix with the variances of the
residuals on the diagonal. From the assump-
tion that the population covariances of all
latent variables are zero, it follows that C is
a diagonal matrix with the variances of the
latent variables on the diagonal; assuming that
the population covariances of only some of the
latent variables are zero implies that the matrix
C contains covariance parameters and zeros.
In practical applications the investigator
should state his or her assumptions regarding
the population covariances of the latent
variables.

These points can be illustrated by the multi- 
trait-multimethod matrix. The model for this 


parameter 
factor (Jöreskog, 1971). In the most general
formulation it is assumed that all factors are
correlated (Werts & Linn, 1970) and that all
elements of the matrix C are parameters. A
special case is obtained by assuming that the
trait factors are correlated and that the method
factors are correlated but that the trait factors
are not correlated with the method factors
(Alwin, 1974). The matrix C contains param-
eters and zeros. A further restriction can be
made by assuming that the effects of a given
method are constant (Alwin, 1974). This
implies that some parameters in the matrix L
are constrained: The parameters in a column
for a method factor are set equal to each other.

Equations 2, 4, 5, and 6 all contain a general
factor, whereas Equations 3, 7, and 8 do not
contain such a factor. Equation 3 has been
discussed in the literature, as it is the basic
model for the multitrait-multimethod matrix.
In many cases the incorporation of a general
factor seems to be important. First, the general 
factor can be of theoretical interest. For ex- 
ample, Eiting (Note 1) used musicality tests 
constructed in a facet design; he used a facet 
with musical abilities and a facet with musical 
materials such as melody and rhythm. The 
theoretical interest in this study was in the 
comparison of the models in Equations 3 and
4. Equation 3 implies that the correlation 
matrix of the musicality tests can be explained 
by material factors and separate factors for 
musical abilities, whereas Equation 4 implies 
that next to the material factors only one 
musical ability is involved. Other theoretical 
studies comparing models with and without 
a general factor can easily be conceived. Sup-
pose that tests are considered in a design with two
facets of Guilford's (1967) structure-of-intellect 
model and that all the tests belong to the same 
element of the third facet, for example, memory 
tests for the elements of the Cartesian product 
of the content and product facet. A point of 
theoretical interest in this case is the compari- 
son of models with and without a general 
memory factor. Second, the comparison of 
models with and without a general factor is 
important from the point of view of test con- 
struction. For items constructed in a facet 
design that fit a model with a general factor, 
it makes sense to combine items in a total 
score measuring the general aspect of the 
construct; items that do not fit a model with 
a general factor should only be combined in 
subtest scores. Regarding the general factor, 
it cannot be stated a priori that this factor 
should be correlated or uncorrelated with other 
factors; this depends on the research problem 
or can be investigated empirically. In test
construction, it is generally useful to postulate
that the general factor is uncorrelated with the
other factors. But in a theoretical study, such
as in the above-mentioned structure-of-intel-
lect tests, it can be of interest to study the
correlation of a general factor with other 
factors. 


Estimation, Testing, and Identification 


If in a sample an unbiased estimate $\hat{S}$ of the
covariance matrix is obtained, the parameters
can be estimated using the generalized least
squares or the maximum-likelihood method.
By assuming that the observed scores on the
instruments have a multivariate normal dis-
tribution, statistics can be computed for both
methods that are asymptotically distributed
as chi-squares (Jöreskog & Goldberger, 1972).
The chi-square value can be used for testing
the goodness of fit of a model against the very
broad alternative that S is any positive
Table 2
Means, Standard Deviations, and K-R 20 Reliability Coefficients for 5-Item Tests
Measuring Social Anxiety in Children

Test                          M       SD      K-R 20
Social-cognitive              .82     1.08    .52
Social-avoidance              1.23    OQ E Ti
Social-physiological          .65     .94     .49
Social-affective              .42     1.00    .45
Intellectual-cognitive        1.27    1.43    .65
Intellectual-avoidance        .74     .80     "Il
Intellectual-physiological    1.23    1.39    .65
Intellectual-affective        1.17    1.12    .47
Physical-cognitive            1.04    1.27    .65
Physical-avoidance            AT      .82     .44
Physical-physiological        .54     .87     .47
Physical-affective            .42     13      .42
Exclusion-cognitive           1.07    1.19    .56
Exclusion-avoidance           .62     .84     .31
Exclusion-physiological       AT      .93     .66
Exclusion-affective           1.71    1.16    .33
Appearance-cognitive          1.16    1.27    .56
Appearance-avoidance          1.37    1.00    .26
Appearance-physiological      1.24    1.32    .62
Appearance-affective          1.63    1.29    .46

Note. K-R 20 = Kuder-Richardson formula. For
Ms and SDs, N = 320; for K-R 20, N = 396.
The group of 320 subjects was increased with the
addition of 76 third-grade students.


definite matrix (Jöreskog, 1974). Moreover,
different models can be compared by the right-
tail probabilities of their chi-square values.
Also, a Model $M_1$, nested in Model $M_2$, can
be tested against Model $M_2$: The difference in
chi-square values is asymptotically a chi-
square with degrees of freedom equal to the
difference of the degrees of freedom of the
models. Model $M_1$ is said to be nested in $M_2$
if $M_1$ can be obtained from $M_2$ by constraining
one or more of the free parameters in $M_2$ to be
fixed or to be equal to one another (Long,
1976). Another method for the investigation
of the fit of a model is the inspection of the
matrix with the residual covariances:

$$R = \hat{S} - (\hat{L}\hat{C}\hat{L}' + \hat{E}^2), \tag{13}$$

where $\hat{L}$, $\hat{C}$, and $\hat{E}^2$ are the estimates for the
matrices L, C, and $E^2$. The computations can
be done with the computer programs ACOVS
(Jöreskog, Gruvaeus, & Van Thillo, Note 2)
and LISREL (Jöreskog & Van Thillo, Note 3;
Saris, Note 4).
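The nested-model comparison described above amounts to simple arithmetic plus a chi-square tail probability. The sketch below, in standard-library Python, computes the right-tail probability for integer degrees of freedom through the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Γ(a + 1). The function names are hypothetical. As input it uses two chi-square values reported later in this article (99.59 with 60 degrees of freedom and 33.57 with 47); treating the first model as nested in the second is an assumption made here for illustration, and the first model is reported as not identified, so the result shows the arithmetic rather than a substantive test.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Difference test of a restricted model against a more general one."""
    diff = chi2_restricted - chi2_general
    ddf = df_restricted - df_general
    return diff, ddf, chi2_sf(diff, ddf)

# Values taken from the text; the nesting relation is an assumption.
diff, ddf, p_value = chi2_difference(99.59, 60, 33.57, 47)
```

Because the observed scores in the article are clearly non-normal, any such probability should be read with the same caution the authors recommend for the goodness-of-fit tests themselves.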

A model is said to be identified if all param-
eters are uniquely determined. For example,
the factor analytic model is not identified;
the parameters are not unique, and the factor
solution can be rotated. A necessary condition
for identification is that the number of dis-
tinct elements in the covariance matrix $\hat{S}$ be
at least as high as the number of parameters,
implying that the number of degrees of freedom
must be equal to or greater than zero. This
condition is necessary but not sufficient. A
second condition is that the linear equations
in the model be such that each individual
parameter can be separated from the other
parameters. In general it is hard to investigate
this condition. Moreover, it is possible that
the condition may be fulfilled but the model
still not identified. This can happen if the
estimate of a parameter in a sample is about
zero (Saris, Note 4). A practical procedure
for investigating the identifiability of a model
is to ask for the standard errors of the param-
eter estimates. If the program can com-
pute the standard errors, one has good as-
surance as to the identifiability of the model
(Wiley, 1973). Another practical procedure is
to estimate the parameters twice by starting
the computing procedure with different initial
values for the parameters. If the estimates of
the parameters for both computer runs are
equal, one also has good assurance as to the
identifiability of the model (Saris, Note 4).
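The necessary condition can be checked by counting. The Python sketch below compares the number of distinct elements of a covariance matrix for n tests, n(n + 1)/2, with the number of free parameters. The parameter counts are a reconstruction made for illustration (three loadings and one residual variance per test, plus 10 + 3 factor correlations in the last case); they are not stated in this form in the article, although they do reproduce the degrees of freedom reported for the factor models fitted later (130, 60, and 47).

```python
# Sketch only: degrees of freedom as (distinct covariance elements) minus
# (free parameters). The parameter counts below are assumed reconstructions.

def distinct_elements(n_tests):
    """Number of distinct variances and covariances among n_tests variables."""
    return n_tests * (n_tests + 1) // 2

def degrees_of_freedom(n_tests, n_parameters):
    return distinct_elements(n_tests) - n_parameters

# 20 tests, each loading on a general, a situation, and a reaction factor.
df_20_tests = degrees_of_freedom(20, 20 * 3 + 20)

# 15 tests after removing the five avoidance tests and the avoidance factor.
df_15_tests = degrees_of_freedom(15, 15 * 3 + 15)

# Same 15 tests, adding correlations among 5 situation factors (10)
# and among 3 reaction factors (3).
df_15_correlated = degrees_of_freedom(15, 15 * 3 + 15 + 10 + 3)
```

A nonnegative result satisfies only the necessary condition; as the text notes, it does not by itself establish identification.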


Measurement of Social Anxiety in Children 


Dekking and Raadsheer (Note 5) used a
situational and a behavioral facet for the
construct of social anxiety in children. In the
situational facet five types of situation in
which children can exhibit social anxiety were
differentiated: social, intellectual, physical,
exclusion, and appearance. In the behavioral
facet four types of reaction indicating social
anxiety were differentiated: cognitive, avoid-
ance, physiological, and affective. The Carte-
sian product of the two facets has 20 elements
(see Table 1); for each element 5 items were
written. An example of a social-avoidance
item is as follows: "During recreation I don't
play with other children" (yes/no); an ex-
ample of an appearance-cognitive item is as
follows: "Having had a hair cut I am [...]
others think I look strange" (yes/no). The
items were administered to 320 children from
the fourth, fifth, and sixth grades of three
elementary schools. Each item was scored 1
(indicating social anxiety) or 0 (indicating the
absence of social anxiety). For every 5-item
test the score was the sum of the item scores.
For the 20 tests, means, standard deviations,
and Kuder-Richardson formula (K-R 20)
reliability coefficients of the scores were
calculated (see Table 2); the correlation co-
efficients among the 20 variables were also
calculated (see Table 3). The interest in this
study was in the relations among the variables
and not in the scale in which the variables are
measured. It was therefore decided to analyze
the correlation matrix, that is, the covariance
matrix of the standard scores. From the means
in Table 2 it is obvious that the frequency
distribution of most of the variables is skewed.
The assumption of a multivariate normal
distribution of the observed scores is certainly
not fulfilled, and goodness of fit tests should
therefore be interpreted very carefully. It is
assumed that the population covariances of
the residuals of the instruments are zero;
therefore, in all the analyses $E^2$ is a diagonal
matrix with the residual variances on the
diagonal.
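For readers unfamiliar with the K-R 20 coefficient used for these 0/1 items, a minimal Python sketch follows. It implements the usual formula k/(k - 1) × (1 - Σ p_i q_i / σ²), using the population variance of the total scores; that variance convention and the small data set are assumptions made for illustration, not details taken from the study.

```python
# Sketch only: Kuder-Richardson formula 20 for dichotomous (0/1) items.
# The response data below are hypothetical.

def kr20(item_scores):
    """item_scores: one list of 0/1 item scores per subject."""
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(subject) for subject in item_scores]
    mean_total = sum(totals) / n
    # Population variance of the total score (divisor n), an assumed convention.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for i in range(k):
        p = sum(subject[i] for subject in item_scores) / n  # proportion scoring 1
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)

# Six hypothetical children answering one 5-item test.
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
reliability = kr20(responses)
```

With only five items per test, modest coefficients such as those in Table 2 are to be expected.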
The variance component model (Equation
9) is the model with the smallest number of
parameters. Hence the analysis was started
with this model; the form of Equation 11 is
presented in Table 4. From the assumption
that the population covariances of all latent
variables are zero, it follows that in Equation
12 C is a diagonal matrix with diagonal ele-
ments var($S$), var($R_1$), var($R_2$), var($R_3$),
var($R_4$), var($R_5$), var($C_1$), var($C_2$), var($C_3$),
and var($C_4$). Matrix L is not of full column
rank; therefore, constraints are necessary for
estimating the variance components (Jöreskog,
1974). In this stage of the investigation, how-
ever, interest was not in estimating the
parameters but in fitting a model. The un-
constrained model was therefore fitted to the
data. The value of the chi-square for this model
is 547.39 with 181 degrees of freedom; the fit
of the variance component model is very poor.
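The statement that L is not of full column rank can be verified directly. The Python sketch below rebuilds the 20 × 10 matrix of zeros and ones for five situations and four reactions and computes its rank by exact Gaussian elimination; the helper names are hypothetical. The column for the general latent variable equals both the sum of the five situation columns and the sum of the four reaction columns, so two linear dependencies are present.

```python
from fractions import Fraction

# Sketch only: rank of the 0/1 matrix L of the variance component model.

def design_matrix(p, q):
    """Rows are instruments (j, k); columns are S, R_1..R_p, C_1..C_q."""
    rows = []
    for j in range(p):
        for k in range(q):
            row = [0] * (1 + p + q)
            row[0] = 1
            row[1 + j] = 1
            row[1 + p + k] = 1
            rows.append(row)
    return rows

def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

L = design_matrix(5, 4)
general = [row[0] for row in L]
situation_sum = [sum(row[1:6]) for row in L]
reaction_sum = [sum(row[6:10]) for row in L]
rank_L = matrix_rank(L)   # 10 columns, but two linear dependencies
```

The rank is 8 rather than 10, which is why constraints are needed before the ten variance components can be estimated separately.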

In Equation 11 the variances of the latent 
variables of the vector f; were subsequently 
set equal to one. The matrix L is then a matrix 
with loadings: The ones in the matrix L in 
Table 4 are replaced by parameters that should 
be estimated. In this case Equation 12 gives 
the structure of the covariance matrix of the 
observed scores in a restricted factor model 


ys ум Ja Yaz Уб Ум ys Ую Уы ум 
43 .26 .56 .30 .39 42 .52 .39 .53 .52 
(4523. 10.235 .14 11 .19 .14 .14 14 
49 115 .43 .25 .53 .34 44 .31 .55 .43 
41 .25 .44 AL .47 .38 .52 .39 .57 .53 
35 .14 .57 .22 .39 .49 .59 40 .56 .54 
47 003 20 .35 20 .23 25 .27 .27 .24 
‘Al .16 .47 .28 48 .46 .50 .33 .56 .49 
.39 .13 45 28 45 .43 .46 .34 .53 .48 
30 .24 28 10 .13 .26 .27 23 .22 .28 
24 AO 18 .18 .14 .17 .18 24 17 11 
.33 41 .27 .46 .31 .33 .24 .38 .29 

СА 18 18 16 19 .17 .09 .16 

48 .47 .49 .53 .31 .54 .52 

nol 620) 227 S20 co 824. 

.36 .43 .32 .50 .35 

45 34 48 .48 

32 .60 .62 

.42 .40 

.68 


Table 3
Correlation Coefficients Among 20 Variables Measuring Social Anxiety in Children
Variable yi уз yu yar m Ja Ум Yar Уз 
Уп 24 45 .50 .57 26 44 45 .34 21 
M 43 .15 .19 .19 .19 .13 .19 .26 
Уа .54 .52 .24 48 47 1 11 
Ум 154 .28 .46 .54 .19 .09 
AUS .35 .64 .63 .24 43 
j| ја .31 .33 14 .23 
Ба n 150r 65 117: 
E 44 1 
yu 46 
Ја 
Yas 
Ya 
Уа 
Vaz 
k Уа 
yu 
Ум 
Уз 
M 
Ји 


Note. N = 320.


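Table 4
Form of Equation 11

$$
\begin{bmatrix}
y_{i11}\\ y_{i12}\\ y_{i13}\\ y_{i14}\\
y_{i21}\\ y_{i22}\\ y_{i23}\\ y_{i24}\\
y_{i31}\\ y_{i32}\\ y_{i33}\\ y_{i34}\\
y_{i41}\\ y_{i42}\\ y_{i43}\\ y_{i44}\\
y_{i51}\\ y_{i52}\\ y_{i53}\\ y_{i54}
\end{bmatrix}
=
\begin{bmatrix}
m_{11}\\ m_{12}\\ m_{13}\\ m_{14}\\
m_{21}\\ m_{22}\\ m_{23}\\ m_{24}\\
m_{31}\\ m_{32}\\ m_{33}\\ m_{34}\\
m_{41}\\ m_{42}\\ m_{43}\\ m_{44}\\
m_{51}\\ m_{52}\\ m_{53}\\ m_{54}
\end{bmatrix}
+
\begin{bmatrix}
1&1&0&0&0&0&1&0&0&0\\
1&1&0&0&0&0&0&1&0&0\\
1&1&0&0&0&0&0&0&1&0\\
1&1&0&0&0&0&0&0&0&1\\
1&0&1&0&0&0&1&0&0&0\\
1&0&1&0&0&0&0&1&0&0\\
1&0&1&0&0&0&0&0&1&0\\
1&0&1&0&0&0&0&0&0&1\\
1&0&0&1&0&0&1&0&0&0\\
1&0&0&1&0&0&0&1&0&0\\
1&0&0&1&0&0&0&0&1&0\\
1&0&0&1&0&0&0&0&0&1\\
1&0&0&0&1&0&1&0&0&0\\
1&0&0&0&1&0&0&1&0&0\\
1&0&0&0&1&0&0&0&1&0\\
1&0&0&0&1&0&0&0&0&1\\
1&0&0&0&0&1&1&0&0&0\\
1&0&0&0&0&1&0&1&0&0\\
1&0&0&0&0&1&0&0&1&0\\
1&0&0&0&0&1&0&0&0&1
\end{bmatrix}
\begin{bmatrix}
S_i\\ R_{i1}\\ R_{i2}\\ R_{i3}\\ R_{i4}\\ R_{i5}\\ C_{i1}\\ C_{i2}\\ C_{i3}\\ C_{i4}
\end{bmatrix}
+
\begin{bmatrix}
E_{i11}\\ E_{i12}\\ E_{i13}\\ E_{i14}\\
E_{i21}\\ E_{i22}\\ E_{i23}\\ E_{i24}\\
E_{i31}\\ E_{i32}\\ E_{i33}\\ E_{i34}\\
E_{i41}\\ E_{i42}\\ E_{i43}\\ E_{i44}\\
E_{i51}\\ E_{i52}\\ E_{i53}\\ E_{i54}
\end{bmatrix}
$$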

with one general factor, five situation factors,
and four reaction factors. From the assump-
tion that the factors are orthogonal, it follows
that the matrix C in Equation 12 is an identity
matrix. The value of the chi-square for this
model is 210.49 with 130 degrees of freedom.
Comparing the statistics of both models shows
that the fit of the factor analytic model is
better than the fit of the variance component
model. Inspection of the matrix R shows that
residual correlation coefficients higher than .10
or lower than −.10 are between the following
variables: social-avoidance and physical-
cognitive (.113), social-avoidance and phys-
ical-avoidance (.131), social-avoidance and
physical-affective (.166), social-affective and
exclusion-avoidance (.117), intellectual-avoid-
ance and physical-affective (.137), and phys-
ical-affective and appearance-physiological
(−.108). In five of the six cases, tests measur-
ing avoidance reactions are involved. More-
over, from Table 2 it is clear that the tests
measuring avoidance reactions generally show
low K-R 20 reliability coefficients. Therefore,
the five tests measuring avoidance reactions
were eliminated from the correlation matrix,
the avoidance-reaction factor was also elimin-
ated, and the revised data were reanalyzed.
The resulting chi-square value is 99.59 with 60
degrees of freedom. However, the standard
errors of the parameters could not be computed,
indicating that the model is not identified.
[...] that the loading of social-cognitive on the
factor Social and the loading of exclusion-
affective on the factor Exclusion are both
zero. To obtain an identified solution these
loadings were constrained to be equal to each
other. In the computing procedure, serious
difficulties were encountered: In the iteration
process the loading of social-affective on the
factor Affective became larger and larger
than one, while the residual variance became
smaller and smaller than zero. It was therefore
decided to stop the iteration process. A reason
for difficulties in obtaining an identified solution
can be that all factors are specified to be un-
correlated. Therefore, it was assumed that the
situation factors are correlated between one
another and that the reaction factors are cor-
related between one another; the correlations
between situation and reaction factors and
between the situation and reaction factors on the
one hand and the general factor on the other
are assumed to be zero. The fit of this model
is excellent: The value of the chi-square is
33.57 with 47 degrees of freedom; the range
of the coefficients of R is from −.047 to [...].
The program computes standard errors of the
parameter estimates, indicating that the model
Table 6
Symmetrical 90% Confidence Intervals for Factor Correlation Coefficient Estimates

Factor             1    2             3               4            5            6    7            8            9
1. General         1
2. Social          0    1
3. Intellectual    0    −.92, 1.91    1
4. Physical        0    .19, .73      −6.28, 4.27     1
5. Exclusion       0    .22, .95      −12.13, 8.63    −.13, .80    1
6. Appearance      0    .41, .91      −5.57, 4.09     −.82, .29    −.71, .96    1
7. Cognitive       0    0             0               0            0            0    1
8. Physiological   0    0             0               0            0            0    .72, 1.04    1
9. Affective       0    0             0               0            0            0    .63, 1.07    .77, 1.02    1

Note. N = 320.


is identified. Table 5 gives the symmetrical 
90% confidence intervals for the loading 
estimates, and Table 6 gives the confidence 
intervals for the factor correlation estimates. 
The confidence intervals for the correlations 
between the factor Intellectual and the other 
Situation factors are inadmissibly large. The 
concept of the specific factor Intellectual is not 
very appropriate for the description of the 
correlation matrix. The use of a general factor, 
uncorrelated with all other factors, seems ap- 
propriate: 10 instruments have substantial 
loadings on this factor. Also, it seems ap- 
propriate to distinguish the factors Social, 
Physical, Exclusion, and Appearance. The 
factor Social is correlated with the other three 
factors: The intervals for the correlations do 
not contain the value zero; the intervals also 
do not contain the value one, indicating that 
it makes sense to distinguish the factor Social 
from the other three factors. The intervals of 
the intercorrelations of the factors Physical, 
Exclusion, and Appearance all contain the 
value zero. Moreover, almost all loadings on 
these four situation factors are substantial. 
There is, however, not much reason to dis- 
tinguish the reaction factors: The intervals of 
the intercorrelations of these three factors all 
contain the value one. 
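
The reading of Table 6 given above can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the interval endpoints are copied from Table 6, while the function names and the label strings are hypothetical and chosen only for illustration. The very wide intervals involving the factor Intellectual are left out.

```python
# 90% confidence intervals for the factor correlations, copied from Table 6.
INTERVALS = {
    ("Social", "Physical"): (0.19, 0.73),
    ("Social", "Exclusion"): (0.22, 0.95),
    ("Social", "Appearance"): (0.41, 0.91),
    ("Physical", "Exclusion"): (-0.13, 0.80),
    ("Physical", "Appearance"): (-0.82, 0.29),
    ("Exclusion", "Appearance"): (-0.71, 0.96),
    ("Cognitive", "Physiological"): (0.72, 1.04),
    ("Cognitive", "Affective"): (0.63, 1.07),
    ("Physiological", "Affective"): (0.77, 1.02),
}


def contains(interval, value):
    """Return True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


def classify(interval):
    """Label an interval; the label strings are hypothetical names."""
    if contains(interval, 1.0):
        return "contains one"
    if contains(interval, 0.0):
        return "contains zero"
    return "excludes zero and one"


for pair, interval in INTERVALS.items():
    print(pair, interval, classify(interval))
```

Under these labels, the three Social pairs exclude both zero and one, the pairs among Physical, Exclusion, and Appearance contain zero, and the three reaction-factor pairs contain one, which is the pattern described in the text.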


Discussion 


The analysis of the social anxiety data has 
shown that covariance structure analysis can 
be an important tool in test construction. To 
construct an instrument for measuring general 
social anxiety, one can select and combine 
tests with high loadings on the general factor. 


It is also possible to construct composite 
instruments that measure special aspects of 
social anxiety. For example, a composite of
the tests physical-physiological, physical-affective,
and physical-cognitive would constitute an
instrument that measures social anxiety in
physical situations.
The question of why covariance structure
analysis should be used for the analysis of
correlation matrices from facet designs can
be raised. Conventional factor analytic
methods, especially three-mode factor analysis
(Tucker, 1963), could also be used. Some
points are mentioned briefly. First, conventional
factor analytic methods are mostly used for
exploring the structure of a correlation matrix,
whereas covariance structure analysis is
appropriate for testing hypotheses about
matrices. Second, in covariance structure
analysis the parameters can be estimated with
the maximum-likelihood method. This implies
that the properties of the estimates are known;
the desirable properties of maximum-likelihood
estimates are described by Mood and Graybill
(1963, p. 185). Moreover, from the standard
errors of the parameter estimates confidence
intervals can be derived. The properties of the
estimates in conventional factor analytic
methods are not known, and it is also not
possible to specify confidence intervals. Third,
it should be emphasized that it is assumed that
the observed scores have a multivariate normal
distribution. In applications this assumption is
sometimes not even approximately fulfilled, and
the effect of this is not known. Finally, it
should also be emphasized that there are many
problems in the use of covariance structure
analysis. To mention some, it is sometimes hard
to formulate hypotheses and identifiable models,
the computing time can be high, some problems,
such as the power of the test of a model, are not
resolved, and the variables must be quantitative
and their relations linear.
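
The step from standard errors to symmetrical confidence intervals can be sketched as follows. This is an illustration only: it assumes the usual large-sample normal approximation (estimate plus or minus a normal quantile times the standard error), which the text does not spell out, and the function names are hypothetical.

```python
from statistics import NormalDist


def symmetric_ci(estimate, std_error, level=0.90):
    """Symmetrical interval: estimate +/- z * standard error (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def recover_from_ci(low, high, level=0.90):
    """Invert a symmetrical interval back to (estimate, standard error)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimate = (low + high) / 2.0
    std_error = (high - low) / (2.0 * z)
    return estimate, std_error


# The Social-Physical interval reported in Table 6 is (.19, .73).
estimate, std_error = recover_from_ci(0.19, 0.73)
print(round(estimate, 2), round(std_error, 3))
print(symmetric_ci(estimate, std_error))
```

If the normal approximation holds, the interval (.19, .73) corresponds to a point estimate at its midpoint, .46, and the standard error follows from the half-width.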

The model with 10 factors proved to be an 
adequate explanation for the correlation 
coefficients between the 15 social anxiety 
variables. This does not mean, however, that 
it is the only model that fits the data. For the 
correlation matrix of the 20 variables, for
example, the following model was used: five
situation factors, four reaction factors, and two
correlated second-order factors, one a general
situation factor and the other a general reaction
factor. Each test has a loading on one
situation and on one reaction factor; each
situation factor has a loading on the general
situation factor, and each reaction factor has
a loading on the general reaction factor. The
model is a restricted, second-order factor
analysis model (Jöreskog, 1974). The total
number of parameters in this model is one
less than in the model with the 10 orthogonal,
first-order factors. The value of the chi-square
for this model is 207.16 with 131 degrees of
freedom; the fit of the model is about the same
as the model with the 10 orthogonal, first-order
factors. Inspection of the matrix R
shows that the coefficients higher than .10 or
lower than −.10 are between the following
variables: physical-cognitive and social-avoidance
(.116), physical-avoidance and social-avoidance
(.138), physical-affective and social-avoidance
(.136), physical-avoidance and social-affective
(−.108), physical-affective and intellectual-affective
(−.111), exclusion-affective and physical-cognitive
(.112), appearance-avoidance and physical-cognitive
(.109), physical-cognitive and appearance-affective
(.105), and physical-affective and
appearance-physiological (−.135). Although
the picture is not as clear as in the previous
analysis, it is likely that elimination of some
tests can improve the fit of the model. This
was not done because a model with two cor- 
related second-order factors is more difficult 
to interpret than a model with first-order 
factors. 
It has already been stated that the general 
model in Equation 2 can easily be generalized
to more than two facets. For example, the
model for the score of Subject i on the
combination of the jth element of the first facet,
the kth element of the second facet, and the
lth element of the third facet is


Уы = ты + 5: H ты Кау 
+ GC + ћиНа + гора RC ijn 
+ та! у + chiaCHia + Eia 
From this equation it can be seen that the
number of parameters is very large; restrictions
in the model are necessary.


(14)
Models for Biases in Judging Sensory Magnitude


Poulton 


Medical Research Council, Applied Psychology Unit 
Cambridge, England 


Category ratings and magnitude judgments are affected by four range biases, the
centering bias, the stimulus and response equalizing biases, and the contraction
bias; by three nonlinear biases, the local contraction bias, the stimulus spacing
bias, and the logarithmic bias; and by bias from transfer. Models of the biases
are described. The biases are most marked in sensory dimensions that students
are not taught to handle, such as loudness and brightness. Avoiding all the biases
requires exceedingly rigorous investigations.


Why Sensory Magnitudes 


Sensory magnitudes are selected for this 
review of biases in judgment because the 
stimuli can be measured on a physical scale.
Judgments of the quality of life or of the
likableness of people lack a precise measure
of the stimulus. Thus the biases are more
difficult to specify exactly. Anderson (1974),
Helson (1964), Parducci (1963), and the
late S. S. Stevens (1975) all use sensory
magnitudes when presenting their theories of
judgment.


Kinds of Stimuli 


Averaging two weights and multiplying 
two lengths to calculate the area of a rec- 
tangle are a part of common knowledge. Stu- 
dents can handle these problems with rela- 
tively little bias (Anderson, 1974, Figures 1 
and 8) because they are taught models of
the way the sensory dimensions work.
Common knowledge does not include averaging
two sound intensities or two light intensities,
which is usually described as bisecting the
interval between the two intensities. Nor does
common knowledge include halving and doubling
loudnesses and brightnesses. Judgments of these
unfamiliar kinds
are more easily biased. They are therefore 
the judgments to use in studying biases. 


Kinds of Responses 


Responses that are closely linked to the 
stimuli by well-known rules are less easy to 
bias than responses that are only loosely re- 
lated. The display that follows lists the kinds 
of responses that are asked for in studying 
judgments of sensory magnitude. They are 
ordered by the extent to which they are 
linked closely to the stimuli by well-known 
rules: 


Familiar physical units
Named categories
Numbered categories
Numbers
Cross-modal matches


First is a familiar physical measure of the 
stimulus such as length in meters. Provided
the observer is given an adequate opportunity
to study the stimulus, these responses
are not easily biased except by the contrac- 
tion bias. Next are rating scales with named 
or numbered categories. Finally, there are
the numerical magnitude judgments and the
cross-modal matching responses advocated by
S. S. Stevens (1975). Ratings and magnitude
judgments are all relatively easily biased.

Figure 1. Models for the biases in judging sensory magnitudes; Models A, B, and C are concerned
with the overall range, whereas Models D, E, and F are concerned with the nonlinearities within
the overall range. (S = stimulus; R = response.)

It is assumed in the display that the in- 
vestigator selects the physical stimuli and 
that the observer gives one of the kinds of 
responses mentioned. But there is an alternative
type of investigation in which the
investigator gives one of the kinds of re- 
sponses and the observer sets the size of the 
physical stimulus to match it. Both types of 
investigation can be biased in similar ways. 


Kinds of Bias 


In the centering bias of Figure 1A, the 
observer centers his range of responses on 
the range of stimuli. In the stimulus spacing 
bias of Figure 1E, the observer responds as 
if all the stimuli were equally spaced geo- 
metrically and equally probable. Parducci 
(1963; Parducci & Perrett, 1971) describes 
the effect of these two biases together by his
range-frequency model. The model extends
and modifies Helson’s (1964) original
adaptation-level model of the two biases.

The contraction bias of Figure 1C is a 
general characteristic of human behavior.
It affects the distances reproduced by a limb
as well as the judgments of anything that can 
be quantified, such as overt judgments of 
sensory magnitude (Poulton, 1973, 1974, 
1975). Large stimuli and differences between 
stimuli are underestimated, whereas small 
stimuli and differences are overestimated. 
Once the observer knows the range of re- 
sponses, he selects a response too close to 
the middle of the range. S. S. Stevens and 
Greenbaum (1966) call the contraction bias 
the “regression effect.”
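
A simple way to picture the contraction bias is as a shrinkage of each response toward the middle of the known range. The sketch below is an illustrative assumption, not a model stated in the text: the linear form, the function name, and the gain of 0.8 are all hypothetical. It reproduces the underestimation of large stimuli and the overestimation of small ones, but a purely linear shrinkage underestimates every difference, so it does not capture the overestimation of small differences.

```python
def contracted_response(stimulus, range_low, range_high, gain=0.8):
    """Shrink a veridical response toward the middle of the range.

    gain = 1 gives an unbiased response; gain < 1 is a hypothetical
    strength for the contraction bias.
    """
    middle = (range_low + range_high) / 2.0
    return middle + gain * (stimulus - middle)


# Stimuli on an arbitrary 0-10 scale.
for s in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(s, contracted_response(s, 0.0, 10.0))
```

With these values the largest stimulus is reported below its true size, the smallest above it, and the stimulus at the middle of the range is unaffected.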

The local contraction bias of Figure 1D
appears to have been described only for
judgments of loudness. Acoustic stimuli in
small, very high-intensity and very low-intensity
ranges are treated as if they had
less extreme values than they do have. This
corresponds to the time-order error in
investigations of the differential threshold
(Hollingworth, 1910; Woodworth, 1938, pp.
438-448).

Judgments of sensory magnitude are affected
also by the stimulus and response
equalizing biases of Figure 1B. For the
stimulus equalizing bias, the two scales at
the sides represent stimuli, whereas the scale
in the middle represents responses. The observer
uses his full range of responses whatever
the size of the range of stimuli. For the
response equalizing bias, the two scales at 
the sides represent responses, whereas the 
scale in the middle represents stimuli. The 
observer uses a larger range of responses 
when he is provided with a larger range. 
Numerical judgments that require a step
change in the number of digits are affected
also by the logarithmic bias of Figure 1F.
In using numbers logarithmically, the observer
treats 1-, 2-, 3-, . . . , n-digit numbers
as equally frequent instead of treating
n-digit numbers as 10 times as frequent as
(n − 1)-digit numbers. This shrinks the upper
part of the numerical scale (Banks & Hill,
1974).

The logarithmic bias, the two contraction 
biases, and the stimulus and response equalizing
biases can all affect the observer’s very
first judgment. In contrast, the effects of
the centering bias and the stimulus spacing 
bias come on gradually as the observer learns 
the set of stimuli. These last two kinds of 
bias can both be said to be due to transfer 
from previous stimuli. 

Judgments of sensory magnitude can be 
biased also by transfer from previous inves- 
tigations, from instructions and demonstra- 
tions, from previous stimuli, from previous 
judgments, and from previous responses. The 
importance of the bias introduced into ex- 
periments by transfer is now becoming clear 
(Erlebacher, 1977; Greenwald, 1976; Poul- 
ton, 1973, 1974, 1975; Poulton & Freeman, 
1966). 


Centering Bias 


The centering bias affects category ratings. 
Together with the stimulus spacing bias, it
accounts for Helson's (1964) level of adaptation.
Parducci (1963; Parducci & Perrett, 1971)
shows that the level of adaptation depends
on two characteristics of the distribution of
stimuli, which can be varied independently.
The influence of the midpoint of
the range of stimuli is described here as the 
centering bias. The influence of the median 
of the distribution of stimuli is discussed 
later as the stimulus spacing bias.

Figure 1A illustrates a model of the centering
bias. The stimulus intensities used in
the theoretical investigation represented on 
the left side of the figure are two steps 
greater than the stimulus intensities used in 
the investigation represented on the right. 
Yet once the observer has learned the range, 
he centers his response scale on the range 
of stimulus intensities, whatever their physi- 
cal values are. This is indicated by the lines 
that all slope down toward the right. Thus, 
a more intense stimulus from the investiga- 
tion represented on the left side of the figure 
receives the same rating as a less intense 
stimulus from the investigation represented 
on the right side of the figure. 


Centering Bias in Rating Noisiness 


Figure 2 illustrates the centering bias in 
five fairly comparable investigations of the 
noises made by road vehicles and by air- 
craft. The filled points represent motor ve- 
hicle noise (from D. W. Robinson, Cope- 
land, & Rennie, 1961). The unfilled points 
represent aircraft noise (from Bowsher, John- 
son, & Robinson, 1966). The vertical lines 
show the range of noises in decibels (dB[A]).
The short horizontal lines show the mid- 
points of the ranges. The vertical lines have 
been separated horizontally to make all the 
midpoints of the ranges lie on the dashed 
straight line. 

The diamonds indicate the midpoints of 
the straight lines fitted to the ratings. This 
is the transition between acceptable and 
noisy in the D. W. Robinson et al. (1961) 
investigation and between moderate and 
noisy in the two Bowsher et al. (1966) in- 
vestigations. In the Andrews and Finch 
(1951) and Lauber (Note 1) investigations 
the diamonds indicate the transition between 
unobjectionable or inoffensive and objection- 
able or annoying. 

If the observers were to behave like sound 
level meters and have a fixed criterion for 
the just acceptable level of noise, all the 
diamonds in Figure 2 would lie on a hori- 
zontal line. If the ratings were determined 
entirely by the midpoints of the ranges of 
sounds presented, the diamonds would be 
superimposed upon the short horizontal lines.

Figure 2. The centering bias in rating noises. (The
vertical lines indicate the ranges of noise intensities
heard. The short horizontal lines represent the
midpoints of the ranges. The diamonds show the
average just acceptable noise levels that lie at the
midpoints of the rating scales obtained from separate
groups of between 19 and 37 observers.
Andrews = Andrews & Finch, 1951; Robinson = D.
W. Robinson, Copeland, & Rennie, 1961; Bowsher
= Bowsher, Johnson, & Robinson, 1966; and
Lauber = Lauber (Note 1). Adapted from Poulton,
1977, Figure 2.)


The dotted line fitted to the diamonds by 
eye is a compromise, which favors the mid- 
points of the ranges. The midpoints of the 
ratings are determined less by the actual 
sound levels than by the ranges. 

In the Andrews and Finch investigation 
on the left side of Figure 2, the diamond 
representing the midpoint of the 10-point 
scale of objectionableness is too high. It is
pulled up by the middle of the range of 
noise intensities. In the investigations illus- 
trated on the right side of the figure, the 
diamonds are too low. They are pulled down 
by the middle of the range of noise intensi- 
ties. The dashed and dotted lines cross at 
about 85 dB(A). Here a diamond would be
pulled neither up nor down. The crossover
represents the only point in the data that
is not affected by the centering bias.
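
The compromise shown by the dotted line of Figure 2 can be sketched as a weighted average of a fixed criterion and the midpoint of the range heard. This is a hypothetical formalization, not one given in the text: it treats the crossover of about 85 dB(A) as the fixed criterion, and the weight of 0.7 is an arbitrary value chosen only because the compromise is said to favor the midpoints.

```python
def neutral_level(range_midpoint, crossover=85.0, weight=0.7):
    """Noise level (dB[A]) predicted to receive the middle rating.

    weight = 0 corresponds to observers behaving like sound level meters
    (a fixed criterion); weight = 1 corresponds to ratings determined
    entirely by the midpoint of the range. Both crossover-as-criterion
    and weight are assumptions of this sketch.
    """
    return crossover + weight * (range_midpoint - crossover)


for midpoint in (70.0, 85.0, 95.0):
    print(midpoint, neutral_level(midpoint))
```

A range centered above the crossover pulls the middle rating up, a range centered below it pulls the middle rating down, and a range centered on the crossover leaves it unchanged.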


Avoiding the Centering Bias in Ratings 


A field investigator who wants to obtain 
named categories to represent sensory mag- 
nitudes can avoid the centering bias by using 
a method that provides data comparable to 


all the data illustrated in Figure 2. Separate
groups of observers hear all the sounds, each
group with a different constant amplification 
or attenuation. Thus, each group of observers 
provides a single vertical function like one 
of those of Figure 2. For the group whose 
average midrating is found to lie at the 
crossover point illustrated in the figure, the 
judgments are not affected by the centering 
bias. This method of avoiding the centering 
bias is listed at the top of the middle col- 
umn of Table 1. 

Table 1 indicates that the centering bias 
can also be avoided by restricting each group 
of observers to a single stimulus. If the in- 
vestigator wants to measure only the just 
acceptable noise level at the middle of the 
range of ratings, separate groups of observ- 
ers can each be given a single noise to judge 


at a different intensity. The observers have
to decide simply whether the noise is acceptable
or not acceptable. The noise that is
judged acceptable by just 50% of the observers
can then be determined. However, it
is essential that the observers not have heard
a range of noises of different intensities before
judging their single noise. This is because
their judgment will be affected by the
previous stimulus range, even though they
do not judge the noises (Bowsher et al., 1966,
Table 3; Von Wright & Kekkinen, 1970).
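
Finding the noise judged acceptable by just 50% of the observers amounts to locating where the proportion of "acceptable" judgments crosses one half. The sketch below uses linear interpolation between neighboring groups; the interpolation rule, the function name, and the data are hypothetical and serve only to illustrate the single-stimulus method.

```python
def level_at_half(levels, proportions):
    """Interpolate the level at which the proportion acceptable crosses 0.5."""
    pairs = sorted(zip(levels, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("proportions never cross 0.5")


# Hypothetical data: each group judges one noise at one intensity (dB[A]).
levels = [70.0, 75.0, 80.0, 85.0, 90.0]
proportion_acceptable = [0.95, 0.80, 0.60, 0.40, 0.15]
print(level_at_half(levels, proportion_acceptable))
```

Because each group contributes a single yes-or-no judgment about a single noise, no observer hears a range of intensities on which to center a rating scale.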

Investigators who use numerical category 
ratings are often concerned only with the 
relative positions of the sensory magnitudes 
on the rating scale, not with their absolute 
positions. Here the centering bias does not
matter. Often the investigator anchors the 
upper end of the rating scale to the most in- 
tense stimulus in his series and the lower 
end of the rating scale to his least intense 
stimulus. In doing so, he centers the rating 
scale on the range of stimuli. 




Centering Bias in Direct 
Magnitude Estimation 


The top of the last column of Table 1 
indicates that in direct magnitude estimation, 
the centering bias can also be avoided by 
asking each observer for a single judgment. 
However, in using direct magnitude estima- 
tion, the investigator generally wishes to 
obtain only a relative judgment between two 
intensities, not an absolute judgment. The
relative judgment shows how the reported
difference in sensation varies with the difference
in intensity. S. S. Stevens (1971, 1975)
uses the following model: 


\[ \psi = K \phi^{n}, \]
or taking logs,
\[ \log \psi = \log K + n \log \phi, \]

where $\psi$ is the subjective intensity and $\phi$ is
the physical intensity. In this model the centering
bias is represented by log K. It is
equated across observers or investigations by
fixing or adjusting the value of K.
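
The role of log K as an additive constant can be seen by fitting the power function on log-log coordinates. The following is a minimal sketch with made-up data: the least-squares fitting routine, the intensities, and the two values of K are hypothetical and stand in for two observers who share an exponent but center their numbers differently.

```python
import math


def fit_power_law(phi, psi):
    """Least-squares fit of log10(psi) = log10(K) + n * log10(phi).

    Returns (K, n).
    """
    xs = [math.log10(v) for v in phi]
    ys = [math.log10(v) for v in psi]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    log_k = mean_y - exponent * mean_x
    return 10.0 ** log_k, exponent


# Hypothetical physical intensities and two observers with the same
# exponent (0.6) but different multiplicative constants.
phi = [1.0, 2.0, 4.0, 8.0, 16.0]
observer_a = [2.0 * p ** 0.6 for p in phi]
observer_b = [5.0 * p ** 0.6 for p in phi]

print(fit_power_law(phi, observer_a))
print(fit_power_law(phi, observer_b))
```

Both fits recover the same slope n; only the intercept log K differs, which is why fixing or adjusting K equates the observers without altering the relative judgments.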


Logarithmic Bias 


The Arabic numbering system introduces 
a logarithmic bias into the observer’s use of 
numbers when he reaches a step change in 
the number of digits. After counting up to 
10 the observer has two alternatives for the 
way in which to proceed. He can behave linearly
and continue 11, 12, 13, . . . , or he can
be more logarithmic and continue 20, 30,
. . . .

If the observer generates numbers linearly, 
he uses two-digit numbers 10 times as often 
as single-digit numbers. He uses three-digit 
numbers 100 times as often, and so on. If 
the observer generates numbers logarithmically,
he uses 1-, 2-, 3-, . . . digit numbers
equally often. This shrinks the upper
part of the response scale, as illustrated on 
the right side of Figure 1F. 
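
The two ways of generating numbers can be compared by counting how often numbers of each length occur. This is a sketch under stated assumptions: the linear observer is taken to use every integer up to 999, and the logarithmic observer is taken to use nine steps per digit length (1 to 9, then 10 to 90 by tens, then 100 to 900 by hundreds). The function names are hypothetical.

```python
from collections import Counter


def linear_responses(limit):
    """Every integer from 1 to limit, as a linear observer would count."""
    return list(range(1, limit + 1))


def logarithmic_responses(max_digits):
    """Nine steps per digit length: 1..9, 10..90, 100..900, and so on."""
    return [lead * 10 ** power
            for power in range(max_digits)
            for lead in range(1, 10)]


def counts_by_digits(numbers):
    """Number of responses having 1, 2, 3, ... digits."""
    return Counter(len(str(n)) for n in numbers)


print(counts_by_digits(linear_responses(999)))
print(counts_by_digits(logarithmic_responses(3)))
```

The linear list contains 10 times as many two-digit as one-digit numbers and 100 times as many three-digit numbers, whereas the logarithmic list contains each digit length equally often.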

The numbers generated can be arranged 
in rank order of size at equal intervals along 
the abscissa of a graph. When the numbers 
are plotted with a linear scale on the ordinate,
the function is markedly concave upward,
whereas when the scale on the ordinate
is logarithmic, the function is slightly
concave downward. The untrained observer
compromises between pure linear and pure 
logarithmic choices, but favors logarithmic 
choices (Banks & Hill, 1974). 

Figure 1F shows the relationship between 
numbers plotted logarithmically and numbers 
plotted linearly. The two sets of numbers are 
scaled so that they are the same height at 
10 and at 1.4 on the two scales. At these 
two points,

\[ x = 10 \log_{10} x. \]

At other points, the corresponding numbers
are not the same height because here $x$ and
$10 \log_{10} x$ are different.

The lines connect various values of $x$ on the linear
scale to the corresponding values of $10 \log_{10} x$ on
the logarithmic scale. Between 3 and 7 the 
lines are almost parallel, so the two scales are 
virtually identical. Above 7 the logarithmic 
scale shrinks compared with the linear scale, 
while below 3 it expands compared with the 
linear scale. But when observers generate 
numbers in an experiment, the variability 
is so large that it is usually not possible to 
distinguish reliably between the two num- 
bering systems in the range between 1 and 
10. 
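
The two points at which the scales coincide, and the near parallelism between 3 and 7, can be verified numerically. The sketch below solves x = 10 log10 x by bisection; the function names, the bracketing interval, and the tolerance are choices made for this illustration.

```python
import math


def gap(x):
    """Difference between the linear value x and the logarithmic value 10*log10(x)."""
    return x - 10.0 * math.log10(x)


def bisect_root(f, low, high, tol=1e-10):
    """Find a root of f in [low, high], assuming f changes sign there."""
    f_low = f(low)
    if f_low * f(high) > 0:
        raise ValueError("no sign change in the interval")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = f(mid)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


lower_crossing = bisect_root(gap, 1.0, 2.0)
print(round(lower_crossing, 2), gap(10.0))
print([round(gap(x), 2) for x in (3, 4, 5, 6, 7)])
```

The lower crossing falls near 1.4 and the upper one at exactly 10. Between 3 and 7 the gap between the two scales changes little, which is why the connecting lines in Figure 1F are almost parallel there.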

The figure shows that above 10 the loga- 
rithmic scale is very much more condensed 
than the linear scale. So when an observer 
generates numbers that extend from 1 or 2 
up to 50 or 100, it is possible to determine 
whether he is behaving linearly or logarith- 
mically. The broken vertical line in Figure 
1F indicates that in generating numbers, the 
unpracticed observer compromises between 
the two scales. 

Numbers below 1.0 are rarely used in 
psychophysical investigations. Here the log- 
arithmic scale is very much more spread out 
than the linear scale. This explains why when
sensory magnitude is plotted with a logarith- 
mic scale on the ordinate of a graph against 
physical magnitude on the abscissa, the slope 
of any psychophysical function becomes 
steeper as the threshold is approached. The 
threshold represents zero sensory magnitude. 
On the logarithmic scale of the ordinate, the 
threshold is infinitely far away in a downward
direction (Poulton, 1968, p. 5).

Figure 3. The logarithmic bias in direct magnitude estimation of electrotactile stimuli and in
rating the stimuli on a 49-point scale corresponding to the average range of numbers used in
the magnitude estimation. (Ratings using 7 categories are also shown. Panel A has a linear scale
on the ordinate, whereas Panel B has a logarithmic scale. Results are from Gibson & Tomko,
1972.)


Logarithmic Bias in Rating 


Figures 3A and 3B (results of study by 
Gibson & Tomko, 1972) illustrate the loga- 
rithmic bias that occurs when an observer 
rates electrotactile intensity with a large 
number of numerical categories. The numerical
judgments on the ordinate are plotted in
Figure 3A with a linear scale and in Figure 
3B with a logarithmic scale. Thus, the solid 
straight line of Figure 3B becomes the solid 
line that is concave upward in Figure 3A. 
The electrotactile stimuli on the abscissa have 
the same logarithmic scale in both figures. 

The unfilled circles represent the arith- 
metic means of a group of 11 observers who 
use a rating scale with 49 main categories. 
The categories 1 and 51 are reserved for 
stimuli judged to fall outside the initially 
defined range of stimuli. With the linear vertical
scale of Figure 3A, the function represented
by the unfilled circles is markedly
concave upward, whereas with the logarithmic
vertical scale of Figure 3B, the function
is slightly concave downward. The slight
downward concavity is indicated by the unfilled
circles at the top and bottom of the
function, which lie below the solid straight
line. The slight downward concavity shows
that the observers do not use a purely logarithmic
numbering system, although their
hybrid numbering system is nearly logarithmic.

Figure 4. Mean 7-point category ratings plotted on
a linear scale against mean magnitude judgments
plotted on a logarithmic scale. (The two sets of
data are from two separate groups of 11 observers.
The function is slightly concave upward because
the magnitude judgments are plotted on the appropriate
logarithmic scale instead of on a linear scale.
Results are from Gibson & Tomko, 1972.)

The logarithmic bias does not affect rat- 
ings with 9 or less numerical categories 
(Montgomery, 1975; Parducci, 1963). The 
squares and the dashed fitted straight lines 
in Figures 3A and 3B represent the arithmetic
means of a separate group of 11 observers
who use a rating scale with 7 main
categories. The categories 1 and 9 are re- 
served for stimuli judged to fall outside the 
initially defined range of stimuli. With the 
linear vertical scale of Figure 3A, the func- 
tion represented by the squares is closely 
fitted by the straight line. Thus the loga- 
rithmic bias affects the rating scale with 49 
categories but not the rating scale with 7 
categories. 

S. S. Stevens (1975, Figure 50) reports a 
logarithmic bias in ratings of loudness using 
seven categories. But this is presumably 
caused by transfer of the logarithmic bias 
from previous investigations on magnitude 
estimation performed by the same observers. 
Transfer between S. S. Stevens’s various in- 
vestigations is discussed later in this review. 

With the logarithmic vertical scale of Fig- 
ure 3B, the squares are also fitted fairly well 
by the straight line. This is because Figure 
1F shows that within the range between 1 
and 10, the linear and logarithmic scales are 
reasonably comparable. The most deviant 
square in Figure 3B is on the extreme left, 
with a mean rating as low as 1.4. Here Fig- 
ure 1F shows that the two scales begin to 
differ appreciably. 


Logarithmic Bias in Direct 
Magnitude Estimation 


The filled circles and fitted solid line in 
Figure 3B illustrate the logarithmic bias in 
direct magnitude estimation. Here the points 
represent geometric means instead of arith- 
metic means. The results are from another 
separate group of 11 observers who judge 
the stimulus intensities in numbers without 
a specified standard or modulus. In Figure 
3B the function represented by the filled
circles is slightly concave downward, like
the function represented by the unfilled 
circles for the ratings. The slight downward 
concavity shows that the observers do not 
use a purely logarithmic numbering system. 
However, the filled circles in Figure 3A show
that the observers certainly do not use a 
linear numbering system. 

When the magnitude judgments of Figures 
3A and 3B are plotted against the category 
ratings on the 49-point scale represented by 
the unfilled circles, the resulting function is 
a straight line (Gibson & Tomko, 1972, Fig- 
ure 4). This indicates that the logarithmic 
bias affects magnitude judgments and cate- 
gory ratings to a similar extent, provided 
the range of numbers used as responses is 
about the same. In Gibson and Tomko's ex- 
periment, the numbers used for the category 
ratings are selected to correspond to the 
average range of numbers used in the magni- 
tude judgments. It follows that the differ- 
ence between direct magnitude estimation by 
unpracticed observers and numerical cate- 
gory rating is simply in the range of numbers 
normally used, not in the kind of judgment 
that the observers are instructed to make. 
Montgomery (1975, Figure 1, Panels A1,
A2, and A3) makes a similar point.


Ratings and Magnitude Estimates 


The same stimuli can be judged while 
using either a linear numbering system or a 
logarithmic numbering system. When the two 
sets of judgments are plotted against each 
other with the linear scale on the ordinate 
and the logarithmic scale on the abscissa, 
the resulting function should be a straight 
line. But as Torgerson (1960) points out, the
function is usually slightly concave upward, 
like the function in Figure 4. The figure 
shows Gibson and Tomko’s (1972) 7-point 
ratings plotted on a linear scale against their 
magnitude judgments plotted on a logarith- 
mic scale. The function is slightly concave 
upward because Figures 3A and 3B show 
that the ratings use a linear numbering sys- 
tem, whereas the magnitude judgments have 
a logarithmic bias, but are not completely 
logarithmic. 

Torgerson's (1960, Figures 3 and 4) results
for Munsell grays and Eisler's (1962,
Figure 4, Panels ILa, IILa, IIILa, ISa, IISa, 
and IIISa) and Montgomery's (1975, Table 
2) results for loudness all give functions that 
are slightly concave upward, like the func- 
tion in Figure 4. Where the stimulus spacing 
does not bias the shape of the functions, 
Montgomery's results are clearly due to the 
hybrid numbering system used for the mag- 
nitude judgments, as with the data of Figure 
4. Unfortunately Torgerson's and Eisler's re- 
sults are influenced by the stimulus spacing 
bias and perhaps also by transfer from 
previous magnitude judgments or category 
ratings. Transfer bias from previous condi- 
tions is not possible in Gibson and Tomko's 
and Montgomery's experiments because they 
use a separate group of observers for each 
condition. 

Figure 3A shows that Gibson and Tomko's 
(1972) 7-point ratings are linear on a log- 
normal or semilog plot. Thus in physical 
units, the equal intervals between the ratings 
on the ordinate of Figure 4 correspond to a 
logarithmic physical scale. The logarithmic 
scale of the magnitude judgments on the ab- 
scissa of Figure 4 also corresponds more or 
less to a logarithmic physical scale. This is 
because Figure 3B shows that the relation- 
ship between the magnitude judgments and 
the physical units is almost a straight line 
on a loglog plot. So both the vertical and 
the horizontal scales of Figure 4 are more 
or less equivalent to the logarithm of the 
physical units, hence the more or less 
straight-line relationship between them in the 
figure. 

It is important to distinguish between the 
slightly concave upward relationship of Fig- 
ure 4 and the markedly concave downward 
relationship reported by S. S. Stevens and 
Galanter (1957, Figures 8 and 9) between 
category and direct magnitude scales of loud- 
ness. S. S. Stevens and Galanter plot the 
logarithmic ratio judgments using a linear 
scale of sones instead of a logarithmic scale. 
This converts the slightly concave upward 
relationship of Figure 4 into S. S. Stevens's 
well publicized concave downward relationship
(Eisler, 1962, Figure 4, Panels ILb,
IILb, IIILb, ISb, IISb, and IIISb; Gibson
& Tomko, 1972, Figure 3; Torgerson, 1960).
This relationship is produced simply by plotting what
corresponds to a logarithmic physical scale on 
the ordinate against what corresponds to a 
linear physical scale on the abscissa. It has 
no greater significance than this. 
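
The argument can be checked numerically. The following is only an illustrative sketch: it assumes that the ratings follow the logarithm of the physical intensity exactly and that the magnitude judgments follow a power function with an arbitrary exponent of 0.6. The helper name second_differences and all the numerical values are hypothetical; none of them comes from the investigations cited.

```python
import math


def second_differences(xs, ys):
    """Change in slope between successive segments of the curve y(x)."""
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(len(xs) - 1)]
    return [slopes[i + 1] - slopes[i] for i in range(len(slopes) - 1)]


# Hypothetical physical intensities, geometrically spaced.
intensities = [10 ** (k / 2) for k in range(1, 9)]

# Assumption: ratings are proportional to the logarithm of intensity.
ratings = [math.log10(i) for i in intensities]

# Assumption: magnitude judgments are a power function of intensity.
magnitudes = [i ** 0.6 for i in intensities]

# Ratings against magnitudes on a linear abscissa.
linear_abscissa = second_differences(magnitudes, ratings)

# Ratings against magnitudes on a logarithmic abscissa.
log_abscissa = second_differences(
    [math.log10(m) for m in magnitudes], ratings)

concave_down_on_linear_axis = all(d < 0 for d in linear_abscissa)
straight_on_log_axis = all(abs(d) < 1e-9 for d in log_abscissa)
```

Under these assumptions the same pairs of values give a straight line when the abscissa is logarithmic and a concave downward curve when it is linear, so the change of shape comes from the choice of axis alone.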


Avoiding Hybrid Numbering Systems 


The logarithmic bias is most likely to occur 
when the observer uses a range of numbers
that includes a step change in the number 
of digits. In S. S. Stevens's method of direct 
magnitude estimation, a stimulus in the 
middle of the range of stimuli may be called 
10. The observer has then to use both one- 
digit and two-digit numbers. If the observer 
serves in several such investigations, the log- 
arithmic bias in using numbers greater than 
10 is likely to transfer to the use of num- 
bers between 1 and 10. Eventually the ob- 
server may consistently use numbers loga- 
rithmically. Differences between sensory in- 
tensities are then expressed as numerical 
ratios. The consistent use of a purely loga- 
rithmic numbering system can be encouraged 
by instructing the observer to use ratios, as
S. S. Stevens (1971, p. 428) does. Those
who do not give ratios consistently are not 
invited to serve as observers. 

The logarithmic bias should not occur 
when the observer uses numbers that all have 
an equal number of digits, except as a result 
of transfer from prior ratio judgments. The 
methods of avoiding the logarithmic bias are 
listed at the bottom of the second section 
of Table 1. 




Stimulus Spacing Bias 


The stimulus spacing bias is one of the 
two biases that together account for Helson's
(1964) level of adaptation (Parducci, 1963;
Parducci & Perrett, 1971), the other being
the centering bias, which has already been
discussed. Figure 1E illustrates a model of
the stimulus spacing bias. The five stimuli
at the top of the range of intensities at
the left are spaced at geometric intervals only
half the size of the three stimuli at the
bottom of the range. In judging the stimuli,
the observer uses their rank order of magnitude
rather than their relative subjective
sizes. He behaves as if all the stimuli were 
subjectively equally spaced. In category rat- 
ing this means using all the categories equally 
often. Thus the smaller intervals are over- 
estimated compared with the larger intervals. 

The stimulus frequency bias is a special 
case of the stimulus spacing bias. The observer
behaves as if all the stimuli were
equally probable. When one stimulus is presented
more frequently than the remainder,
the observer treats the more frequent stimuli
as if they were all nearly, but not exactly,
the same size. Thus, three identical stimuli
one quarter of the way from the top of the
stimulus range are treated as three closely
spaced stimuli of slightly different size, as
illustrated in Figure 1E. In judging all the
stimuli while using their rank order of magnitude
rather than their relative subjective
sizes, the observer allocates an excessive
amount of his response scale to the identical
stimuli. The stimulus region around the identical
stimuli is therefore overestimated compared
with the remainder of the stimulus
range.


Stimulus Frequency Bias in Rating 


The stimulus frequency bias in category 
rating is described at length by Parducci 
(1963; Parducci & Perrett, 1971), who uses 
separate groups of observers for each condition,
and by Pollack (1964a, 1965a, 1965b),
who uses the same or different observers. A
number of extra stimuli of one particular intensity
are added to a set of stimuli. The
observer tends to use all his response categories
equally often. Thus stimuli a little
greater than the added stimuli receive higher
ratings than previously, whereas stimuli a
little smaller than the added stimuli receive
lower ratings than previously. In the rating
scale the added stimuli produce a local expansion,
which is flanked on both sides by
compensatory contraction.


Stimulus Spacing Bias in Rating


Figure 5. The stimulus spacing bias in rating the
loudness of white noise. (Results are from Montgomery,
1975.)

Figure 6. The stimulus spacing bias in ratings and
direct magnitude judgments of the loudness of
white noise. (The difference in slope between the
filled triangles and the filled circles is produced by
the stimulus equalizing bias. Results are from Montgomery,
1975.)

An example of the stimulus spacing bias
that corresponds to the model of Figure 1E
is illustrated on the lognormal or semilog
plot of Figure 5 (results from Montgomery,
1975). The unfilled squares represent ratings
of the loudness of white noise made on a
7-point scale by 10 psychology undergraduates.
The stimuli are spaced at 10-dB intervals
between 30 and 100 dB (SPL), with
additional stimuli inserted at 85 and 95 dB.
The two inserted stimuli increase the slope
of the function at the top end. The six points
at 10-dB intervals are fitted by a separate
straight line from the five points at 5-dB
intervals. As the extra stimuli halve the interval
between stimuli, they should double
the slope. The ratio of the two slopes is in
fact rather larger than this: 1:2.3. Similar
results are reported by Parducci and Perrett
(1971), by Pollack (1964a, 1965a, 1965b),
and by J. C. Stevens (1958).
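
The expected doubling of the slope follows directly if the observer responds purely to rank order. The sketch below is a minimal illustration of that assumption, not a reanalysis of Montgomery's data: it takes the stimulus levels described for Figure 5, replaces each rating by the rank of the stimulus, and compares the slope over the 10-dB steps with the slope over the 5-dB steps. The names ranks and slope are hypothetical.

```python
# Stimulus levels in dB (SPL): 10-dB steps from 30 to 100,
# with extra stimuli inserted at 85 and 95.
levels_db = sorted(list(range(30, 101, 10)) + [85, 95])

# Assumption: the response to each stimulus is simply its rank,
# as if all stimuli were subjectively equally spaced.
ranks = {level: rank for rank, level in enumerate(levels_db, start=1)}


def slope(low_db, high_db):
    """Response units per decibel between two stimulus levels."""
    return (ranks[high_db] - ranks[low_db]) / (high_db - low_db)


lower_slope = slope(30, 80)    # six points at 10-dB intervals
upper_slope = slope(80, 100)   # five points at 5-dB intervals
slope_ratio = upper_slope / lower_slope
```

With a pure rank-order response the ratio of the two slopes is 1:2. The observed ratio of 1:2.3 is therefore a little larger than this simple model predicts.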

In contrast, the filled triangles and dotted 
line represent ratings made on a 9-point 
scale by a separate group of 10 psychology 
undergraduates. They show the almost linear 
function that is produced by a uniform geo- 
metric spacing between all stimuli. 

The unfilled squares and dashed fitted line 
of the loglog plot of Figure 6 (results from 
Montgomery, 1975) correspond to the un- 
filled squares and dashed fitted lines in Fig- 
ure 5. Figure 6 shows that the discontinuity 
in the function of Figure 5 disappears when 
the ratings are plotted on a logarithmic scale. 
Inserting the extra stimuli between the stim- 
uli at the top of the range helps to make the 
category ratings linear on a loglog plot like 
that of Figure 6. 

The unfilled circles and dashed fitted line 
in Figure 6 are for category ratings be- 
tween 1 and 100 made by another group of 
10 psychology undergraduates. They show 
that with as many as 100 categories, insert- 
ing the extra stimuli at 85 and 95 dB (SPL) 
is not quite so much help in making the 
category ratings linear on a loglog plot. 


Stimulus Spacing Bias in Direct 
Magnitude Estimation 


J. C. Stevens (1958, Figure 3) describes 
the stimulus spacing bias in direct magnitude 
estimation using numbers. One group of
observers judges the loudness of white noises 
spaced at 10-dB intervals between 40 and 
100 dB (SPL). Three other groups judge the 
same range of noise intensities, but with one 
of three different 10-dB intervals filled with 
three or four extra noise intensities sepa- 
rated by 2 dB. The extra stimuli produce 
local increases in the steepness of the psy- 
chophysical function relating loudness to in- 
tensity on a loglog plot like that of Figure 6. 

The stimulus spacing bias can be used to 
increase the effect of the logarithmic bias
illustrated in Figure 1F. In judging sensory
magnitudes, unpracticed observers produce
functions on a loglog plot that are slightly
concave downward, like the function represented
by the filled triangles in Figure 6 (results
from Montgomery, 1975) and by the
filled circles in Figure 3B (Gibson & Tomko,
1972). The filled circles in Figure 6 are from
a separate group of 10 psychology undergraduates
who judge noises ranging from 30
to 100 dB (SPL). Here the stimuli are
spaced 10 dB apart, except for additional
stimuli inserted at 85 and 95 dB. The two
inserted stimuli straighten out the concavity
that would otherwise occur at the top end of
the function.

Inserting additional stimuli near the top
of the range is now a well-known procedure
for obtaining a nearly straight-line function
on a loglog plot with unpracticed observers
(Eisler, 1962; J. C. Stevens & Tulving,
1957). This does not prove that subjective
sensory intensity is a power function of physical
intensity. It simply demonstrates the
investigator's practical familiarity with the
logarithmic and stimulus spacing biases and
his skill in combining them to produce a
nearly straight line on a loglog plot.


Avoiding the Stimulus Spacing Bias 


The second section of Table 1 indicates 
that the stimulus spacing bias can be avoided
by restricting each group of observers to a
single judgment. A more practical alternative 
is to space all the stimuli geometrically and 
to present the stimuli equally often. 

Confusions between stimuli reduce the 
slope of the psychophysical function at the 
point where the confusions occur. At the 
limit, where two different stimuli are always
confused, the slope is zero because both stimuli
receive the same average response. Local
variations in slope can therefore be avoided
by making successive pairs of stimuli equally
confusable. Following Weber's law (see
Woodworth, 1938, pp. 430-438), this means
using geometric spacing of stimuli. Geometric
spacing is thus the subjectively neutral spacing,
which produces a straight line and avoids
the stimulus spacing bias.
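
Geometric spacing can be sketched as follows. This is only a minimal illustration: the function name geometric_spacing and the example range from 1 to 16 in five steps are hypothetical and are not taken from any of the investigations reviewed. Each stimulus is a fixed multiple of the one below it, so the proportional step between neighbors (the Weber fraction) is the same throughout the range.

```python
def geometric_spacing(lowest, highest, count):
    """Return `count` values from lowest to highest in equal ratio steps."""
    ratio = (highest / lowest) ** (1 / (count - 1))
    return [lowest * ratio ** i for i in range(count)]


# Hypothetical example: five stimuli covering a 16:1 physical range.
stimuli = geometric_spacing(1.0, 16.0, 5)

# Proportional step between each stimulus and the next one up.
weber_fractions = [(upper - lower) / lower
                   for lower, upper in zip(stimuli, stimuli[1:])]
```

On the assumption that Weber's law holds over the range, equal proportional steps make each neighboring pair about equally confusable, which is the condition stated above for avoiding local variations in slope.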
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Geometric stimulus spacing can be used in 
both category rating and in magnitude esti- 
mation. In category rating with not more 
than about nine categories, it produces ap- 
proximately equally spaced ratings. In mag- 
nitude estimation by observers who are 
trained to use numbers logarithmically, it 
produces approximately equal subjective
ratios, which are equally spaced on a logarithmic
plot.


Contraction Biases 


Figure 1C illustrates a model for the con- 
traction bias. Large stimuli and differences 
between stimuli are underestimated, while 
small stimuli and differences between stimuli 
are overestimated. The contraction bias af- 
fects an observer’s very first judgment, be- 
cause he always has some idea of the range 
of stimuli or stimulus differences to expect. 

The contraction bias is facilitated by giv- 
ing the observer a limited set of responses to 
use with an obvious middle value. Examples
are a set of ratings, the finite set of numbers
used in fractional magnitude judgments,
and the range of the gain control used in
magnitude production and cross-modality 
matching. Once the observer knows the range 
of responses, he selects a response that is 
closer to the middle of the range of re- 
sponses than it should be (S. S. Stevens & 
Poulton, 1956, Table 1). 


Contraction Bias in Judging Visual Height


To investigate the contraction bias directly,
it is necessary to obtain responses in
the physical units used to measure the
stimulus. Figure 7 illustrates the contraction
bias in very first judgments of the height of
a white post in a field (from Joynson, Newson,
& May, 1965). The abscissa and the
solid lines show the true height in inches.
The points represent the mean very first
judgments of five separate groups, each with
30 observers, in Experiment 1 and of four
separate groups, each with 18 observers, in
Experiment 2. On the left of the figure, the
heights of short posts are overestimated. On
the right, the heights of tall posts are underestimated.
As in Figure 2, the only unbiased
points are where the lines cross.

Figure 7. The contraction bias in the very first
judgments of the height of a white post in a
field. (One third of the observers in each group
judge their post at a distance of 25 m, one third
at 100 m, and the remaining third at 400 m in
Experiment 1 and at 230 m in Experiment 2. Results
are from Joynson, Newson, & May, 1965.)

Experiment 1 is carried out on an airfield. 
The dashed line fitted to the filled points 
crosses the solid line at a height of 80 inches 
(2 m). This is about the height of the 
perimeter fence around most airfields. Ex- 
periment 2 is carried out on a level field of 
mown grass. The fitted dashed line crosses 
the solid line at a height of 48 inches (1.2 
m). This is close to the mean height of 
about 54 inches (1.4 m; SD = 12 inches or
0.3 m) given by 40 people in answer to the
question, “How tall is a post?” (Joynson, 
Note 2). In making their very first judg- 
ments, the observers appear to take as a 
reference the expected height of a fence or 
post. They select a height a little nearer to 
this reference height than it should be. The 
reference height takes the place of the middle 
of the range in Figure 1C. 
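
The reference-height account can be written as a simple linear contraction. The sketch below is an assumed model, not one fitted by Joynson et al.: the function name contracted_judgment, the gain of 0.8, and the test heights of 24 and 96 inches are hypothetical. Only the 48-inch crossing point is taken from Experiment 2.

```python
def contracted_judgment(true_height, reference_height, gain=0.8):
    """Judged height pulled part of the way toward a reference height.

    A gain of 1.0 would mean no contraction. The value 0.8 is an
    arbitrary illustration, not an estimate from the data.
    """
    return reference_height + gain * (true_height - reference_height)


reference = 48.0  # inches, the crossing point reported for Experiment 2

short_post = contracted_judgment(24.0, reference)
tall_post = contracted_judgment(96.0, reference)
post_at_reference = contracted_judgment(48.0, reference)
```

In this model a post shorter than the reference is judged taller than it is, a post taller than the reference is judged shorter than it is, and only a post at the reference height is judged without bias, which corresponds to the point where the lines cross in Figure 7.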


Contraction Biases in Direct Magnitude
Estimation and Production 


The contraction bias can be demonstrated 
indirectly by contrasting magnitude estimation
with magnitude production. The difference
between the two procedures is in the
dimension used for the observer's responses.
Whichever dimension the observer uses is 
contracted. Plotting the results of magnitude 
estimation and magnitude production in the 
same figure produces crossing functions like 
those in Figure 7. The crossing functions 
differ from those in the figure in that both 
functions are biased in opposite directions; 
there is no unbiased function like the solid 
line in the figure. 

A number of examples of the crossing 
functions produced by contrasting magni- 
tude estimation with magnitude production 
are given by S. S. Stevens and Greenbaum 
(1966), who call this the “regression effect.” 
However, in most of these investigations the 
two dimensions differ in size. Thus, the cross- 
ing functions are influenced by the stimulus 
and response equalizing biases as well as by 
the contraction bias. There are also transfer 
effects, because each observer makes a num- 
ber of both kinds of judgment, each kind in 
a separate part of the experiment. 


Sequential Contraction Bias


After the presentation of a stimulus near 
the top of the range of stimulus intensities, 
the next stimulus tends to be judged larger 
than it should be. After the presentation of 
a stimulus near the bottom of the range of 
intensities, the next stimulus tends to be 
judged smaller than it should be. This re- 
sult is found in the rating of stimuli on a 
10-category rating scale (Holland & Lock- 
head, 1968; Ward, 1972; Ward & Lock- 
head, 1971), in direct magnitude estimation 
without a standard (Cross, 1973; Jesteadt, 
Luce, & Green, 1977, Experiments 1 and 3),
in direct magnitude estimation with a vari- 
able standard (Jesteadt et al., 1977, Experi- 
ment 2; Ward, 1973), and in cross-modal 
matches without a standard (Ward, 1975). 

The sequential contraction bias is a special 
case of the contraction bias. After the pre- 
sentation of a stimulus at one end of the 
range of stimuli, the average difference from 
the next stimulus selected at random is likely 
to be large. Large differences tend to be 
underestimated. Thus, the next stimulus will 
be judged on the average nearer than it
should be to the previous stimulus. After
the presentation of a stimulus near the top
of the range, the bias produces a positive
average constant error. After the presentation
of a stimulus near the bottom of the
range, the bias produces a negative average
constant error. After the presentation of
stimuli near the middle of the range, the
average constant error is small.
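
The direction of these constant errors can be illustrated with a small calculation. The sketch below assumes, purely for illustration, nine equally spaced stimuli, a next stimulus drawn with equal probability from the whole set, and a judged difference that is a fixed fraction (0.8) of the true difference from the previous stimulus. The function name mean_constant_error and all the values are hypothetical.

```python
def mean_constant_error(previous, stimuli, gain=0.8):
    """Average of (judged - true) over all equally likely next stimuli.

    The judged value of the next stimulus is pulled toward the previous
    stimulus because the difference between them is underestimated.
    """
    errors = [(previous + gain * (s - previous)) - s for s in stimuli]
    return sum(errors) / len(errors)


stimuli = list(range(1, 10))  # nine hypothetical, equally spaced stimuli

after_top = mean_constant_error(9, stimuli)
after_middle = mean_constant_error(5, stimuli)
after_bottom = mean_constant_error(1, stimuli)
```

Under these assumptions the average error is positive after the top stimulus, negative after the bottom stimulus, and zero after the middle stimulus, in line with the description above.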

When the subjective judgments are plotted
against the physical stimuli, the sequential
contraction bias reduces the slope of the
psychophysical function. This is because after
the presentation of a stimulus near the top
end of the range, the stimulus with a large
difference that is likely to be underestimated
will be near the bottom end of the range.
The size of the range is therefore underestimated.
There is a similar underestimation
after the presentation of a stimulus near the
bottom end of the range, because the stimulus
with a large difference that is likely to be
underestimated will be near the top end of
the range. Thus the sequential contraction
bias provides the mechanism for S. S. Stevens
and Greenbaum's (1966) regression effect
in a series of magnitude judgments and
productions (Cross, 1973).


Local Contraction Bias 


In addition to the global effect of the
contraction bias in direct magnitude estimation
and production that has already been
discussed, the contraction bias has a local
effect, which is illustrated in Figure 1D. The
local contraction bias corresponds to the
time-order error that is found in measurements
of the differential threshold (Hollingworth,
1910; Woodworth, 1938, pp. 438-448).

When presented in a small range, very
high-intensity and very low-intensity stimuli
are treated as if they had less extreme values.
At the top of Figure 1D for a very intense
standard, increasing the subjective intensity
by a fixed proportion requires a smaller
change in physical intensity than reducing
the subjective intensity by the same proportion.
At the bottom of the figure for a
barely perceptible standard, the tendency is
reversed. Increasing the subjective intensity
by a fixed proportion requires a larger
change in physical intensity than reducing
the subjective intensity by the same proportion.

As yet the local contraction bias appears 
to have been described only in judgments of 
loudness in a within-subject experimental 
design. The bias is illustrated in Figure 8. 
The figure shows the median number of decibels
required to halve and double the loudness
of white noise in different 10-dB ranges
of sound pressure level, using the method of
magnitude production (Poulton & Stevens, 
1955). Each of 36 observers sets the in- 
tensity of a variable noise to what he judges 
to be half or double the loudness of a num- 
ber of standard noises, using a volume con- 
trol. He pauses for about 1.0 sec between 
listening to the standard noise and listening 
to the variable noise. In the range between 
90 and 110 dB (SPL), doubling the loudness 
requires fewer decibels than halving does,
whereas in the range between 30 and 50 dB
doubling requires more decibels than halving
does. Over the middle range between 50 and
90 dB there is no obvious asymmetry. This 
corresponds to the absence of bias in the 
middle of the range in Figure 1D. 

Two obvious explanations of the local con- 
traction bias of Figure 8 can be rejected. 
Although the contraction occurs at both ends
of the range, it is not due to a ceiling or
floor effect from the observer approaching
the end of his range of responses. At each
intensity level the intensity corresponding to
the standard is always about one quarter of
the way from the lower end of the volume
control in doubling and about one quarter of
the way from the upper end in halving. Thus,
there is always plenty of movement of the
volume control available, with the corre- 
sponding increase or decrease in intensity. 

A second obvious explanation is that the 
contraction bias with the ranges of intensities 
centered on 95 and 105 dB could be due to 
the observer refusing to set the intensity of 
his volume control high enough. Similarly the 
contraction bias with the ranges centered on 
35 and 45 dB could be due to the observer 
refusing to set the intensity of his volume 
control low enough. However, these explanations
do not account for the similar biases
found with numerical magnitude estimation
(S. S. Stevens, 1955b, Tables 1 and 2), because
here the experimenter sets the variable
sound intensities, and the observer simply
judges how much louder or softer they
are than the standard.

Figure 8. The local contraction bias in magnitude
production with small very high-intensity and very
low-intensity ranges of stimuli. (Results are from
Poulton & Stevens, 1955, Figure 3.)

A local contraction bias similar to the 
dashed line of Figure 8 for halving loudness 
is reported by Warren (1970, 1973) for very 
first judgments of half loudness, for both 
white noise (Warren, 1973, Figure 2) and a 
1-kHz tone (Warren, 1970, Figure 3). In
both of Warren's investigations reliably more 
decibels are required to produce numerical 
judgments of one half of a sound of 100 dB 
(SPL) than to produce numerical judgments 
of one half of a sound of moderate intensity, 
whereas reliably fewer decibels are required 
to produce numerical judgments of one half 
of a sound of 35 dB. For the intermediate 
ranges between 45 and 90 dB, the number 
of decibels required for half loudness remains 
approximately constant. 

As in the experiment of Figure 8, in both 
of Warren’s (1970, 1973) experiments the 
local contraction bias affects the judged sub- 
jective intensity of the sound. The bias does 
not affect the number dimension, whether 
numbers are used as stimuli or as responses. 


Avoiding the Contraction Biases 


Investigators using ratings may be con- 
cerned only with the relative positions of 
the sensory magnitudes on their rating scale.
If so, they can avoid the contraction bias by
using anchors, as indicated at the bottom of 
the first section of Table 1. At the start of 
the investigation the greatest stimulus is 
presented and given the largest rating. The 
smallest stimulus is presented and given the 
smallest rating. This encourages the observer 
to use the complete rating scale for the 
range of stimuli.

Without anchors, ratings not affected by 
the contraction bias are obtained only at the 
neutral point in the middle of a range of 
named categories. Here the stimulus is not 
underestimated because it is not judged to 
be large. It is not overestimated because it 
is not judged to be small. 

In magnitude estimation, the contraction 
bias can be avoided in theory by using only 
the judgments of subjectively moderate dif- 
ferences in intensity. The insuperable diffi- 
culty is that the observer is not asked to 
indicate what represents a subjectively mod- 
erate difference in intensity, and there is no 
other way of specifying it using the experi- 
mental data. 

S. S. Stevens (1971, p. 444) neatly side- 
steps this difficulty by following a suggestion 
he makes 15 years earlier (S. S. Stevens & 
Poulton, 1956). The contraction biases that 
occur in magnitude estimation and in mag- 
nitude production can be counterbalanced by 
averaging the results of the numerical judgments
and gain adjustments. Since the contraction
biases of the two methods are by
definition of equal size, they cancel each 
other because they are in opposite directions. 

In judgments of half magnitude, it is not 
possible to avoid the contraction bias. War- 
ren (1970, 1973) neatly avoids the bias in- 
troduced by the observer's tendency to select 
a response too close to the middle of the 
range of responses in very first judgments of 
half loudness. Warren does this by using only 
the judgments of the variable that receives 
an average judgment of exactly half the 
modulus, or number allocated to the stan- 
dard. This value is neither overestimated 
nor underestimated. But unfortunately the 
method can be used only for judgments of 
half, which is a relatively small ratio. Small 
ratios tend to be overestimated. Thus Warren's
half-loudness judgments give a steep
slope on a loglog plot.

The top of the second section of Table 1
indicates that the local contraction bias of 
Figure 8 for magnitude judgments with small 
ranges of very high- or low-intensity stimuli 
can be avoided only by sticking to stimuli 
of intermediate intensity, as illustrated in the 
figure. But this still leaves the main con- 
traction bias, which affects all judgments of 
small stimulus ratios like half and twice 
loudness. 


Stimulus Equalizing Bias 


Figure 1B illustrates a model of the stim- 
ulus and the response equalizing biases. For 
the stimulus equalizing bias the two scales 
at the sides represent stimuli, whereas the 
scale in the middle represents responses. The
observer uses his full range of responses,
whatever the size of the range of stimuli. He
simply magnifies his response scale to fit a 
large stimulus range and shrinks his response 
scale to fit a small stimulus range (Jones & 
Woskow, 1962). Like the contraction bias, 
the stimulus equalizing bias affects the ob- 
server's very first judgment, because he al- 
ways has some idea of the range of stimuli 
to expect.


Stimulus Equalizing Bias in Rating 


Harvey and Campbell (1963) compare two
ranges of five weights. Columns 2 and 3 of
Table 2 show that Weights A and B of 12.7
and 17.9 ounces (362 g and 508 g, respectively)
are numbers 1 and 5 at the ends of
the narrow range, but occupy the intermedi- 
ate serial positions 2 and 4 of the wide range. 

The two far-right columns of the table 
give the mean category judgments of sepa- 
rate groups of 40 undergraduates using a 
5-point numerical scale. The bottom row 
shows that the difference between Weights A 
and B is 2.5 category steps when they are 
at the ends of the narrow range but only 1.7 
category steps when they occupy the intermediate
positions in the wide range. Thus
the average category rating received by a
weight depends partly on the size of the
range of weights being judged and only
partly on its actual weight.

Table 2
Stimulus Equalizing Bias in Rating Weights

          Weight in ounces    Judgment in ounces   M category judgment
                                                   using 5-point scale
Weight    Wide     Narrow     Wide     Narrow      Wide     Narrow
          range    range      range    range       range    range

          10.7
A         12.7     12.7       10.1     11.4        2.4      2.0
                   13.9
          15.1     15.1       14.7     16.4        3.4      3.3
                   16.4
B         17.9     17.9       21.5     22.8        4.1      4.5
          21.2
B - A      5.2      5.2       11.4     11.4        1.7*     2.5*

* For wide-narrow range, p < .01.

Parducci and Perrett (1971, p. 436) report
a similar effect of the size of the range
of stimuli. They obtain judgments of the
sizes of squares using nine named categories
instead of the five numbered categories of
Table 2. The average named category given
to a particular square changes by 1.5 category
steps when the position of the square
is changed from the largest in the range to
the third largest.


Judgments in Familiar Physical Units


The two middle columns of Table 2 give 
results from two more separate groups of 
40 undergraduates. They show that the stimulus
equalizing bias does not affect the judgments
of physical weight. The bottom row
shows that the difference of 11.4 ounces
(323 g) between the judgments of Weights
A and B is the same in both ranges. This is
because the response scale of physical weight
is closely bound to the stimulus scale of
physical weight, which students are taught
to use at school. Similarly, in Parducci's 
(1963) results the stimulus equalizing bias 
affects the numerical category ratings of the 
number of dots in a rectangle (Parducci, 
1963, Figure 22) but not the judgments of 
the actual number of dots (Parducci, 1963,
Figure 12).

Note. Each column represents results from a separate group of 40 undergraduates (Harvey & Campbell, 1963).

In the display in the section entitled Kinds
of Responses at the beginning of the article,
the response scales are ordered by the close- 
ness of their links to the stimulus scale. First 
comes a familiar physical measure of the 
stimulus such as weight in ounces or the 
actual number of dots. These measures are 
not affected by the stimulus equalizing bias 
because the stimuli and responses are closely 
linked by well-known rules. The remaining 
kinds of responses in the display, category 
ratings and magnitude estimates, can all be 
affected by the stimulus equalizing bias. 


Stimulus Equalizing Bias in Direct 
Magnitude Estimation 


The stimulus equalizing bias can have a 
marked effect in direct magnitude estimation.
The observer starts the investigation
with what he believes to be a sensible range
of responses. He distributes these responses
over the range of stimuli presented to him. 
Thus the observer who is given the smaller 
stimulus range produces the steeper slope 
when the data are presented on a loglog plot 
(Jones & Woskow, 1962). 

This is illustrated by the filled points in 
Figure 6 (results from Montgomery, 1975). 
The group of 10 psychology undergraduates 
represented by the filled triangles judge the 
loudness of the white noise covering a 25-dB 
range, from a sound pressure level of 70 dB 
to 95 dB. The separate group represented by 
the filled circles has a range of 70 dB, almost 
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three times as large, from a sound pressure 
level of 30 dB to 100 dB. The ratio of the 
slopes of the two fitted straight lines is 2:1. 

The observer's judgments are not deter- 
mined entirely by the size of the range of 
sound levels heard. If they were, the steeper 
function in Figure 6 would have almost 
three times the slope of the less steep func- 
tion, not twice the slope. But the observer's 
judgments are determined more by the size 
of the stimulus range than by the differences 
in loudness. 
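
The arithmetic behind this comparison can be set out explicitly. The sketch below uses only the two stimulus ranges and the 2:1 slope ratio given above. The logarithmic index called equalizing_share is a hypothetical illustration introduced here, not a measure used by Montgomery: it is 0 when the slope is unaffected by the stimulus range and 1 when the slope is determined entirely by the stimulus range.

```python
import math

narrow_range_db = 95 - 70    # filled triangles in Figure 6
wide_range_db = 100 - 30     # filled circles in Figure 6

# If the same response range were spread over whichever stimulus range
# is presented, the slopes would be in inverse proportion to the ranges.
full_equalizing_ratio = wide_range_db / narrow_range_db

observed_ratio = 2.0         # ratio of the two fitted slopes (2:1)

# Hypothetical index: position of the observed ratio between no effect
# of range (ratio 1) and complete equalizing, on a logarithmic scale.
equalizing_share = math.log(observed_ratio) / math.log(full_equalizing_ratio)
```

Complete equalizing would give a slope ratio of 2.8:1, which is the "almost three times" mentioned above. On this hypothetical index the observed ratio of 2:1 lies more than halfway toward complete equalizing.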

There are a number of examples of the 
stimulus equalizing bias in direct magnitude 
estimation using separate groups of observers 
for each stimulus range (Poulton, 1968). 
The bias has a marked effect on dimensions 
like loudness and brightness, which students 
are not taught to handle at school. Using a 
1-kHz tone, Engen and Levy (1958, Experi- 
ment 2) compare a stimulus range extending 
from 50 dB to 75 dB above threshold, with 
a range extending from 25 dB to 75 dB. 
For separate groups of 10 observers, dou- 
bling the stimulus range reduces the average 
slope for loudness on a loglog plot like that 
of Figure 6 by a ratio of 1.6:1. 

Using white noise, Frederiksen (1975, Experiment
2) compares a stimulus range extending
from a sound pressure level of 42
dB to 61 dB with a range extending from 42
dB to 80 dB. For separate groups of 10
undergraduates, doubling the stimulus range 
reduces the slope by a ratio of 1.6:1. In the 
corresponding Experiment 1 for light in- 
tensity, doubling the physical range reduces 
the slope by a ratio of Mri 

The stimulus equalizing bias has a smaller 
effect on dimensions such as length and dis- 
tance, which students are taught to handle 
at school. In discussing the stimulus equalizing
bias within a sensory dimension, Teghtsoonian
(1973) gives only three sets of data,
all his own. Two are for apparent distance 
and apparent length, where the stimulus 
equalizing bias is small as expected. 

The third set of data is for the loudness 
of a 3-kHz tone. Teghtsoonian presents
ranges of sound pressure level centered on
84 dB to separate groups of 16 observers.
Here doubling the stimulus range from 20 
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dB to 40 dB has no effect on the slope of 
the loudness function. It is not clear why 
this occurs. The instructions state that “the 
ratios of successive loudnesses might some- 
times be very large or very small" (Teght- 
soonian & Teghtsoonian, 1978, p. 307). Per- 
haps this encourages the observers with the 
large 40-dB range not to underestimate the 
ratios and so not to be influenced by the 
stimulus equalizing bias. The effect of transfer
from the instructions is discussed in a
later section of this review. Whatever the
reason, Teghtsoonian (1973, Figure 4) gives
the impression that the stimulus equalizing 
bias has little effect within a sensory dimen- 
sion. Yet the results just discussed indicate 
that this is not usually so for loudness or 
brightness. 


Size of the Stimulus Range in 
Cross-modal Comparisons 


In cross-modal comparisons, the inverse
relationship between the exponent or slope on
a loglog plot and the size of the stimulus 
range is first pointed out by Jones and Wos- 
kow (1966). They use the data from S. S. 
Stevens (1960, Figure 11 and Table 3), 
which show the exponents obtained for a 
number of sensory dimensions, using force of
handgrip for the responses. When the exponents
are plotted against the reciprocals of
the log geometric stimulus ranges, the straight 
line fitted to the points accounts for 96% 
of the variance of the points (Teghtsoonian, 
1973). 

Figure 9 shows the corresponding relation- 
ship for 21 of S. S. Stevens’s (Note 3, Table 
1) investigations using direct numerical magnitude
estimation. S. S. Stevens's data are
taken from Poulton's summary (1967, Table
1), but with the exponents calculated from
the actual subjective and physical ranges 
used instead of taking S. S. Stevens's best 
fitting exponents derived from a number of 
investigations. Also, the physical ranges of 
sound and vibration are transformed into 
units of amplitude instead of power by taking 
the square root, following S. S. Stevens 
(1966) and Teghtsoonian (1971). S. S. Stevens
uses amplitude instead of power because
amplitude is more comparable with
the measures of length that he uses for other
sensory dimensions. Following S. S. Stevens
(1966), in Figure 9 the square root is also
taken of the physical range for light, for the
dimensions both of brightness and of the
lightness of grays. This is because light and
sound behave so similarly (J. C. Stevens &
Marks, 1965; S. S. Stevens, 1955a). But the
fit is slightly better if the range for light is
left uncorrected, as Teghtsoonian (1971)
does. The figure follows the method of analysis
described by Teghtsoonian. The fitted
straight line has a slope of 1.43 or log 27.
It accounts for 83% of the variance of the
points.

Instead of using the exponents of Figure
9, in Figure 10 the average geometric subjective
ranges are plotted against the geometric
stimulus ranges on a loglog plot. The
figure shows that there is little relationship
between the sizes of the subjective ranges
and the sizes of the stimulus ranges. The
tau coefficient of rank correlation is only .28.
For 21 points this is not reliable (.1 > p
> .05).

Figure 10. The virtual absence of the stimulus
equalizing bias in S. S. Stevens's (Note 3) comparisons
between sensory dimensions when exponents
are not used.
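
A rank correlation of this kind can be computed as in the following sketch. The paired ranges are hypothetical values made up for illustration and are not S. S. Stevens's data. The helper `kendall_tau_a` is a name introduced here; it computes the simple tau-a form and makes no correction for ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical geometric stimulus and subjective ranges (illustration only).
stimulus_ranges = [3, 10, 30, 100, 300, 1000, 3000]
subjective_ranges = [27, 9, 60, 40, 225, 30, 100]

tau = kendall_tau_a(stimulus_ranges, subjective_ranges)
print(f"tau = {tau:.2f}")
```

Because tau depends only on the rank order of the points, it is unaffected by whether the ranges are plotted on linear or logarithmic axes.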

The overwhelming relationship illustrated 
in Figure 9 is produced by transforming the 
subjective ranges into exponents. Figure 10 
shows that the subjective ranges run from 
9 to 225, a ratio of 25 or 1.4 log units, 
whereas the stimulus ranges run from 3 to 
3,000, a ratio of 1,000 or 3 log units, which 
is over twice as large in log units. The ex- 
ponents are the ratios of the log subjective 
and log stimulus ranges (Teghtsoonian, 
1971): 

exp = (log ψ range)/(log φ range).


Because the subjective ranges vary so much
less than the stimulus ranges, they can be
approximated by a constant. Using for the
constant the slope of the line in Figure 9,


exp = (log 27)/(log φ range).


The value 27 is the average subjective 
range of the point for electric shock on the 
extreme left of Figure 10, which has the 
smallest stimulus range. In Figure 9 this 
point stands by itself at the top on the 
right. Owing to its commanding position it 
exerts the greatest influence on the slope of 
the line passing through the origin and fitted 
to all the points. It insures that the slope 
will be about log 27. 
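
The argument can be illustrated numerically. The sketch below assumes, as in the approximation above, that every subjective range equals the constant 27. The stimulus ranges are hypothetical values spanning 3 to 3,000, and `exponent_from_ranges` is a name introduced here for illustration.

```python
import math


def exponent_from_ranges(subjective_range, stimulus_range):
    """Exponent as the ratio of the log subjective to the log stimulus range."""
    return math.log10(subjective_range) / math.log10(stimulus_range)


# Hypothetical geometric stimulus ranges (illustration only).
stimulus_ranges = [3, 10, 30, 100, 300, 1000, 3000]

# Assumption taken from the approximation above: a constant subjective range.
CONSTANT_SUBJECTIVE_RANGE = 27

points = []
for stimulus_range in stimulus_ranges:
    reciprocal = 1 / math.log10(stimulus_range)
    exponent = exponent_from_ranges(CONSTANT_SUBJECTIVE_RANGE, stimulus_range)
    points.append((reciprocal, exponent))

# Each point lies on a line through the origin whose slope is log 27.
slopes = [exponent / reciprocal for reciprocal, exponent in points]
print([round(slope, 2) for slope in slopes])
```

Under this assumption a plot of exponent against the reciprocal of the log stimulus range is a perfect straight line, even though the subjective ranges carry no information about the stimulus ranges.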




S. S. Stevens's (1960) cross-modal comparisons
using force of handgrip for the responses
can be accounted for in the same
way. Here the points fit more closely to the 
line because the subjective ranges vary by 
only .4 log units, whereas the stimulus ranges 
vary by 1.6 log units, four times as much. 
S. S. Stevens's (1966, Figure 2 and Table 1) 
cross-modal matches using binaural loudness 
can also be accounted for in this way. In 
each case the slope of the fitted line, like that 
of Figure 9, is determined largely by the 
point for electric shock, which lies by itself
in the top right corner of the figure, as in
Figure 9.

Clearly Figure 10 is a more appropriate 
plot for comparing the data from different 
sensory modalities than a plot like Figure 9.
As S. S. Stevens (1971) points out, his stimulus
ranges usually extend from not far above
threshold to near the maximum that the investigator
can obtain or the observer will
tolerate. Thus, on both axes of Figure 10 the
points are reasonably representative. They
show how most of the observer's subjective
range on each dimension is related to most
of his stimulus range.

Perhaps in the future those interested in 
comparing sensory magnitudes across modali- 
ties will describe the relationships as they 
are presented in Figure 10. Thus both loud- 


of handgrip and binaural loudness as the de- 


(S. S. Stevens, 
1966). The excellent fits of the lines indicate
only that the stimulus ranges are very much
larger than the subjective ranges. 
Teghtsoonian (1973) also states that vari- 
ations in the stimulus range have a marked 
effect in comparisons across modalities but 
little effect within modalities. This is the 
wrong way around. The stimulus range has 
no reliable effect across modalities (Figure 
10), unless comparisons are made between 
exponents. But the stimulus range has a 
marked effect within modalities that students
are not used to handling, such as loudness
and brightness.


Avoiding the Stimulus Equalizing Bias 


The stimulus equalizing bias can be 
avoided only by using a range of stimuli of 
the same subjective size as the range of re- 
sponses. The filled points in Figure 6 show 
that the bias cannot be avoided in direct
magnitude estimation, because the observer
tends to use the same range of numbers whatever
the size of the range of stimuli. The
subjective size of the range of numbers is 
not constant, as it must be to avoid the bias. 

The stimulus equalizing bias is also unavoidable
in category rating, but here it is
not usually regarded as a bias. The experimenter
may make use of the stimulus equalizing
bias to insure that all his response categories
are used reasonably often. He anchors


stimulus. By doing so, he insures that all his
categories will be used whatever the size of


In cross-modal comparisons, the stimulus
equalizing bias can perhaps be avoided by
using a stimulus dimension of the same subjective
size as the response dimension. This
is indicated in the first section of Table 1.
A possible example is loudness matched to
brightness. These two dimensions appear to
have about the same sizes of both the physical
and the subjective ranges (J. C. Stevens
& Marks, 1965; S. S. Stevens, 1955a).


Response Equalizing Bias 


Figure 1B illustrates the response equalizing
bias as well as the stimulus equalizing
bias. For the response equalizing bias the
two scales at the sides represent responses,
while the scale in the middle represents stim- 
uli. Whatever range of responses the ob- 
server is given, he distributes the responses 
over the range of stimuli. He uses a larger 
range of responses when a larger range is 
available. 


Response Equalizing Bias in Rating 


The response equalizing bias is not usually 

regarded as a bias when rating with a small 
number of categories. In rating sensory mag- 
nitudes, the investigator often encourages the 
response equalizing bias by his instruction. 
He tells his observer to use the full range 
of categories that he is given. The instruc- 
tion insures that the proportional relation- 
ships between a set of variables will hardly 
be affected by the exact number of categories
used. This applies to small sets of both
named and numbered categories. Thus on a 
linear plot there is a nearly perfect linear 
relationship between the average named rat- 
ings on a 6-point scale and the average 
named ratings from separate groups of ob- 
servers using a 9-point scale (Parducci & 
Perrett, 1971, Figure 4). 
The linear relationship does not hold between
a small set of categories and 50 or
100 categories, because the use of both single- 
digit and two-digit categories introduces the 
logarithmic bias of Figure 1F. 


Response Equalizing Bias in Very First 
Judgments of Reflectance


In direct magnitude estimation, the response
equalizing bias can be demonstrated
by giving a standard stimulus a modulus of
10. Separate groups of observers judge the
same variable against the same standard,
using numbers ranging either between 10 and
0 or between 10 and infinity. Poulton and
Simmonds (1963) compare the two response
ranges in judgments of the reflectance of
gray papers. They use 10 pairs of groups,
each group comprising 50 students making
their very first magnitude judgments. Of the
10 groups that use numbers ranging between
10 and infinity, 9 have median judgments
that differ from the standard by at least 10,
whereas none of the 10 groups that judge the 
same stimuli using numbers ranging between 
10 and 0 have median judgments that differ 
from the standard by as much as 10. To do 
so, the median would have to be 0. Thus, 
a larger range of numbers is used when a 
larger range is available. 


Avoiding the Response Equalizing Bias 


The response equalizing bias can be 
avoided in direct magnitude estimation by 
leaving the observer free to choose his own 
range of responses. The bias is unavoidable 
in category rating. As already indicated, in 
category rating the investigator may en- 
courage the response equalizing bias. It 
makes his proportional relationships almost 
independent of the exact number of cate- 
gories that he uses, provided he sticks to nine 
categories or less. 

In cross-modal matching, the investigator 
can avoid introducing the response equaliz- 
ing bias by using a response dimension of 
the same subjective size as the stimulus di- 
mension. This is indicated in the first section 
of Table 1. 


Bias From Transfer 


The biases that result from transfer can 
be classified by what is transferred:

1. Transfer from previous stimuli: Here
the present stimulus is judged in the context 
of previous stimuli. Examples are the center- 
ing bias and the stimulus spacing bias of 
Figures 1A and 1E. These two transfer bi- 
ases occur gradually as the observer learns 
the set of stimuli used in the investigation. 
Another example is the sequential contrac- 
tion bias, which depends on the immediately 
preceding stimulus (Jesteadt et al., 1977). 
These transfer biases have already been dis- 
cussed. In this section we consider only the 
remaining transfer biases. 

2. Transfer from previous judgments: Here 
it is not just the previous stimuli but also 
the observer's responses to them that bias 
his judgments. The present response is made 
to be consistent with the responses made pre- 
viously to previous stimuli. Examples of the 
influence of one judgment on the next are
the transfer between ranges of stimuli, the 
transfer between ranges of responses, and 
the transfer from interval to ratio judgments. 

3. Transfer from previous responses: Here 
the observer tends to use the same range of 
responses as he uses in a previous condition 
with different stimuli. 


Transfer Between Ranges of Stimuli 


In investigations on transfer between 
ranges of stimuli, the stimuli to be judged 
are increased in magnitude or reduced in 
magnitude. Compared with the judgments of 
a control group that does not experience the 
change, after a range of large stimuli, smaller 
stimuli are judged too small. After a range 
of small stimuli, larger stimuli are judged 
too large. 

The transfer can be explained as follows: 
The centering bias of Figure 1A affects the 
observer's responses to the first range of
stimuli. It leaves only the more extreme responses
available for the second range of
stimuli. The stimuli are therefore judged
more extreme than they should be. 

Transfer occurs even when the stimuli are 
judged in familiar physical units such as 
length in inches, provided the stimulus lines 
are flashed on a screen for only .2 sec and 
are therefore not easy to judge accurately 
(Krantz & Campbell, 1961). Transfer occurs 
when the stimuli are judged in named cate- 
gories (DiLollo, 1964; Parducci, 1954, 1956; 
Ross & DiLollo, 1968a; Tresselt, 1947).
Transfer also occurs when a small set of
numbered categories is used (Campbell, 
Lewis, & Hunt, 1958; Melamed & Thurlow, 
1971; Parducci, 1963; Pollack, 1964b, Ex- 
periment 1), when a potentially large set of 
numbers is used centered on 100 (Krantz & 
Campbell, 1961) or extending up to 100 
(DiLollo & Kirkham, 1969), and when an 
infinite set of numbers is used, as in direct 
numerical magnitude estimation (Melamed & 
Thurlow, 1971; Ross & DiLollo, 1968b). 


Transfer Between Ranges of Responses 


The response equalizing bias in very first 
judgments of reflectance has just been described
(Poulton & Simmonds, 1963). Here
there is marked transfer from the very first
judgments using finite (10 to 0) or infinite 
(10 to infinity) ranges of numbers to second 
judgments of the other kind using an infinite 
range after a finite range or a finite range 
after an infinite range. 

For the very first judgments, 8 out of the 
10 comparisons between the finite and the 
corresponding infinite ranges show that a re- 
liably larger range of numbers is used when 
the larger range is available. However, on
transfer to the second judgment of the opposite
kind, 7 of the 8 possible matched comparisons
before and after transfer are less
marked after transfer (p < .05 on a two-tailed
Wilcoxon test). Only 3 of the 8 comparisons
after transfer still show that a reliably
larger range of numbers is used when
the larger range is available (Poulton, 1968,
Figure 2 and p. 10). Thus after a judgment
using the other range of numbers, finite and
infinite ranges of numbers give more similar
judgments than they do when they are used
for very first judgments.

This is presumably because for the very 
first judgments the students consider only a 
range of numbers extending from the modulus
of 10 either to 0 or to infinity, whereas
the second judgments of both kinds are made
in the context of a total range of numbers 
extending from 0 to infinity. The second 
judgments are therefore more similar than 
the first judgments. 




Transfer From Interval to Ratio Judgments 


Fagot, Eskildsen, and Stewart (1966) report
reliable transfer from a set of interval
judgments made over 3 successive days to a
set of ratio judgments made on 3 subsequent
successive days (Fagot, Note 4). For the interval
or bisection judgments the observer
controls the brightness of a circle of light
placed between two standard circles of 580
fL and 2 fL, respectively. For the ratio or
fractionation judgments, the observer does
not see the circle of 2 fL. He adjusts the
brightness of the light that he controls to
half the brightness of the light of 580 fL.
When the ratio judgments are performed
first, the average geometric mean for half
brightness is 52 fL, giving a mean slope on
a loglog plot of .28. When the ratio judgments
are performed after the interval judgments,
half brightness falls reliably to 13 fL,
giving a mean slope of only .18. The ratio 
of the two slopes is 1.6:1. The interval bi- 
section judgments are not appreciably af- 
fected by prior ratio judgments. The average 
geometric mean falls only from 47 to 41 fL. 
This could be due to initial differences be- 
tween the two groups of four observers who 
perform the two conditions in opposite 
orders. 
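
The quoted slopes can be checked against the half-brightness settings. The sketch below assumes a power function, so that a subjective ratio of 2:1 corresponds to log 2 on the response axis. The name `slope_from_half_setting` is introduced here for illustration, and the calculation uses the average settings rather than the individual observers' data, which are not given.

```python
import math


def slope_from_half_setting(standard_fl, half_setting_fl):
    """Loglog slope implied by a half-brightness setting.

    Assumes a power function, so a subjective ratio of 2:1 is matched
    by a luminance ratio of standard_fl / half_setting_fl.
    """
    return math.log10(2) / math.log10(standard_fl / half_setting_fl)


ratio_first = slope_from_half_setting(580, 52)  # ratio judgments first
ratio_after = slope_from_half_setting(580, 13)  # after interval judgments

print(f"slope when ratio judgments come first: {ratio_first:.2f}")
print(f"slope after interval judgments:        {ratio_after:.2f}")
print(f"ratio of the two slopes:               {ratio_first / ratio_after:.1f}:1")
```

The values computed from the average settings, roughly .29 and .18, are close to the reported mean slopes of .28 and .18. A small difference is to be expected, because the mean of the individual slopes need not equal the slope computed from the mean setting.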

The reliable transfer can be explained 
quite simply in terms of the hypothetical 
zero used for the ratio judgments. Suppose 
an observer first gives an average bisection 
judgment of 47 fL between the standards of 
580 and 2 fL. When the standard of 2 fL is 
removed for the ratio judgments, the observer 
realizes that he has now to judge a larger 

range of brightnesses, extending from 580
fL to 0 fL. He therefore selects a half brightness
less than his previous average of 47 fL.
In contrast, the observer who starts by
making the ratio judgments has to get on as
best he can without having seen a specified
lower bound to the range of brightnesses.
When he is subsequently presented with a 
lower bound for his bisection judgments, he 
of course uses it. 


Transfer From the Range of Numbers 
Used in the Instructions 


In direct magnitude estimation, the nu- 
merical examples used by the investigator in 
explaining the technique can reliably affect 
the slope of the psychophysical function that 
he obtains. In a series of investigations, G.
H. Robinson (1976) gives alternate observers 
different numerical examples. The standard 
stimulus is always called 100. For half the 
observers the written instructions state that 
if the variable is one and a half times the 
standard, it should be called 150. If the vari- 
able is half the standard, it should be called 
50. For the other half of the observers the 
instructions state that if the variable is seven 
and a half times the standard it should be 
called 750. If the variable is one quarter of 
the standard, it should be called 25. The 
range of numbers specified is simply increased
from 50-150 to 25-750, an increase
in ratio from 1:3 to 1:30. 

In G. H. Robinson's (1976) Experiment 
1 for magnitude estimations of auditory pulse 
rate, the increase in the specified range of 
numbers increases the exponent from .87 to 
1.3 and from .89 to 1.3 in a replication,
ratios of about 1:1.5. In Experiment 2 for 
the loudness of a 1-kHz square wave, the 
exponent increases from .73 to 1.0 and from 
.77 to .95 in a replication, ratios of about 
1:1.3. All four increases are reliable. The 
plotted functions are concave downward like 
those at the top of Figure 3B. Each function 
for the larger range of numbers is a little 
steeper than the function for the smaller 
range. For the smaller range of numbers, 
each function is a good deal more concave 
downward at the upper end than the func- 
tions at the top of Figure 3B, as if the 
observers run out of high numbers. 
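
The ratios quoted for G. H. Robinson's instructions and exponents can be reproduced with simple arithmetic. The sketch below uses only the figures given above; the function and variable names are introduced here for illustration.

```python
def range_ratio(low, high):
    """Ratio of the largest to the smallest number given as an example."""
    return high / low


# Numerical examples in the two sets of instructions (standard called 100).
small_examples = range_ratio(50, 150)  # a ratio of 1:3
large_examples = range_ratio(25, 750)  # a ratio of 1:30

# Exponents reported for the small and the large range of example numbers.
reported = {
    "pulse rate": (0.87, 1.3),
    "pulse rate, replication": (0.89, 1.3),
    "loudness": (0.73, 1.0),
    "loudness, replication": (0.77, 0.95),
}

exponent_ratios = {
    name: large / small for name, (small, large) in reported.items()
}

for name, ratio in exponent_ratios.items():
    print(f"{name}: 1:{ratio:.2f}")
```

A tenfold increase in the ratio of the example numbers thus raises the exponents by factors of roughly 1.2 to 1.5.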

Clearly, the method of direct magnitude 
estimation is extremely sensitive to numerical 
examples given in the instructions. If an in- 
vestigator wishes to obtain unbiased data, 
he should not use numerical examples. Un- 
fortunately S. S. Stevens (1971, p. 428) 
recommends their use, suggesting numbers 
20 times and one fifth of the standard, a 
ratio of 1:100. Only 4 of Stevens's 21 aver- 
age geometric subjective ranges, illustrated in 
Figure 10, are as large or larger than this. 
So if he does use these instructions, in most 
of his investigations he is biasing his ob- 
servers into giving unduly large exponents. 
This may explain why other investigators 
often find rather smaller exponents than S. 
S. Stevens does. 

The disturbing feature is that the inves- 
tigators who now follow Stevens in giving 
numerical examples may suggest ranges of 
numbers to their observers that will produce 
the sizes of exponents they expect to find. 
Investigators would be foolish to suggest 
ranges of numbers that produce unexpected 
sizes of exponents. Yet suggesting appropri- 
ate ranges of numbers will produce a spuri- 
ous consistency between the exponents and 
the theory. Unfortunately investigators do 
not often report the exact instructions that 
they give. In these cases the reader has no
idea whether the investigator finds the size 
of exponent that he does find simply because 
he suggests the particular range of numbers 
to his observers. 


Transfer From the Range of Numbers 
Used in a Previous Investigation 


S. S. Stevens and his colleagues use the 
same members of the departmental staff and 
graduate students for more than one investi- 
gation (S. S. Stevens, 1959). From the evi- 
dence just reviewed on the transfer of ranges 
of numbers, it is almost certain that there 
is transfer of the range of numbers from one 
investigation to the next. This is a bias that 
S. S. Stevens and his students ignore. 

S. S. Stevens (1971, p. 428) even recom- 
mends “under some circumstances," which 
he does not specify, clarifying the nature of 
the task before starting an investigation. 
This is done by getting the observer to 
match numbers “to an easier continuum, 
such as apparent length of lines, or apparent 
size of circles" (Stevens, 1971, p. 428). Yet 
as is pointed out in an earlier review (Poul- 
ton, 1968), 


It would be too easy for the experimenter to select 
a range of lengths of lines calculated to elicit the 
set of numbers which he believed to be appropriate 
to the range of stimuli which he was subsequently 
going to present to the observer. Clearly, the ex- 
perimenter would be foolish to present lines which 
elicit a quite inappropriate set of numbers, in view 
of the expected transfer effects. The experimenter is
thus in a predicament which is better avoided.
(p. 4)


Avoiding Bias From Transfer 


As the third section of Table 1 indicates, 
the only way of avoiding bias from prior 
conditions is to use separate groups of observers
for each condition. Bias from instructions
and demonstrations can be avoided
by using unbiased instructions and no demonstrations.
Transfer from previous stimuli,
judgments, and responses can be avoided 
only by restricting each group of observers 
to a single judgment. 

The difficulty with using separate groups 
is the large individual differences in judging 
sensory magnitudes. S. S. Stevens (1971,
Figure 3) reports that his individual exponents
for loudness range from .4 to 1.1,
a ratio of 1:2.7. To obtain a reasonably
precise average numerical value, it is therefore
necessary to use large numbers of observers.
Yet S. S. Stevens and his colleagues
typically use only 10 or 12 observers in each 
investigation. To obtain data that are com- 
parable across investigations, they need to 
stick to the same observers or to use groups 
of observers who are known to give com- 
parable magnitude judgments because they 
have served in previous investigations. Yet
the use of selected trained observers almost 
certainly introduces bias from transfer. 


Avoiding all Biases 


The fourth section of Table 1 indicates 
how to avoid all the biases that can be 
avoided. As already pointed out, in category 
rating it is not possible to avoid the stimulus 
and response equalizing biases. An investi- 
gator who wishes to avoid these two biases 
does not use ratings. 

In magnitude estimation the response 
equalizing bias can be avoided by leaving 
the observer free to choose his own range 
of responses. However, the only way of 
avoiding the stimulus equalizing bias is by
the cross-modal matching of sensory dimensions
with subjectively equal-sized ranges,
as indicated at the bottom of Table 1 on
the right. This matches only subjective magnitudes
in one sensory dimension to subjective
magnitudes in another sensory dimension
with a subjectively equal-sized range. There 
is no sure unbiased method of matching 
subjective magnitudes to numbers, because 
the observer tends to use the same range of 
numbers whatever the size of the range of 
stimuli. Thus the subjective size of the 
range of numbers varies with the size of the 
range of stimuli. 

Clearly, avoiding all possible biases pro- 
vides so little useful data for so much effort 
that not many investigators are likely to 
contemplate it. The most practical alterna- 
tive is to collect data with the minimum of 
known biases by balancing the biases against 
each other, as suggested by S. S. Stevens 
(1971, pp. 444-446). Then make checks using 
separate groups of untrained observers to
estimate the sizes of the residual biases. 
Finally, the data should be corrected for the 
residual biases. 

A difficulty with this alternative is that 
the effects of the residual biases may be con- 
siderably larger than the effects that the 
investigator wishes to study. Also, the biases 
may interact with the effects being studied. 
Unfortunately, at the present time most investigators
simply collect biased data without
attempting to correct for the biases or even
to measure and report them.
Behavioral Treatment of Children's Fears: A Review


Anthony M. Graziano, Ina Sue DeGiovanni, and Kathleen A. Garcia 
Behavioral literature on childhood fears, including conceptual models, normative
research, and fear-reduction studies is reviewed. The main conclusions are as 
follows: (a) The information value of nearly 60 years of normative studies is 
meager, and their continuation is of doubtful value; (b) most research has been 
limited to laboratory studies of mildly to moderately fearful children, and few 
data exist on severe fears studied in the child’s natural environment or on the 
clinical prevalence of fear; (c) cognitive and developmental factors have been 
largely ignored; (d) modeling is the most frequently used and reliably effective 
fear-reduction strategy; (e) a cognitive, verbal-mediation approach is promis- 
ing, but is not yet sufficiently researched; (f) there is little evidence that sys- 
tematic desensitization or contingency management strategies are effective. Im- 
plications for large-scale fear reduction and prevention are discussed. The need 
for research that recognizes the complex paradigms of children's fears is
suggested.


This article is a selective review of be- 
havioral treatment of children's fears, an area 
relatively neglected by behavior therapists 
and researchers despite their considerable at- 
tention to adult fear reduction (Graziano, 
1975). Adults seem to minimize the importance 
of children's fears, viewing them as common 
and transitory and thus not a particularly 
serious part of normal development. But 
children's fears may not always be transient, 
and some, such as specific animal phobias 
(Jersild, 1968; Jersild & Holmes, 1935a; 
Marks & Gelder, 1966) and fear of physical 
injury or psychic stress (L. C. Miller, Barrett, 
Hampe, & Noble, 1972b) may persist as 
adult problems. Children do experience fears, 
often intense and disturbing, and the psycho- 
logical suffering of a fearful child, even if it 
remits in a few years, is at least as worthy 
of professional concern as is the suffering of 
adults. There seems good reason for urging 
more study of fear reduction in children. 

Because of this review's behavioral focus, 
the large psychoanalytic literature is not included.
Likewise, school phobia articles, which
outnumber those of any other fears by about 
25:1 (L. C. Miller, Barrett, & Hampe, 1974), 
have been well reviewed elsewhere (Gelfand, 
1978; Hersen, 1971; Kelly, 1973) and are 
only briefly discussed. 

The modern psychological study of children's 
fears is nearly 60 years old. The literature 
consists of psychoanalytic and behavioral case 
studies, normative fear surveys, controlled
fear-reduction experiments that are mainly 
clinical analogs, and a few theoretical articles 
and reviews. Much of the literature is psycho- 
analytic, but interestingly, the earlier (e.g., 
Jones, 1924a, 1924b; Watson & Rayner, 1920)
and the most recent articles have been 
behavioral. 

Berecz (1968) concluded that the research 
has been “disappointing,” giving “only hints”
of the nature and incidence of childhood fears. 
A decade later, the situation regarding nor- 
mative research appears much the same. 
A recent increase in behavioral studies of 
child fear reduction includes some well-designed
research (e.g., Bandura & Menlove,
1968; Kanfer, Karoly, & Newman, 1975; 
Kornhaber & Schroeder, 1975; Melamed & 
Siegel, 1975; Murphy & Bootzin, 1973). 
Overall, however, the bulk of research still 
leaves us with hard-to-interpret data, results
that are more suggestive than definitive, and 
an important area that is very much in need 
of more empirical research. 

One reason for the persistence of uncertain 
results may be a generally held assumption 
that knowledge about adult fears is sufficient 
for understanding fears of children, an as- 
sumption that may have encouraged inappro- 
priate concepts and approaches. Piaget's work 
(e.g., 1929, 1951) on cognitive development
should alert one to the dangers of considering
children to be miniature adults. Berecz (1968) 
suggested other reasons, namely, a too in- 
clusive use of the term phobia to describe 
“such a variety of conditions” as to make
the term meaningless and definitions of 
phobias that are heavily biased by theoretical 
conceptions. 

Ethical restraints, too, make this a difficult 
area in which to conduct research. For obvious
ethical reasons children cannot be subjected
to controlled presentations of fear
stimuli in order to measure their reactions. 
Less intrusive measures, such as self-reports, 
questionnaires and observations taken in 
clinical treatment and research, must be used. 
To these constraints add the problems of 
obtaining reliable and valid personal informa- 
tion from children, and some of the difficulties 
become apparent. Accordingly, researchers
have tended to tread lightly, using nonintrusive
methods that rely heavily on retrospective,
subjective reports based on unsystematic
self-observations. Virtually all of
the normative data were drawn from one
form or another of retrospective, subjective
report. Frequently the data were from second-person
(parents and teachers) verbal reports
or were limited to observations made in
clinical case studies. These limitations may
largely account for the lack of hard data that
still characterizes the field despite more than
a half century of psychological study.


Definitions: Fears, Phobias, and 
Clinical Problems 


Despite these vagaries, there is fairly good 
general agreement on definitions of fear and 
phobia. Fear is commonly thought of as a 
normal reaction to genuine threat that involves
at least three response systems: (a)
overt behavioral expressions, (b) covert, sub- 
jective feelings and thoughts, and (c) physio- 
logical activity. The term phobia is commonly 
used to denote a specific fear in which at 
least one of the three elements is excessive, 
persistent, and unadaptive (Marks, 1969; 
L. C. Miller et al., 1974). Thus it is generally 
agreed that fear is a normal response to 
threatening stimuli, whereas phobia is an 
unreasonable response, often to usually benign 
or ill-defined stimuli. 

Severity ranges from the mildly fearful 
responses of many children toward darkness, 
dogs, and so on to disabling, highly intense 
levels of fear. It is reasonable to assume that 
fears presented at clinics for professional 
intervention are the most severe fears. Two 
questions are raised immediately when one 
considers clinical fears. First, how are clinical 
fears to be defined other than by virtue of 
the fact that clients show up in clinics for 
treatment? Hampe, Noble, Miller, and Barrett 
(1973) found that most children, even those 
with intense fears, overcome them with or 
without treatment within 2 years. But some 
do not, and they might appropriately be 
classed as having fears of clinical duration. 
Intensity and duration might be important 
defining characteristics, and we suggest that 
clinical fears be defined as those with a dura- 
tion of over 2 years or an intensity that is 
debilitating to the client’s routine life-style. 
The second question is, how prevalent are 
severe fears and should current research be 
focused only on severe fears or on both mild 
and clinical levels of the problem? As is 
shown in the review of controlled studies, 
most researchers treat all levels of severity 
the same, working primarily with mildly 
fearful children and generalizing to all phobic 
children. A recent review of behavioral treat- 
ment of clinical phobias by Mathews (1978) 
emphasizes the inappropriateness of general- 
izing from what are essentially analog studies 
to treatment of clinical fears. This limitation 
seems to have occurred because both mild 
and severe levels of fear are important re- 
search areas, and mildly fearful children are 
considerably easier to recruit as research sub- 
jects than are severely fearful children. How-
ever, it seems important to identify clearly
the severity of subjects’ fears and the limita-
tions that severity places on generalization
of the data, especially since one does not 
know whether mild and clinical fears operate 
according to identical principles. 

We know surprisingly little about childhood 
fears of high intensity, long duration, and 
disturbing content. L. C. Miller et al. (1974) 
estimated that school phobics make up 1% 
of the school-age population, whereas other 
child phobics may “run as high as 20 per cent” 
(p. 91). In his review, Marks (1969) found 
little data on the frequency with which phobic 
children are referred for psychiatric treatment. 
Graham (cited in Marks, 1969) reported five 
cases of school refusal out of 162 clinical 
referrals and only 10 specific phobias in 239 
cases referred to a children's psychiatric 
hospital. Rutter, Tizard, and Whitmore (1970) 
screened all 10- and 11-year-old children 
(N = 2,199) on the Isle of Wight and found
only 16 children to have clinically significant 
and handicapping phobias. Their screening 
procedure omitted those with monosympto- 
matic phobias, and thus 16 is a conservative 
estimate. In the most recent attempt to assess 
clinical prevalence, Graziano and DeGiovanni 
(1979) obtained questionnaire data from 19 
behavior therapists. They reported a total 
of 547 children referred within the previous
6 months, of whom 37, or 6.8%, were referred
for treatment of clinical level fears, as defined 
above. At present only guesses can be made 
about the nature and extent of the clinical 
problem of children's fears. 
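
The prevalence figures quoted in this paragraph can be checked with simple arithmetic. The following is a minimal sketch that recomputes each proportion from the counts given in the text; the function name and the dictionary labels are illustrative assumptions, not terms used by the cited authors, and the samples come from different populations and so are not directly comparable.

```python
def percent(cases, total):
    """Return cases as a percentage of total, rounded to one decimal place."""
    return round(100.0 * cases / total, 1)


# Counts as quoted in the text; the labels are illustrative only.
samples = {
    "Graham: school refusal among clinical referrals": (5, 162),
    "Graham: specific phobias among hospital referrals": (10, 239),
    "Rutter et al.: handicapping phobias, Isle of Wight": (16, 2199),
    "Graziano & DeGiovanni: clinical-level fears": (37, 547),
}

# Percentage for each sample, keyed by the same labels.
rates = {label: percent(cases, total) for label, (cases, total) in samples.items()}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

The last entry reproduces the 6.8% reported for the behavior therapists' referrals; the other percentages are derived here from the quoted counts and are not stated in the sources.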


Fear Paradigms 


Although several theoretical models have 
been advanced, none adequately explains the 
etiology and maintenance of phobias. The 
major conceptual models are psychoanalytic 
and behavioral, with ethological theory and 
a cognitive-developmental model representing
less common but potentially important ap- 
proaches. The major focus of this article is 
on the behavioral literature. 

Behavioral approaches consist of three para- 
digms, each of which assumes that fears and 
phobias are learned: respondent conditioning, 
operant conditioning and the two-factor 
theory of learning. The respondent model
postulates that anxiety is the central aspect
of phobic behavior and that any neutral
stimulus present at the time a fear response 
occurs may become a conditioned stimulus. 
In future presentations it will elicit the asso- 
ciated fear response. The conditioning occurs 
if the original conditioning situation was of 
high intensity or was repeated a number of 
times. The conditioned fear is presumably 
maintained in the absence of reinforcement 
because the avoidance behavior precludes the 
requirements for extinction, that is, the re- 
peated confrontation of the actually benign 
conditioned stimulus without its pairing with 
the unconditioned stimulus. A decrease in 
avoidance behavior can be achieved by re- 
placing the conditioned anxiety response with 
a response antagonistic to anxiety (e.g., re- 
laxation) that inhibits or weakens the anxiety. 
Wolpe’s (1974) systematic desensitization is 
best known of the therapy procedures based 
on this reciprocal inhibition paradigm. The
major problems with the respondent model 
are that it does not explain why some neutral 
stimuli seem more likely than others to be- 
come conditioned fear stimuli, and, although 
it attempts to, it does not adequately explain 
the maintenance of phobic behavior in the 
absence of reinforcement. 

Operant models hold that reinforcement,
primarily social reinforcement such as
parental attention, rather than anxiety, is the
central aspect of phobic behavior. Children
are presumably taught to be afraid by parents 
and other significant persons who selectively, 
albeit unintentionally, attend to and reward 
fearful behavior. Therapy procedures based 
on operant models attempt to reduce children's 
phobic behavior by changing the reward struc-
ture in their immediate environments, that is,
by teaching parents to ignore phobic behavior 
and to differentially reward alternative non- 
fearful approach behaviors. The major problem 
with the operant paradigm is its failure to 
adequately account for and therapeutically 
treat the subjective feelings of intense fear or 
anxiety and their accompanying thoughts that 
are often experienced by phobic children. 

Modeling, one of the most frequent and 
promising therapeutic techniques, is based on 
social learning theory, which is an extension 
of operant conditioning. Modeling assumes
that behavior can be changed through the
vicarious experience of rewards (either tangible
or cognitive) for new behavior or of the 
informational value of the modeled behavior. 
For example, a child's phobic avoidance of 
dogs may be decreased by his or her vicarious 
experience of the rewards given to another 
child (the model) who approaches a dog. 
Effective rewards can be either objective, 
such as verbal praise explicitly given to the
model for approach behavior, or subjective 
and nonobservable, such as internal cogni- 
tions of social praise generated within the 
observing child (e.g., “That child is brave 
for approaching that dog. I will be brave for 
approaching that dog.”). Although both tan-
gible and cognitive stimuli may be equally
effective in generating the initial performance 
of new approach behaviors, it seems likely 
that reinforcing cognitive self-statements
(either those generated by the child sponta- 
neously or those supplied by the therapist) 
are crucial in maintaining the new approach 
behavior. Similarly, the acquisition, as well 
as the therapeutic extinction, of phobic be- 
havior may involve cognitive self-reinforcers, 
such as “She or he got knocked down by
playing with that dog; I will get hurt if I play
with that dog.” Although the therapeutic
effects of modeling with phobic children have
been explored (e.g., Bandura, Grusec, & 
Menlove, 1967), little attention has been 
given to examining directly the nature and 
function of cognitive variables in the acquisi- 
tion, maintenance, and extinction of children’s 
phobias. In light of the role that such cogni-
tions seem to play in phobic behavior, this
is a major gap in the current research.

The two-factor theory incorporates both
respondent and operant learning concepts.
First proposed by Mowrer (1939), it postulates
that phobias originate according to the re- 
spondent conditioning paradigm and are 
maintained according to the operant condi- 
tioning model. After initial respondent con- 
ditioning of fear or anxiety with the con- 
ditioned stimulus, anxiety reduction associated 
with avoidance of the noxious stimulus be- 
comes a positive reward or reinforcer, since 
anxiety or fear is unpleasant. Thus, anxiety 
reduction becomes the reinforcement for 
avoiding the noxious stimulus created by
respondent conditioning. Problems with the
two-factor theory have been extensively dis- 
cussed over the last several years (Bandura, 
1969; Herrnstein, 1969; Rachman, 1976, 1977, 
1978). In brief, (a) the theory assumes that 
phobias are mediated through the autonomic 
nervous system, although studies by Solomon 
and Turner (1962) and Bandura (1969) sug- 
gest that behavior is regulated in large part 
by the central nervous system; (b) it does 
not explain why people fail to acquire fears 
in what are theoretically fear-evoking situ- 
ations (e.g., air raids); (c) it fails to explain 
the “choice of symptom" issue of why some 
stimuli are more likely than others to become 
fear signals, that is, the distribution of human 
fears is not consistent with the equipotentiality 
premise of the theory; (d) it accepts the 
faulty assumption that all fears are acquired 
directly through classical conditioning, al- 
though it has been shown that operant con- 
ditioning and observational learning can also 
indirectly produce fearful behavior; (e) it 
fails to explain why active avoidance be- 
havior is so persistent and does not extinguish 
in the absence of further reconditioning or 
trauma experiences to maintain the anxiety 
associated with the conditioned fear stimulus. 
In response to some of these problems, 
Herrnstein (1969) has reported recent ex- 
periments showing that the conditioned stimu- 
lus may function as a discriminative stimulus 
for the avoidance response rather than as a 
noxious stimulus whose removal is inherently 
reinforcing, as two-factor theory requires. 
Rejection of the two-factor theory has 
significant implications for both treating and 
measuring phobias, in that it suggests the 
three major components of fear (avoidance 
behavior, subjective experience, and physio- 
logical disturbance) can covary, vary in- 
versely, or vary independently (Rachman & 
Hodgson, 1974). Routine tests of these three 
components should be carried out as a pre- 
treatment measure to determine the amount 
of synchrony/desynchrony among the three 
elements and to determine which element(s) 
does (and does not) require direct treatment. 
Different treatments may tap different ele- 
ments of a phobia more or less efficiently. 
Refutation of the two-factor theory also raises 
the important question of whether one needs
to treat each element directly or whether one
can assure patients that if one cures one 
component (e.g., avoidance behavior), the 
others will follow automatically (e.g., sub- 
jective experience). Thus, although the two- 
factor theory itself does not yield any new 
treatment techniques, the theoretical and 
empirical debates over the theory have caused 
researchers to question both the necessity for 
and the sequencing of various therapy com-
ponents based on respondent and operant
paradigms. 

The major theoretical models reviewed 
briefly here are not adequate to explain 
childhood phobias. Two issues that appear 
to us to be of major importance are not even 
addressed by the existing models. These issues 
are (a) the role of cognitions in the origin, 
maintenance, and reduction of fears and (b) 
the influence of developmental issues in child- 
hood phobias. The idea that cognitions may 
influence phobic behavior is one that is 
gaining attention both on a theoretical level 
(Rachman, 1977) and on a treatment-oriented 
level aimed at fear reduction (e.g., Kanfer 
et al., 1975; Kissel, 1972; Meichenbaum &
Turk, 1976). Developmental issues have long
been ignored in this area, although recently
some theorists (e.g., Bowlby, 1973; Kissel,
1972; L. C. Miller et al., 1974) have sug-
gested that these issues may be important.
Both cognitive and developmental issues 
deserve further exploration, which might lead 
to a more complete theoretical model of 
childhood fears and phobias. 

There are two major categories of research 
in children's fears: normative fear survey 
research and fear-reduction studies. 


Normative Fear Survey Research 


It seems reasonable to expect that knowl-
edge about the normal fears of children is
important in understanding their pathological 
fears. At least 22 papers since 1932 have 
reported normative data, but they reveal 
little clear information. One point, however, 
is clear: The normative research has focused
almost exclusively on the identification of 
fear stimuli by asking the question “What 
do children fear?" and then attempting to 
relate the number, type (content), or intensity
of the fear stimuli to demographic parameters.
Most studies have dealt with the
number of fears, and some have focused on
the content or type of fear, but very few
have dealt with the intensity of fear. Despite
many methodological problems, to be sum-
marized later, some consistencies do emerge.


Sex Differences


One of the most consistent findings is that
however fear is measured, girls obtain higher
fear scores than boys (Angelino, Dollins, &
Mech, 1956; Bamber, 1974; Croake, 1969;
Croake & Knox, 1973; Cummings, 1944;
Lapouse & Monk, 1959; Pratt, 1945; Russell,
1967; Scherer & Nakamura, 1968; Spiegler
& Liebert, 1970). Although no study reported
generally higher fears for boys, three articles
(Maurer, 1965; L. C. Miller, Barrett, Hampe,
& Noble, 1971; Nalven, 1970) found no sex
differences.

Though the number of reported fears in
relation to sex has been studied, there are
uncertain data on the relationship between
sex and either the content or the intensity
of fears. Lapouse and Monk (1959) found
significant sex differences in fear content,
specifically in the percentage of children who
feared certain objects. Pratt (1945), however,
found no sex differences in fear content.
MacFarlane, Allen, and Honzik (1954) col-
lected data on fear intensity, but did not
report any analysis. Other studies (Bamber,
1974; Russell, 1967; Scherer & Nakamura,
1968) found that girls report a greater fear
intensity than boys. Although sex differences
in content and intensity are still uncertain,
the finding that girls report a greater number
of fears appears frequently enough and with
no conflicting evidence of reverse findings to 
be accepted as a reliable finding. Interpreting 
the finding, however, is difficult. One cannot 
tell from these data if the girls' higher scores 
reflect greater fear reactivity or if other 
factors, such as sex role expectations, operated. 
Consistent with a general role model of feminine 
behavior, girls may be more willing than
boys to admit their fears. Similarly, in those
studies using parents’ reports, the adults may
incorrectly, but nevertheless reliably, attribute
greater fear to girls than to boys. Grossberg 
and Wilson (1965) and Wilson (1967) sug- 
gested the operation of sex role factors in 
adult fear scores, whereas Scherer and Naka- 
mura (1968) and Bamber (1974) suggested 
that similar sex role factors might account 
for the data on children and adolescents. 
One would expect, too, that if sex role stereo- 
types and behavior are changing, future fear 
surveys might find smaller and less reliable
sex differences. In this regard it is interesting
that the studies reporting no differences are 
fairly recent (from 1965 on), which possibly 
reflects some contemporary changes in sex 
stereotyping. 


Age 


Overall, there appears to be a general de-
crease from young childhood to late adolescence
in the percentage of children who report one 
or more specific fears (Cummings, 1944, 1946; 
MacFarlane et al., 1954) and in the simple 
number of fears reported (Angelino & Shedd, 
1953; MacFarlane et al., 1954; Nalven, 1970; 
Scherer & Nakamura, 1968). However, the 
decrease may not be in simple linear rela- 
tionship with age, as several studies have
shown a sharp increase in the number of
reported fears around ages 9-11 (Angelino 
& Shedd, 1953; MacFarlane et al., 1954) or 
a peak at age 11 (Chazan, 1962; Morgan, 
1959). On the other hand, several studies 
have reported no significant relationships of 
fear with age (Croake, 1969; Croake & Knox, 
1973; Lapouse & Monk, 1959; Maurer, 1965; 
Russell, 1967).

Some studies suggest that the type of fear
reported is related to age (Angelino & Shedd, 
1953; Nalven, 1970; Scherer & Nakamura, 
1968). Jersild and Holmes (1935a) observed 
that in infancy children’s fears arise in re-
sponse to occurrences in the immediate envi-
ronment (e.g., loud noises, loss of support, etc.).
As a child grows older, his or her range of
fears grows wider and he or she acquires
the ability to dwell on the past and to an-
ticipate the future. Thus many of his or her
fears will change to those of an anticipatory
nature. This observation seems generally borne
out by the data, as reviewed by Scherer and
Nakamura (1968). The most consistent find- 
ings are an age-related decline in fear of 
animals (Angelino et al., 1956; Bauer, 1976; 
Lapouse & Monk, 1959; Maurer, 1965; 
Shepherd, Oppenheim, & Mitchell, 1972) and 
in fears of the dark or of imaginary creatures 
(Bauer, 1976; Holmes, 1936; Maurer, 1965; 
Shepherd et al., 1972) and an age-related
increase in school and social fears (Angelino 
et al., 1956; Lapouse & Monk, 1959). 

L. C. Miller et al. (1972b), in contrast 
with most other studies that showed an age- 
related decline in animal fears, reported that 
fear of small animals, along with “moral” 
fears, “emerges more clearly in adult life" 
(p. 268). L. C. Miller et al. extracted three 
factors as the "principle dimensions" of 
children's fear: (a) fear of physical injury, 
(b) fear of natural and supernatural dangers, 
and (c) fear of psychic stress (e.g., social
events, examinations, etc.). The second factor
was found to diminish with age, whereas the
other two factors reportedly "emerge early
and continue through much of the life span"
(L. C. Miller et al., 1972b, p. 268). Similar 
factors have been reported by other investi- 
gators (Russell, 1967; Scherer & Nakamura, 
1968). 

An important developmental question con- 
cerns the relationship between age and degree 
or intensity of fear. Do children of different 
ages react differently to the same type and 
intensity of fear stimuli? It is commonly 
believed that young children have more in-
tense or all-encompassing fear reactions than
do older children. But to date there has been 
little investigation of this question. 

Although detailed information on the rela-
tionship of age and the number, kind, inten-
sity, and factor structure of children's fears
is not known, there is consensus that age is
an important variable in fear reactions.
In summary, as children grow older, their
fear patterns change, but not in simple linear 
relation with age. Some fear stimuli remain 
operative, others lose their value, and some 
new ones emerge. These tentative conclusions 
about fear and age are drawn from a mass 
of research data. They appear reasonable but 
hardly surprising, and they seem a rather
small output for so much research activity.


Socioeconomic Class


As with age, socioeconomic class (SEC) 
appears to be an important variable in 
children's fears, but the details of the rela- 
tionships are not clear. Several studies have 
reported SEC differences in the type (content) 
or number of reported fears (Angelino et al., 
1956; Bamber, 1974; Jersild & Holmes, 1935a;
Nalven, 1970; Newstatter, 1938). The more 
reliable of the two findings is the variation 
found in content, with lower SEC children 
reporting more fears of specific events or 
others such as violence, whippings, dope 
peddlers, switchblades, drunks, money, rats,
and cockroaches (Angelino et al., 1956;
Nalven, 1970). In comparison, higher SEC 
children feared heights, car accidents, train 
wrecks, and large categories such as poisonous 
insects or dangerous animals. The fears of 
the lower SEC children ("ghetto" children
in Nalven's, 1970, study) strongly suggest the 
socially determined nature of fear content 
and an immediacy and reality basis for the 
expressed fears of the lower SEC children. 
The data suggest that these children may 
perceive their immediate environments as far 
more hostile than do the higher SEC children. 
This is a hypothesis well worth testing, but 
it has not been investigated within the scope 
of the fear literature. 

Although it seems clear that fear content 
varies with SEC, it is not clear whether the 
number of reported fears also varies. Some 
studies (Angelino et al., 1956; Bamber, 1974; 
Croake, 1969; Croake & Knox, 1973; Jersild, 
Markey, & Jersild, 1933) have found a greater 
number of fears for lower than for higher 
SEC children. However, as Garcia (Note 1) 
pointed out, Angelino et al. (1956), a fre- 
quently cited study, seemed to misinterpret 
their own data. The authors concluded that 
both content and number of fears varied with 
SEC, but their own data supported only the 
difference in content. Further, Nalven (1970) 
made an observation that may have bearing 
on this issue, namely, that lower SEC children 
compared with the others tended to list 
specific fear items rather than more generic 
groupings. For example, they listed specific 
animal fears (rats and cockroaches), whereas 
higher SEC children noted groupings such as 
dangerous animals or poisonous insects. If
this difference in the level of abstraction of
the concepts used by different SEC children
was reliable, it may have generated a spurious
finding of different numbers of fears between 
the groups. As Garcia suggested, studies that 
instruct children to simply list their fears are 
probably most susceptible to such an artifact. 
Finally, no studies have reported data on SEC 
differences in the intensity of fears. 


Fears and Pathological Behavior 


It seems a reasonable prediction that 
children's fears are positively related to other 
behaviors that are usually considered to be 
pathological. However, except for the sug- 
gestions of such a relationship found in 
clinical case studies, there seems to be little 
data to support this prediction. Some of the 
case studies (e.g., Tasto, 1969) have even
pointed out that the child's severe fear was
a specific and isolated reaction, apparently
not related to other behaviors. Some studies, 
however, have shown a relationship between 
fear and manifest anxiety (Scherer & Naka- 
mura, 1968) and between fear and neuroticism 
(Bamber, 1974). MacFarlane et al. (1954) 
reported that children's fears were related to 
a variety of problems such as general anxiety,
irritability, and timidity, all depending on
age and sex. Their correlations, however, were
low and were not consistent across ages. 
Hampe et al. (1973) reported a general re- 
sponse to treatment over a 2-year follow-up 
period. As the primary fear reduced in time, 
so did “a host of other deviant behavior” 
(p. 451). These authors suggested that fears
are not only related to other pathological
behavior but related in some causative or
mutually sustaining manner.

On the other hand, L. C. Miller et al. 
(1971) found no correlation between school 
phobia and other pathological behaviors, and 
Lapouse and Monk (1959), using detailed 
individual interviews with a random sample 
of the parents of 482 children, found no 
significant correlations between fears and 
pathological behaviors such as bed-wetting, 
nightmares, nail-biting, frequent temper los- 
ses, and so on. Lapouse and Monk also 
interviewed a sizable subsample of children
to compare their reports with the parents’
reports and, again, found no significant rela-
tionships between fear and pathological 
behavior. 

Thus there appears to be weak evidence 
that children's fears may be related to gen- 
eral conditions such as anxiety, but there is 
no clear support for the relation of fear with 
specific pathological behavior such as bed- 
wetting or tantrums. A reasonable hypothesis,
not yet explored, is that the intensity rather
than the simple number of fears may be
positively related to behavior pathology. At
this point, however, there is little evidence 
that childhood fears are related to other 
pathological behaviors. 


Methodological Problems in the 
Normative Studies 


For the most part methodologies have been 
fairly straightforward and simple, involving 
variations of questionnaire and interview 
methods that range from the earlier studies 
in which subjects were asked to list their 
fears to the more recent use of factor-analyzed 
rating scales. Only Jersild and Holmes (1935a)
attempted to obtain normative data by direct 
observation methods. 

In the earlier studies, children or parents
were requested to “write down” or “tell me
the things you are afraid of” (e.g., Maurer,
1965; Pratt, 1945) or to “list the fears of
other children in your own age group” (e.g.,
Angelino et al., 1956; Nalven, 1970). Many 
studies using this list or list-and-rank method 
asked the children directly (Angelino et al., 
1956; Croake, 1969; Maurer, 1965; Nalven,
1970; Pratt, 1945). Others, however, who
sought greater reliability or perhaps more
ease in obtaining data used mothers as in-
formants (e.g., Hagman, 1932; Lapouse &
Monk, 1959). Only Lapouse and Monk 
reported any measure of concurrent validity. 
They interviewed an additional sample of 
193 children and their mothers separately, 
using different interviewers. Comparing those 
independent interviews, they found that the 
largest discrepancy concerned the mothers’
and children’s assessments of the number of
children’s fears and worries, with “the mothers,
in comparison with the children, underesti-
mating by 41 per cent” (Lapouse & Monk,
1959, p. 812). Their results cast doubt on 
the validity of mothers' or children's self- 
reports in this and other studies. Marks and 
Gelder (1966) asked adults to recall the 
childhood onset of their fears; Abe (1972), 
in a study carried out in Japan, compared 
self-report fears of 242 mothers with their 
childhood fears as recalled and reported by 
their mothers. 

Of all the list-and-rank procedures, those 
of Lapouse and Monk (1959) appear to have 
been the most carefully executed. Using the 
Buffalo city directory, they drew a random 
urban sample and employed trained inter- 
viewers to conduct hour-and-a-half interviews 
with the mothers of 482 children aged 6-12 years.
The interviewers obtained data on the number 
and kinds of fears, on sex, age, race, and 
economic status, and on the occurrence of 
other behaviors such as bed-wetting, night- 
mares, and a variety of “tension phenomena" 
such as nail-biting. As mentioned, an ad- 
ditional sample of children and their mothers 
were also carefully interviewed to compare 
the children's and mothers' responses. 

The list-and-rank approach is characteristic 
of the studies through 1966. The studies did 
identify fear stimuli, but they share a number 
of weaknesses. Two of the shortcomings 
specific to this approach are that one cannot 
be certain that the lists of fear stimuli are 
complete, and although data are generated 
on the number and kinds of fear stimuli, 
they tell us nothing about the degree of fear- 
stimulating power of each item or the inten- 
sity of the children's fear reactions. 

Since 1967 some of the normative studies 
have been more sophisticated, but still have 
focused on the essential question, “What do 
children fear?" These later studies used rating 
scales that provided data not only on the 
number and types of fear stimuli but also 
on the relative degree of fear reactions to the 
different stimuli. In these studies, too, factor 
analyses were used to identify the presumed 
factor structure of children’s fears. 

The major approach in the rating scale 
studies is to instruct the child (Bamber, 
1974; Russell, 1967; Scherer & Nakamura, 
1968) or the parents (L. C. Miller, 1967; 
L. C. Miller et al., 1971; L. C. Miller et al.,
1972b) to rate each of the listed fear stimuli
for intensity of reaction. The results are then 
factor analyzed, usually with added demo- 
graphic or other behavioral measures. 

Although fear-survey rating scales provide 
more information than the earlier approaches 
and have an advantage of standardization, 
their reliability and validity are still equivocal 
(Geer, 1965; Lang & Lazovik, 1963; Mano- 
sevitz & Lanyon, 1965). For example, L. C. 
Miller et al. (1971, 1972b) reported no va- 
lidity data and only split-half reliability on 
the Louisville fear survey, perhaps the most 
often used fear rating scale for children. 
Although the split-half reliability was ac-
ceptable (r = .80), no data were reported on
test-retest reliability or on any form of va- 
lidity. Split-half reliability and factor anal- 
yses, the most commonly reported measures 
on fear rating scales, are measures of only 
the internal consistency of the scales. Ad- 
ditional data need to be collected on the 
stability (test-retest) and validity of fear 
rating scales before any but tentative con- 
clusions can be drawn. 
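
The distinction drawn here between internal consistency and stability can be illustrated with a small computation. The following is a minimal sketch, assuming an odd-even item split and Pearson correlations; all ratings are hypothetical values invented for illustration and are not data from the Louisville fear survey or any cited study, and the function names are likewise assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(ratings):
    """Internal consistency: correlate odd-item totals with even-item totals
    from a single administration of the scale."""
    odd_totals = [sum(child[0::2]) for child in ratings]
    even_totals = [sum(child[1::2]) for child in ratings]
    return pearson_r(odd_totals, even_totals)


def retest_r(first, second):
    """Stability: correlate total scores from two separate administrations."""
    return pearson_r([sum(c) for c in first], [sum(c) for c in second])


# Hypothetical ratings (1 = no fear, 3 = intense fear) for five children on
# four items, collected on two occasions. Invented for illustration only.
time1 = [[1, 1, 2, 1], [2, 3, 2, 3], [3, 3, 3, 2], [1, 2, 1, 1], [3, 2, 3, 3]]
time2 = [[2, 1, 1, 1], [2, 2, 3, 3], [2, 3, 3, 3], [1, 1, 2, 1], [3, 3, 2, 3]]

print("split-half r:", round(split_half_r(time1), 2))
print("test-retest r:", round(retest_r(time1, time2), 2))
```

The split-half coefficient uses only one occasion, so a high value says nothing about whether scores hold up over time; the retest coefficient requires the second administration that the reviewed studies did not report.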

Taken together, the normative studies in- 
volve so many procedural differences and 
methodological problems that clear compari- 
sons and reliable conclusions are difficult to 
draw. In addition to the difficulties specific 
to each method already pointed out, the field 
as a whole suffers from a number of generally 
shared problems. One major limitation is that 
the data are essentially subjective, that is, 
they are usually second-person reports by 
parents and sometimes first-person reports by 
children. Further, the adult second-person 
reports are usually limited to mothers of the 
children, with whatever sex role biases and 
attributions may be operative. We do not 
know, for example, if fathers’ reports would 
generate different results. Perhaps it might
be hypothesized that the sex differences would 
appear even greater, with fathers attributing 
even less fear to their sons than to their 
daughters. Whether made by parents or by 
children, the reports are based on unsystem- 
atic observations that are subject to many 
biases, and the respondents’ recall is subject 
to many distortions. Neither test-retest re- 
liability nor validity measures are routinely 
taken.

Although normative studies have been
reported since 1932, their results are not 
generalizable because random sampling from 
the normal child population is seldom at- 
tempted. With very few exceptions (e.g., 
Lapouse & Monk, 1959; L. C. Miller et al., 
1971), these are not properly normative 
studies. Investigators appear to have been 
interested in limited segments of the popula- 
tion or were restricted in terms of the avail-
ability of subjects. For example, some studies
(Marks & Gelder, 1966; Poznanski, 1973) 
used samples drawn from psychiatric popula- 
tions, and Maurer (1965) used children re- 
ferred to school psychologists. Abe (1972) 
obtained data from a self-selected population 
of young mothers who were patients in a free 
medical clinic in Japan; Bamber (1974) studied
children in Ireland; and Pratt (1945) studied 
only rural children. 

In summary, the basic subjective-report 
methods have obvious shortcomings: some
studies interviewed children, some parents;
some required lists, some ratings; the ages
of children are not comparable across studies;
random samplings from normal childhood 
populations were seldom taken; reliability 
and validity data were not reported. 

But the major limitation in all of this 
research is the nature of the questions that 
have typically been asked. The studies have 
been virtually limited to identifying fear 
stimuli, to asking, essentially, “What are the 
common fears of children?” and “To what
demographic factors do they relate?” Most
studies seem still to be trying to accomplish 
Hagman's (1932) first aim, that is, “to enu- 
merate and analyze the objects and situations 
feared” (p. 110). It seems clear that many 
factors influence the objects or situations 
chosen as fear stimuli. Some stimuli, as
Maurer (1965) pointed out, seem to be 
merely the subject of the latest adventure 
story or movie to which the child was ex- 
posed; others seem to be stereotypes or 
culturally conditioned fears, and still others 
appear to have some significance for the 
child's life (e.g., Angelino et al., 1956; Maurer, 
1965). The type of identification and tracking 
of fear stimuli that has been done could 
probably be continued indefinitely. However, 
the question must be raised as to the point 
of such an enterprise. In an area as broad
and important as fear in children one must 
ask oneself what kind of research is worth 
doing and what kind of knowledge is worth 
having. 

The identification of fear stimuli touches 
on only a small part of what must be a com- 
plex fear process. It seems reasonable to 
assume that children's fears, like other human 
reactions, proceed through complex para-
digms: from fear stimuli that vary in num-
ber, type, and intensity and that may be
internal, external, or both; through emotional 
and cognitive operations within the child; 
through overt fear responses that may act 
upon and modify both the social and physical 
environment and that themselves, by means 
of feedback loops and chaining, may occasion 
variations in any part of the process. The 
processes themselves may further vary with 
developmental factors. Given this reasonable 
assumption of human complexity, it is clear 
that the normative research to date has 
ignored a great deal of it, limiting its view 
to only one part of the process, that is, iden- 
tifying fear stimuli. It is as if early researchers 
took the first reasonable steps toward learn- 
ing what it is that children typically fear, 
but then allowed the field to remain on that
level, the researchers generating, sharing, and
rearranging virtually the same lists of fear 
stimuli. The research tells us little about the 
remainder of the paradigm, the nature of 
children’s fear experiences. Remaining unasked 
are questions concerning the operation of 
mediating cognitions in children’s fear ex- 
periences, the degree to which fears are self- 
or externally generated and maintained, and 
the effects of fear behavior on the child’s 
social environment and the effects’ feedback
influence on the child. One must investigate 
how children in their natural environments 
typically deal with fearful events and how 
their strategies vary with developmental level, 
sex, and so on. One must study the conditions 
under which natural coping processes fail and 
fear processes become debilitating, and one 
must determine the optimum conditions for 
fear-reduction intervention. 

We suggest, too, that there is a related,
potentially important issue that has thus far 
been ignored in the literature: the recognition 
of the possible adaptive value of childhood
fear. Of course one has no clear indication 
that fear serves any adaptive function at all. 
But it may be that age-linked, transitory 
fears have important short-term effects on 
coping with the social environment, on learn- 
ing how to appropriately sensitize and de- 
sensitize oneself. Fear may constitute an 
important part of children’s experiences in 
learning to successfully cope with problems.
The possible adaptive value of fear expe- 
riences in children’s development may be an 
important issue, but has been overlooked, 
clouded perhaps by an implicit, unquestioned 
value judgment that fears are only disruptive 
and hence are of negative value. 

A great variety of stimuli have been iden- 
tified as having fear-stimulus value; they 
range from specific objects (e.g., bees) to
abstract and imaginative stimuli such as
communism and ghosts. As pointed out by
Berecz (1968) and L. C. Miller et al. (1974),
almost any event may be a potential fear 
stimulus. Thus children have access to an 
inexhaustible supply of them. The normative 
research suggests that “normal” fear stimuli 
for children are socially determined and are 
appropriate for the individual’s personal and 
social situation; for example, the elderly fear 
loneliness and physical injury, students fear 
examinations, lower SEC boys fear switch- 
blades and beatings, and young children, still 
learning the limits of reality, fear ghosts, 
witches, and darkness. The fears are appro- 
priate to age, social class and role, culture, 
and even moment in history. Thus the nor- 
mative data suggest that what is feared by 
children is largely determined by social and 
historic fashion as well as by individual ex- 
periences. Although some fears may be in- 
nately determined, it appears that children 
fear largely what they are taught to fear. 
It seems that identification of fear stimuli 
may be limited in its contribution to under- 
standing the processes of learned fears in 
children, since so many stimuli may be inter- 
changeable. Further research in this direction 
is of doubtful value, and attention to other 
parts of the complex fear paradigm would 
be more profitable. Overall, the normative 
research only suggests, and with a good 
amount of conflicting data, that the type 
and number of fears vary with age, sex,
and SEC. The details of the apparent rela- 
tionships, however, are not clear. There is 
conflicting evidence concerning the relation- 
ship of fears to pathological behaviors, and 
there are virtually no data on the intensity 
of fear reactions. The normative research 
reveals very little beyond the findings of 
what appear to be a few key studies (e.g., 
Jersild & Holmes, 1935a, 1935b; Lapouse 
& Monk, 1959; L. C. Miller, Barrett, Hampe, 
& Noble, 1972a; L. C. Miller et al., 1972b).
The “disappointing research" (Berecz, 1968) 
may be at least partly due to the ethical 
problems of fear research with children as 
well as to a variety of methodological problems 
that characterize the research. Because of its 
apparent fixed focus on identifying fear
stimuli, the research tells one little about the
complete fear processes.


Case Studies of Fear Reduction 


Like other aspects of the literature in this 
area, fear-reduction experimental and case 
studies have a long history. According to 
Yates (1970), most of the techniques that 
are now labeled behavior therapy were applied 
in one form or another some 50 years ago, 
almost entirely in attempts to treat children's 
fears. Watson and Rayner (1920) suggested, 
but did not investigate, four possible methods 
to remove the fear that they had conditioned 
in Little Albert. Jones's (1924b) treatment of 
Peter is an early example of the application 
of learning principles to the remediation of 
clinical problems. Jones (1924a) studied 70
children in an attempt to discover ways in 
which children's fears could be reduced or 
removed. Holmes (1936), stressing the im- 
portance of active participation by the child, 
attempted to eliminate fears of the dark and 
of heights, whereas Hagman (1932) and 
Jersild and Holmes (1935b) reviewed methods 
that parents actually used to deal with fears 
and suggested a number of more effective 
techniques. 

Unfortunately, as with the normative 
studies, the promise of the early work was 
not fulfilled. The vigorous interest in treat- 
ment of children's fears seemed to disappear, 
and except for a few case studies, there was 
little further activity until Rachman and
Costello's (1961) review of the etiology and
treatment of children's phobias. They called 
for “above all a major project to establish 
the degree and permanence of improvement 
which may be obtained by these techniques" 
(p. 104) (i.e., those suggested by Jones,
1924a, 1924b, and by Jersild & Holmes,
1935b). This call has also gone largely un- 
answered. The amount of research in the 
behavioral treatment of children's fears since 
that time has been surprisingly meager in 
comparison with the amount of behavioral 
research on adult fears, an area with well 
over 100 outcome studies on systematic de-
sensitization alone (Wolpe, Brady, Serber,
Agras, & Liberman, 1973). 

Most of the reports on childhood fears are 
single-case studies, with all of their attendant 
limitations. However, case studies may offer 
something that experimental studies do not, 
that is, examinations of children’s phobias at 
high levels of severity, and may give hints 
about the nature of phobias and treatment 
suggestions. 

Prior to 1960 there were only a few be- 
haviorally oriented case studies of child fear 
reduction (e.g., Jones, 1924b; Rodriguez, 
Rodriguez, & Eisenberg, 1959; Weber, 1936). 
Since 1960 at least 35 additional papers have 
appeared, which report on 112 children re- 
ferred for clinical treatment. Of these papers, 
20 were limited to school phobias and ac- 
count for 72 of the cases. The remaining 
15 case studies, accounting for 40 children, 
involved school phobias plus a variety of 
other fears. Thus, since 1960, excluding school 
phobias, all other children’s fears averaged
about 2 case studies per year, a strikingly 
low average compared with the rest of the 
behavior therapy field. 

Considering the variety of fears reported 
in normative studies and included in fear 
surveys, the range of phobias that are pre-
sented for treatment is quite small. The
overwhelming majority of reported cases in- 
volve school phobia, followed by those in- 
volving fears of animals, noise, and bodily 
injury. Even assuming that most of these 
clinical cases represent severe levels of fear,
we note that both the range and the number 
of severe fears presented for treatment are 
relatively small. Thus, the important question
remains unanswered: Are severe fears largely 
undetected, or do they simply not occur at 
a very high rate? 

The school phobia literature has been re- 
viewed (Gelfand, 1978; Hersen, 1971; Kelly, 
1973), and this section is presented only as 
a brief overview. School phobia is more 
prevalent than other childhood phobias (L. C.
Miller et al., 1974), a fact that may be due 
to the high cultural value and legal require- 
ments placed on school attendance. Because 
it is socially and legally less acceptable for 
a child to avoid school than to avoid dark 
places, small animals, and so on, more school 
phobia cases are likely to be referred for 
clinical help, that is, the school-phobic child 
conflicts with social convention more visibly 
and inconveniently than other fearful children. 

Previous reviews suggest that the prognosis 
for improvement is good with most types of
treatment, provided that school refusal has
not yet become a chronic pattern. Behavior
therapy procedures for school phobia gen- 
erally are based on both classical and operant 
conditioning models. Thus, school phobia is 
seen as avoidance behavior motivated by high 
anxiety and maintained by reinforcement for 
not attending school. Four major respondent-
based procedures have been reported: (a) sys-
tematic desensitization (Chapel, 1967; Laza-
rus, 1960; Lazarus & Abramovitz, 1962;
P. M. Miller, 1972); (b) in vivo desensitization 
(Garvey & Hegrenes, 1966; P. M. Miller, 
1972; Olsen & Coleman, 1967; Tahmisian 
& McReynolds, 1971); (c) flooding (Kennedy, 
1965); and (d) implosion (Smith & Sharpe, 
1970). Three major operant conditioning
procedures were reported: (a) home-based
contingency management (Ayllon, Smith, &
Rogers, 1970; Cooper, 1973; Edlund, 1971;
Hersen, 1970; Kennedy, 1965; Vaal, 1973); 
(b) school-based contingency management 
(Brown, Copeland, & Hall, 1974; Hersen, 
1970; Rhines, 1973; Weinberger, Leventhal, 
& Beckman, 1973), and (c) behavioral shaping 
in the clinic (Hersen, 1970; Patterson, 1965). 
Most case studies report procedures based on 
combinations of the classical and operant 
models, with emphasis on one. The choice 
of which theoretical model and therapeutic 
procedure to use seems determined by whether 
the most pressing therapeutic need is to
reduce the child’s anxiety with a desensitizing 
procedure or to avoid reinforcing his or her 
escape behavior. Follow-up data (ranging from 
4 weeks to 2 years) were reported for the 
majority of cases, although the data were 
limited either to assessment of continued 
school attendance or to attendance data plus 
a verbal report from parents and teachers on 
the child’s social or emotional functioning. 
Actual behavioral assessment of improvement 
after return to school was not reported for 
any case. Whether such follow-up data should 
be collected is apparently still debatable. 
Hersen (1971) has criticized behaviorally 
oriented therapists for not being more rigorous 
in obtaining data regarding academic, social, 
and emotional adjustment. However, Ayllon 
et al. (1970) stated that “school attendance 
is a legitimate if not the only relevant treat- 
ment objective for school phobia” (p. 135). 
Behavior therapy procedures for phobias 
other than school phobias are generally based 
on a classical conditioning model, that is, the 
avoidance is seen to be mediated by high 
anxiety. Two main approaches are used to 
reduce the inferred mediating anxiety: (a) 
counterconditioning procedures that pair the 
anxiety response with a stronger response 
that is also antagonistic to anxiety, such as 
muscular relaxation, and (b) extinction pro- 
cedures in which the anxiety response is 
repeatedly elicited, but presumably without 
reinforcement. Therapy for these phobias 
usually involves a combination of several 
procedures. The terminology used for the 
procedures is confusing and overlapping, but 
the various authors claim that real differences 
do exist, as summarized by the following 
four categories: (a) reciprocal inhibition 
(Bentler, 1962; Freeman, Roy, & Hemmick, 
1976; Jones, 1924a; Katz, 1974; Lazarus &
Abramovitz, 1962; Miklich, 1973; Ney, 1968; 
Tasto, 1969; Wish, Hasazi, & Jurgela, 1973), 
(b) contact desensitization (Weber, 1936), 
(c) implosion/flooding (Ollendick & Gruen, 
1972; Smith & Sharpe, 1970), and (d) in vivo 
desensitization plus guided practice (Croghan 
& Musante, 1975). In cases using reciprocal 
inhibition, the following responses antago- 
nistic to anxiety were used: feeding (Jones, 
1924a), muscle relaxation (Tasto, 1969), 
playing (Bentler, 1962; Ney, 1968), and
emotive imagery (Lazarus & Abramovitz, 
1962). Interestingly, playing was included as 
an integral part of several treatment proce- 
dures (Bentler, 1962; Croghan & Musante, 
1975; Ney, 1968; Weber, 1936). However, 
none of these authors stated explicitly that 
playing was used to teach the child skills of 
interaction with the feared stimulus. This 
lack of emphasis on skills training in case 
studies is interesting in light of Rimm and 
Masters's (1974) suggestion that therapists 
not place sole reliance on decreasing anxiety 
through desensitization techniques when work- 
ing with socially inexperienced young children, 
who often lack the appropriate behavioral 
skills required to cope with the feared stimulus. 
Follow-up data were reported in 10 of the 
15 recent papers. However, such data were 
rarely behavioral, making comparisons among 
cases regarding long-term effectiveness dif- 
ficult, if not impossible. Rather, follow-up 
data were obtained in verbal reports of 
parents or teachers regarding the child's main- 
tenance of his or her new approach behavior. 

In summary, we can draw the following 
tentative conclusions about the clinical treat- 
ment of childhood phobias, as reported in 
case studies. First, given the strikingly low 
number and narrow range of reported cases, 
it seems that childhood phobias may not be 
a significantly large clinical problem, except 
for school phobia. Second, excluding school 
phobia, the most frequently used therapy 
procedures are modifications of those used 
with adults, that is, systematic desensitization 
and implosion/flooding. As has been noted 
elsewhere (Graziano, 1975), serious ethical 
and humanitarian questions are raised in 
using implosion/flooding techniques on chil- 
dren, because children generally have no 
choice in whether to enter and remain in 
therapy. Furthermore, as Ullmann and Kras- 
ner (1975) have cautioned, the successful 
use of implosion techniques requires con- 
siderable clinical skill and sensitivity so that 
the highly aversive treatment experience is 
associated with the avoided object and not 
with the therapist. In contrast with other 
phobias, school phobia is again an exception 
in that its treatment procedures generally 
include the use of both classical and operant 
techniques. Thus, this set of case studies may
be taken as a comment on the state of the
field: With the exception of school phobia, 
the clinical importance and prevalence of 
childhood phobias is questionable, and the 
procedures used are generally copied from 
work with adult phobics. 

Important therapy and research questions 
raised by these case studies appear to fall 
into three main categories. First, what pro-
cedures not generally used with adults might
be useful with children? As noted by Rimm 
and Masters (1974), young children often 
lack the appropriate behavioral skills required 
to cope with the feared stimulus. Thus, the 
absence of much if any overt skill training 
in the case reports seems a glaring omission. 
Such skill training might take the form of 
guided participation or modeling. Second, 
what treatment combinations are most ef-
fective and efficient for different kinds of
phobic children? As pointed out by both
Berecz (1968) and Hersen (1971), more con- 
trolled research designs are needed to ade- 
quately test both the efficacy of any one 
procedure and the conditions under which it 
will be most effective, both alone and in 
combination with other procedures. Case 
studies are limited in two senses: (a) One
has no way to know whether other attempts
to treat a phobic child by these authors or
other authors using similar techniques were 
unsuccessful (i.e., how unique is the success
of this procedure?); (b) we cannot compare 
the effectiveness of two or more different 
procedures because most reports use language 
that cannot be interpreted in terms of specific 
behavioral gains (e.g., “has remained in
school and continued to improve”). Thus,
despite the difficulties, a clear need still exists 
for more controlled designs using experimental
groups or highly systematic multiple studies. 
Finally, these case studies raise the question 
of what criteria for success should be used, 
both at treatment termination and at fol- 
low-up. Should one assess only improvement 
in approach behavior? Or should one also 
assess improvement in social and emotional 
functioning? One rationale for collecting social 
and emotional adjustment data is that these 
data might help researchers make comparisons 
among the various techniques, with the 
ultimate aim of identifying the most rapid
and widely effective combinations. At any
rate, termination and follow-up data should 
be collected and reported in terms of be- 
havioral observations as well as in terms of 
verbal reports from parents and teachers. 


Controlled Fear-Reduction Studies 


At least 28 controlled fear-reduction studies
with children have been published, and there
are several additional unpublished papers such
as those reported in Gelfand’s (1978) review. 
Although far more rigorous and systematic 
than the clinical case studies, the controlled 
studies share several common limitations. 
First, with few exceptions (e.g., L. C. Miller
et al., 1972a; Obler & Terwilliger, 1970),
they study what appear to be mild to mod-
erate levels of fear, and the relevance of their
findings for treating children with severe fears
or phobias has yet to be demonstrated.
Second, as with the case studies, the range 
of fears included has been limited: 9 studies 
focused on dental/medical fears (Adelson & 
Goldfried, 1970; Adelson, Liebert, Poulos, 
& Herskovitz, 1972; Johnson & Machen, 1973; 
Machen & Johnson, 1974; Melamed, Hawes, 
Heiby, & Glick, 1975; Melamed & Siegel, 1975; 
Melamed, Weinstein, Hawes, & Katin-Bor-
land, 1975; Vernon & Bailey, 1974; White
& Davis, 1974). Animal fears were discussed 
in 7 studies (Bandura et al., 1967; Bandura 
& Menlove, 1968; Hill, Liebert, & Mott, 
1968; Kornhaber & Schroeder, 1975; Murphy 
& Bootzin, 1973; Obler & Terwilliger, 1970; 
Ritter, 1968). Fears of social interaction were 
discussed in 5 studies (Evers & Schwarz,
1973; Keller & Carlson, 1974; O'Connor,
1969; O'Connor, 1972; Walker & Hops, 1973).
Three studies focused on fears of the dark
(Kanfer et al., 1975; Kelley, 1976; Leiten-
berg & Callahan, 1973). Two discussed test
anxiety (Mann, 1972; Mann & Rosenthal, 
1969), and 1 studied fear of water (Lewis, 
1974). L. C. Miller et al.’s (1972a) study
included a variety of fears, such as fear of 
school, the dark, heights, germs, nakedness, 
and storms. However, if one deletes school 
phobics, the number of children in L. C.
Miller et al. who presented fears other than 
those typically studied totals only 11. 
Another general limitation is that most
studies use a number of different techniques 
in complex treatment packages, and it is 
usually not possible to determine the most 
effective components of the packages. 

The profusion of concepts and methods 
makes it difficult to organize these studies 
for discussion. They present a variety of 
terms, including desensitization, extinction, 
modeling, reciprocal inhibition, and so on. 
It is frequently unclear whether a term, sys- 
tematic desensitization, for example, is used 
to denote a treatment procedure and is thus 
part of the independent variable or whether 
it refers to implicit processes of fear reduction 
within the child and is thus part of the de- 
pendent variable. Likewise, apparently similar 
processes or procedures are labeled differently 
by different authors, such as Ritter’s (1968) 
“contact desensitization” versus Lewis’s (1974) 
“participant modeling.” 

Most authors make their procedures fairly 
explicit, but only imply their models of fear- 
reduction processes. However, it appears that 
all of these authors assumed a disinhibition 
model, that is, the operation of some process 
of gradual weakening of the fear response in 
the course of graduated exposure to the fear 
stimuli. The graduated exposure may occur 
actually, imaginally, or vicariously. 

The methods used to operationalize gradual 
exposure fall into three main groups: (a) mod- 
eling, (b) multivariable, systematic desensi- 
tization or contingency management treatment 
packages, and (c) a cognitive approach using 
verbal-mediation strategies. By far, modeling 
is the most frequently used procedure, ac- 
counting for 20 of the 28 controlled studies. 
Many of the remaining 8 papers seem to 
have included some forms of modeling, but 
did not label them as such. Modeling may be 
implicit in any of the fear-reduction proce- 
dures except, perhaps, for imaginal systematic 
desensitization. 


Modeling 


Modeling is largely a development of the 
last decade; it was stimulated by Bandura’s 
research (e.g., Bandura et al., 1967; Bandura 
& Menlove, 1968), although precursors of 
modeling are clearly seen in Jones (1924a). 
At least 20 modeling studies of fear reduction
in children have been published in the past 
10 years. Although there are clear similarities 
among them, they fall into two fairly distinct 
groups with a number of important differences. 
Of these studies, 11, including Bandura's 
research, dealt with fairly common fears that 
occur frequently, even daily, for many children 
in the course of everyday events (i.e., fear
of animals, the dark, water in baths, school 
examinations, and social interactions). In most 
instances the fears are clearly unreasonable, 
in that there seems to be nothing objectively 
harmful in the immediate situation. The 
other 9 studies form a distinct group of 
recent research published in the 1970s. They 
focused on the specific problems of reducing 
children's fears and preparing them for dental 
and surgical procedures. Unlike those in the 
former group, these fears are highly situation- 
specific and include often intense, but tran- 
sitory, anxiety; they occur only at those few 
times when children are faced with dental 
or surgical procedures and thus, unlike many 
other fears, are not frequent or everyday 
issues. Also, the fears in this group appear 
more reasonable or reality based than, for 
example, fears of the dark. It does not appear 
to us to be at all unreasonable for a child 
to struggle against a dentist or surgeon who 
is going to do unexplained things that, as far 
as the child knows, will probably hurt a great 
deal. In fact, we would argue that such 
self-defensive assertion on the part of a 
frightened child who is psychologically un- 
prepared for the coming medical treatment 
is preferable to quiet, but totally frightened, 
submission. This group of modeling studies 
focused on the reduction of such fears in 
these specific situations. They may thus be 
of immense potential importance for untold 
numbers of children. 

One of the studies in this group (see the 
listing above) is Melamed and Siegel’s (1975) 
excellent research using symbolic modeling 
procedures to prepare children psychologically 
for surgery. The experimenters used a 16- 
minute film of an initially fearful child who 
gradually copes with the situation and over- 
comes his or her own fears. They reported 
that the children who viewed the modeling 
film, compared with control group children, 
showed significantly less “transitory, situ-
ational anxiety” on all measures (sweat gland
activity, self-reported medical concerns, and
overt, anxiety-related behavior). Their data 
also suggested that experimental group chil- 
dren had fewer postoperative behavior prob- 
lems than did the control group children. 
These 9 studies give good evidence that 
brief symbolic modeling may be an effective 
aid in preparing children for dental and
medical treatment, thus reducing their pos-
sibly high situational fears. The studies are
limited to some degree in that they studied 
groups of normal children and did not in- 
clude dental or medical phobics per se, and 
thus one cannot generalize these results to 
such chronically fearful children. Further, 
there is little evidence of the long-term ef-
fects of such modeling and as yet no differential
study of the effect of variables such as cog-
nitive rehearsal, model and subject similarity,
age, and so on. But all of those limitations
can be avoided in future research. 

In all of these studies the strategy was 
a preventive one; that is, prior to the occur- 
rence of the stressful situation, children were 
prepared so they could more effectively cope 
with the stress when it did occur. There are 
important implications here for preventive 
and "stress inoculation" approaches, as is 
briefly discussed later. 

Overall, the quality of these 20 modeling 
studies is high, but generalizations are limited
for a number of reasons. First, none of the
studies typically included severely fearful
children, although Bandura and Menlove 
(1968) did describe fairly high levels of dog 
fear in some children, and many of the sub- 
jects in the dental/medical fear-reduction 
research seemed quite fearful. Overall, the 
children were not selected for severe or phobic 
levels of intensity or duration of fears. Al- 
though modeling appears to be quite ef- 
fective with mild to moderate fears, its 
usefulness in reducing phobias or severe fears 
has not yet been shown. In order for modeling 
to be effective, the child must attend to the 
model. Children who are highly fearful may 
find watching any interaction with the phobic 
stimulus so aversive that they look away, 
thus avoiding the fear stimulus. Bandura 
and Menlove (1968) commented on encoun-
tering a similar difficulty. Thus, successful
modeling with severely fearful or phobic
children has yet to be demonstrated.

All of the studies used behavioral avoidance 
tests or behavioral observations in their pre- 
treatment assessments. These procedures sepa- 
rate children by fear levels in comparison 
with other children in a particular study, but 
do not assess the clinical significance, per se, 
of the fears. Further, although the subjects’
age range was wide (3-13 years), selective
factors may have operated, for example, the
use of children from special university nursery
schools (e.g., the Bandura studies) or from
parochial schools (e.g., Kornhaber & Schroeder,
1975) or the use of only black children (e.g.,
Lewis, 1974) or only poor center city children
(Melamed, Hawes, Heiby, & Glick, 1975).
Thus, the subjects in the experimental studies
of modeling are not representative of either
a clinical population or the normal population
of children.

As noted earlier, the range of fears to 
which modeling has been applied is limited, 
with nine studies focusing on fears of im- 
pending dental/medical treatment, six on 
animal fears, four on social isolation, and one 
on fear of water. Thus one must be cautious 
about generalizing from these modeling studies
either to the general or clinical populations
or to different fear stimuli. 

The basic modeling approach common to 
these studies consists of fearful subjects' first 
observing models who demonstrate approach 
to the fearful stimuli. The subjects then at- 
tempt to perform the approach behavior 
themselves. This basic method has been 
varied across several dimensions: The mod- 
eling has been symbolic, as on videotapes 
(e.g., Bandura & Menlove, 1968; O'Connor,
1972), or presented by live models (Bandura
et al., 1967; White & Davis, 1974); subjects
have observed a single model (Ritter, 1968) 
or multiple models (Bandura & Menlove, 
1968); modeling alone has been compared 
with modeling plus active, graduated contact 
with models and the fear stimulus (Lewis, 
1974; Murphy & Bootzin, 1973; Ritter, 1968);
the models have approached single stimuli 
(e.g., Kornhaber & Schroeder, 1975) or graduated
variations of the fear stimulus, such as
Bandura and Menlove's (1968) use of a
variety of dogs; and the similarity of models
to subjects has been explored (Kornhaber & 
Schroeder, 1975). 

Not all of the meaningful combinations of 
these and other variables have yet been 
examined, nor have sufficient replications of 
successful studies been carried out. However, 
some clear consistencies and at least tentative 
conclusions have emerged. First, therapeutic 
modeling procedures are in general signifi- 
cantly more effective than control conditions 
and can override pretreatment differences in 
“predisposition to emotionality” (Bandura & 
Menlove, 1968) and in model’s similarity to 
the subject (Kornhaber & Schroeder, 1975). 
Further, modeling procedures seem effective 
over a fairly wide age range. 

Second, the research suggests that modeling 
procedures become more powerful as addi- 
tional controlled components are added to 
the basic techniques. Thus it seems likely 
that a highly effective modeling package would 
include multiple live models who approach 
a hierarchical range of varied fear stimuli, 
followed by trials of active contact with 
models in graduated sequences of progres- 
sively bolder contact with the graduated fear 
stimuli (Murphy & Bootzin, 1973; Ritter, 
1968), that is, a modeling plus contact- 
desensitization package. There is considerable 
agreement that live modeling combined with 
progressive contact with models and fear 
stimuli is potentially a very powerful tech- 
nique. To date, however, the complete mod- 
eling package outlined here has not been 
tested either against control or against other 
modeling procedures. 

Third, although live modeling appears to 
be more effective than symbolic modeling, 
the latter may be equally effective (Bandura 
& Menlove, 1968) if it involves many trials 
and includes multiple models and progres- 
sively varied fear stimuli. In other words, 
a symbolic modeling package similar to that 
described above for live modeling may also 
be a powerful fear-reduction technique. The 
significance of this lies in the potential use 
of symbolic modeling procedures in large-scale 
fear-reduction projects, in which direct con- 
tact with large numbers of children is not 
feasible. Thus, to the degree that children's 
fears may be a large-scale problem, symbolic
modeling may offer potentially useful remediation.
As of now, the effectiveness of
such a program presented, for example, in 
educational television broadcasts, has not been 
tested and remains only a possibility. 

Fourth, symbolic modeling has important 
implications for immunological approaches 
(e.g., stress inoculation; Meichenbaum & Turk,
1976). Although Bandura and Menlove (1968)
pointed out the possibility of such preventive
strategies as early as 1968, only recently have
such approaches been applied to children's 
fears. Poser and King (1975) reported on a 
series of their experiments on the reduction 
of avoidance. In every case, the avoidance 
inhibitor was brought to bear before rather 
than after avoidance responding had occurred. 
One experiment made use of symbolic modeling
to inhibit the acquisition of snake
avoidance responses in children first
confronted with a live snake.
These approaches deserve much more attention.
For example, as is suggested by studies
such as O'Connor's (1972), might it not be 
possible to “immunize” entering kindergarten 
or nursery school children against early school
fears? The symbolic modeling studies on 
reducing dental/medical fears also have clear 
implications for fear prevention. Large-scale 
televised symbolic modeling is a potential
strategy that has important community
mental health implications. It is an
area well worth careful attention.

Fifth, some [...] of model and subject
similarity [...]. Kornhaber
and Schroeder [...]
although neither attitudinal [...]
similarity was an effective [...]
([...], 1971; Hicks, 1965;
Wheeler & [...])
[...] model
similarity in child fear reduction. Clearly this
issue, the relationship between model and
subject similarity and the reduction of children's
fears, is an area in which more research
is needed.

Sixth, Kornhaber and Schroeder (1975)
and Melamed and Siegel (1975) also touched
upon a related and possibly important issue,
that is, the relative effectiveness of models'
portraying a mastery versus a coping strategy
in fear reduction. Several authors (Blanc 
1970; Rachman, 1972; Ross, 1970) have sug- 
gested that modeling effectiveness might be 
increased if models demonstrated initially 
fearful behavior that gradually changed to 
fearless interaction with the stimulus. Hill
et al. (1968), Melamed and Siegel (1975),
Meichenbaum (1971), and Spiegler, Liebert,
McMains, and Fernandez (1969) all had their
models exhibit some fear during the initial
approach. Using adult subjects, Meichenbaum
(1971) directly compared coping (initial display
of fear) and mastery (no fear modeled)
strategies. Meichenbaum found greater fear-reduction
effectiveness with the coping strategy
than with the mastery strategy. However,
in all three studies there was sufficient confounding
of variables to make their results
uncertain. In the Meichenbaum study, for
example, the coping models—but not the 
mastery models—also demonstrated a breath- 
control relaxation technique that may have 
accounted for many of the differences.

Although Kornhaber and Schroeder (1975) 
compared a coping and a mastery modeling
procedure, their coping models displayed fear
throughout the modeling and did not demonstrate
a progressive reduction of fearful behavior
that matched the demonstration of
increased approach. Their coping model was 
thus incomplete. Finally, the coping model 
procedure used by Melamed, Hawes, Heiby,
and Glick (1975) and Melamed and Siegel
(1975) was part of the modeling package
applied to all experimental subjects and was
not compared with a mastery model condition,
thus making it impossible to assess the
effects that can be attributed to coping
strategies per se. To date no researchers have
systematically investigated and carefully compared
coping versus mastery modeling procedures
in child fear reduction. This is an
obvious area in need of research. 

Seventh, although modeling procedures ap- 
pear to be of a decidedly social nature, they 
seem to have rarely been used to reduce 
basically social fears of children such as [...]

[...] used in the 1960s (Graziano, 1975), modeling
with fearful children has repeatedly demonstrated
its success in modifying target behavior;
but there are yet few data on either
the generalization or the maintenance of the
new behavior. Further research on these two
issues is necessary.


[...] examinations of the effects of several variables
([...] subject similarity, coping [...]
strategies, single and multiple
modeling, etc.). Three of the more important
questions to be answered concern (a) the
intensity of fears for which modeling might [...]
(b) the potential use of large-scale
symbolic modeling to reduce and, perhaps, [...]
[...] investigation of the degree to which
cognitive factors such as covert competency [...]


Seven studies describing complex treatment
packages are included here. Two studies
(L. C. Miller et al., 1972a; Obler & Terwilliger,
1970) included children with severe
fears. Two other studies (Mann, 1972; Mann
& Rosenthal, 1969) studied adolescents referred
by school counselors for test anxiety [...]
be considered at or near [...]
Obler and Terwilliger used systematic
desensitization to pictures of fear
stimuli followed by up to 80 hours of in vivo
desensitization to reduce the severe
monophobias (of dogs or buses) of 30 “neurologically
impaired” children.

L. C. Miller et al. (1972a) compared reciprocal
inhibition, psychotherapy, and a
waiting-list control condition as treatments
in phobic children. The reciprocal inhibition 
treatment condition was actually a broad 
array of behavioral approaches, including
meetings with parents, restructuring home
contingencies, assertive training, relaxation 
and systematic desensitization, and discus- 
sions of problems and progress. The psycho- 
therapy treatments focused on inner experi- 
ence and encouraged the child to talk about 
his or her feelings and conflicts. There seems 
to have been considerable overlap in these two 
treatment conditions: For example, discussions
with parents and altering home conditions
are similar strategies; and denying the
child in the reciprocal inhibition group free 
access to television at home ("restructuring 
contingency schedules") to decrease the rein- 
forcement for fear behaviors is similar to the 
strategy of "removing gratifications" so as 
to reduce “secondary gains.”

Their results are complex and difficult to
summarize. Overall, neither treatment was 
more effective than the waiting-list control 
treatment. However, when age is considered, 
the reciprocal inhibition and psychotherapy 
treatment conditions were equally more ef- 
fective than the control conditions for children 
10 years of age and younger, whereas those 
11-15 years old did not seem to respond to 
either treatment condition. In a 2-year 
follow-up (Hampe et al., 1973) the investiga- 
tors found major effects of age and elapsed 
time. The younger children had “changed 
dramatically” immediately following treat- 
ment and maintained their improvement. 
Older children showed a more gradual im- 
provement over the 2-year follow-up, as did 
control children. The researchers concluded 
that age and time are critical factors in 
child fear reduction. Younger children tend 
to lose their fears much more rapidly than do 
older children; in time (up to 2 years), 
children generally lose their fears with or 
without treatment; nevertheless, treatment, 
behavioral or psychodynamic, is justified 
because it “greatly hastens recovery” (p. 451). 

Both studies, particularly Obler and Ter- 
williger's, employed complex treatment pack- 
ages. Although both research groups labeled 
their major procedures systematic desensitiza- 
tion (Obler & Terwilliger, 1970) or reciprocal 
inhibition (L. C. Miller et al., 1972a), that is, 
some variation of Wolpe’s (1974) systematic 
desensitization, they both also included a 
variety of contingency management techniques,
including shaping and probably some
modeling. Neither study clearly specified the
roles of the therapists, and it is difficult to 
know just what occurred in the various treat- 
ment groups. Further, both studies used 
parents’ reports as the major dependent 
variable. In fact, Obler and Terwilliger based
their analysis on a single item from a 10-item
parents’ questionnaire. As critiqued by Begelman
and Hersen (1971), Obler and Terwilliger’s
study has methodological problems
that make its positive findings highly ques- 
tionable. L. C. Miller et al. (1972a) and 
Hampe et al. (1973) are useful studies, in 
spite of the difficulties noted above. They 
present important findings concerning fear 
reduction as a function of age and time, and 
they include a long-term (2-year) follow-up, 
a rare occurrence in behavioral research. In 
essence these three studies together fail to 
support the differential effectiveness of com- 
plex systematic desensitization treatment 
packages for either severe or more moderate 
fears. From L. C. Miller et al. and Hampe 
et al., the safest conclusion is that either 
behavioral or psychodynamic treatment is 
helpful in the short term for younger children, 
who in any event apparently would overcome 
their fears without treatment in the long
term (2 years).

The articles by Leitenberg and Callahan 
(1973) and Kelley (1976) are clinical analogs 
focused on nursery school and kindergarten 
children selected for their fear of the dark. 
Both studies were well designed and used 
control groups, duration of dark tolerance as 
pre- and posttreatment behavioral measures, 
and visual “fear thermometers” for subjective 
reports by or feedback to the subjects. 
Neither study employed a follow-up, and no
attempt was made to assess the clinical
severity of the fears, leaving the possibility
that the subjects were only mildly to mod- 
erately fearful. 

Leitenberg and Callahan used a “reinforced
practice” method in which children
were instructed to remain in a dark room for
longer periods of time over five trials per
session and two practice sessions per week
for a maximum of 4 weeks or until the child
reached a criterion of two consecutive
5-minute dark-tolerance trials. For each successful
trial the children were reinforced with
praise and prizes. The authors reported a 
significant treatment and control group dif- 
ference on posttraining dark-tolerance tests. 
However, there were only seven children in 
each group, and the mean posttraining dark 
tolerance for the experimental group was only 
about 3 minutes. Although statistically sig- 
nificant, does a 3-minute dark tolerance have 
any personal or clinical significance? 

Kelley (1976) assigned 40 children who
feared the dark to three desensitization treatment
groups, one “play placebo” control, and
one no-treatment control group and reported 
no differences between treatment and controls 
or among the three treatment conditions. The 
most important finding of this study is the 
significant effect of a simple demand manipu- 
lation, that is, verbal instructions to remain 
longer in the dark room dramatically increased
dark tolerance and was “a far more powerful
influence on both behavioral and self report
change scores than three sessions of therapy”
(Kelley, 1976, p. 80). In light of Kelley's 
findings and their implications, we suggest 
that Leitenberg and Callahan's “significant” 
reinforced practice effects may have been due 
simply to their verbal instructions to the 
experimental group only to remain in the
dark room for a longer time on each subsequent
trial. Kelley’s finding is particularly
important because it seems reasonable to
hypothesize that demand characteristics are 
especially salient to children who, because of 
their immature status, are often in the posi- 
tion of having to comply with the outright 
and sometimes subtle demands of adults. 
Certainly these issues need to be explored. 
The critical research has not yet been done,
but it may be that dark-tolerance performance
of dark-fearful children can be significantly
improved by instruction alone. But
would this manipulation, perhaps over re- 
peated trials, change other aspects of fear 
such as physiological and cognitive responses, 
and would those changes hold up over time? 
A potentially important area in need of 
research is the effects on children's fear be- 
havior of adults' instructions. 

In two studies, Mann and Rosenthal (1969) 
and Mann (1972) compared direct and vicarious
desensitization procedures in the reduction
of adolescents’ test anxiety. Like
subjects in Obler and Terwilliger (1970) and 
some of those in L. C. Miller et al. (1972a), 
these adolescents, referred by a school coun- 
selor for high test anxiety, appear to have 
been severely fearful. Their major finding
was that systematic desensitization, whether applied
individually or in groups, whether
actually or vicariously experienced, was sig- 
nificantly more effective in reducing test 
anxiety than the waiting-list control condi- 
tion. The control subjects, when treated later, 
also improved. 

Most interesting about Mann’s studies is 
that the most effective procedure was a vi- 
carious desensitization condition, which actu- 
ally sounds very much like a symbolic mod- 
eling procedure in which subjects observe a 
videotaped peer model undergoing systematic 
desensitization and gaining control over the 
fear. The subjects in the vicarious condition 
showed greater improvement than those who 
actually experienced direct systematic desen- 
sitization! In Mann’s work, then, although 
called a desensitization procedure, the sym- 
bolic modeling approach was the most ef- 
fective with apparently highly fearful subjects. 
In effect, these two studies of desensitization 
support the effectiveness of symbolic modeling. 

In three modeling studies of shy or with- 
drawn children, investigators tested whether 
the addition of direct shaping procedures 
added any power to the basic symbolic 
modeling procedure. Evers and Schwarz (1973) 
found that it did not. O’Connor (1972) 
reported that although shaping (with praise 
and attention) was as effective as symbolic 
modeling, the new interactive behavior of the 
shaping group soon decayed, whereas mod- 
eling group subjects maintained their gains. 
Somewhat similar findings were reported by 
Walker and Hops (1973). In their study, the 
addition of contingency management (tokens 
for increased classroom interaction) following 
the symbolic modeling film reliably increased 
social behavior beyond that produced by 
modeling alone. However, as Gelfand (1978) 
pointed out, when tokens were withdrawn, 
the new behavior degenerated. Apparently, 
“natural” reinforcers in the classroom did not
maintain it. Perhaps a more gradual thinning 
of reinforcements or the use of more social
reinforcements would have produced a more
stable level of interaction. The point here is 
that the success of contingency management 
techniques in reducing withdrawn or shy 
social behavior is still uncertain. 

In summary, seven studies have examined 
complex desensitization or contingency man- 
agement treatment packages in the reduction 
of children's fears, and several other studies 
have included contingency management procedures
in addition to basic symbolic modeling
approaches. Overall their findings are equi- 
vocal. One study (L. C. Miller et al., 1972a) 
found no overall beneficial effects of sys- 
tematic desensitization compared with other 
treatments and with nontreated control con- 
ditions. Obler and Terwilliger's (1970) study 
has a number of methodological problems 
that cast doubt on their conclusions. Kelley 
(1976) reported no effects of desensitization 
treatment compared with two control treat- 
ments. Leitenberg and Callahan's (1973) 
results are based on a very small number of 
subjects and, as discussed earlier, are open 
to the interpretation that treatment differ- 
ences may have been due simply to direct 
instructions. Mann and Rosenthal (1969) and 
Mann (1972) reported good results with a 
variety of systematic desensitization treat- 
ments, but, on close inspection, their most 
successful treatment appears to have been 
symbolic modeling rather than systematic
desensitization. Finally, three of the modeling
studies (Evers & Schwarz, 1973; O'Connor, 
1972; Walker & Hops, 1973) found that the 
addition of contingency management did not 
usually improve performance beyond that
achieved by symbolic modeling alone, and if
it did, the new performance was not
maintained. 

Unlike in the modeling literature, there 
exists no convincing evidence that approaches 
developed on respondent-based systematic de- 
sensitization or operant contingency manage- 
ment paradigms are effective methodologies 
for reducing children’s fears. 


Cognitive Approaches 


Only one published study (Kanfer et al., 
1975) used a cognitive self-control approach 
in child fear reduction. Children 5 and 6 years 
old rehearsed one of three verbal-mediation
responses: (a) sentences emphasizing the
child's control or competence (e.g., “I am a
brave girl (boy). I can take care of myself”);
(b) sentences aimed at reducing the fear-stimulus
value of the dark (e.g., “The dark
is a fun place to be”); or (c) neutral sentences
(e.g., “Mary had a little lamb”). In
dark-tolerance posttests the “competence” 
group significantly outperformed the others. 
These children were not representative of a 
normal population, and they appeared to be 
only mildly or moderately fearful; we thus 
cannot generalize their results to clinical 
populations. The laboratory dark-tolerance 
test may have had no relationship to real 
fears in the children’s natural environments, 
and no behavioral maintenance data were 
provided. However, despite some shortcomings 
it was a well-executed study with strong 
results. The operation of verbal self-control 
processes was suggested by the authors,
although they noted that their study did not 
specify the details of verbal control 
mechanisms. 

Other hints of the possible effectiveness of 
verbal mediation are found in case studies 
(Ayer, 1973; Lazarus & Abramovitz, 1962), 
in which children rehearsed highly competent, 
although very imaginative, strategies for 
coping with fear stimuli. The modeling research
by Jakibchuk and Smeriglio (1976)
suggests the necessity of a first-person self- 
speech narrative as an accompaniment to a 
symbolic modeling film. One possible interpretation
of these various hints is that the
children's own verbal self-instructions
(whether originally received from a film, from
specific sentences given to the subjects to
rehearse, or from fanciful verbal and pictorial
images) on how to successfully deal with the
fear stimulus are a central component of the
successful interventions. Although there are 
yet few appropriate data on the success of 
verbal self-controlling manipulations in child 
fear reduction, certainly enough data and 
hints are available to alert one that more 
research in this direction is indicated. 


Summary of Controlled Studies


In summary, experimental fear-reduction
research with children has been meager until
the past 10 years, during which at least
28 studies have been published. Because of
limited sampling procedures, their results 
cannot be generalized to either normal or 
clinical populations of children. As for the 
latter group, we have virtually no data and 
know surprisingly little about reducing severe 
phobic levels of children's fears. It has yet 
to be demonstrated whether the experimental 
fear-reduction findings are useful in clinical 
practice. 

Only a limited number of different fears
have been studied: fear of animals, fear of
bodily injury (e.g., fear of dental treatment), 
fear of darkness, test anxiety, and social 
isolation. The levels of fear intensity have 
also been limited, with only 4 of the 28 studies 
including children with more than mild-to-moderate
fears. Only 2 of the studies considered
pretreatment fear duration as a measure
of seriousness. With the exception of
dental/medical fears we have virtually no 
data on fear reduction outside of the labo- 
ratory setting and thus do not know if sta- 
tistically significant changes brought about 
in the laboratory are related to fear behavior 
in the child's natural environment or are of 
sufficient magnitude to be psychologically or 
clinically significant. Although the experi- 
mental reduction of fear behavior has been 
repeatedly demonstrated, neither generaliza- 
tion nor long-term maintenance of the new 
behavior has been demonstrated. 

Three major fear-reduction strategies have 
been employed: (a) Most studies (20 of the 28) 
have used modeling approaches; (b) 7 have 
used a variety of systematic desensitization 
or contingency management treatment pack- 
ages; and (c) 1 study used a cognitive verbal- 
mediation strategy. Of the three strategies, 
the most reliably successful has been modeling
(in terms of replications and fairly well-con- 
trolled methodology). A number of important 
research issues in modeling remain to be 
investigated, including model-subject simi- 
larity, coping versus mastery strategies, mod- 
eling effectiveness with severe fears, adequate 
demonstration of both generalization and 
long-term maintenance of the new behavior
in the child's natural environment, and use
of modeling in the prevention of fears.

A particularly important issue, not yet
systematically investigated, is the operation
of cognitive self-controlling mechanisms in
successful modeling. Only one study used a 
verbal-mediation, self-control strategy. Its 
success is encouraging and should generate 
a good deal of research. 

Unlike in the modeling literature, there is 
no convincing evidence that systematic de- 
sensitization or contingency management 
strategies are effective in reducing children’s 
fears and in maintaining that reduction. 
Hatzenbuehler and Schroeder’s (1978) recent 
review came to the same conclusion regarding 
desensitization procedures. 

One of the more interesting, if still remote, 
practical implications of the recent research 
is the potential use of symbolic modeling in 
large-scale community mental health fear- 
prevention programs. A number of common, 
vexing childhood fears (e.g., fear of starting 
school and fear of dental/medical procedures) 
might be prevented on a large scale. 

Overall, fear reduction and prevention in 
children, particularly using modeling and 
cognitive strategies, is a rich area for research 
and development. Many needed research di- 
rections in this field are clearly indicated. 


Conclusions 


Reaching back almost 60 years, the study 
of children’s fears is old in the history of 
modern psychology. However, what is most 
striking is not the field’s age but its apparent 
lack of progress. To paraphrase Berecz's 
(1968) conclusion, the literature still gives 
us only hints about the nature of children's 
fears. After what appears to have been a 
good start in the 1920s, the field experienced
a long hiatus, which appears to be ending: 
The research of the past 10 years has returned 
to the behavior modification focus of the 1920s 
and early 1930s. 

This review suggests that children’s fears, 
like other human reactions, proceed through 
complex processes. The many variables in- 
volved and their interactions result in a com- 
plexity that provides many points at which 
to focus research. Briefly, fear stimuli may be 
internal, external, or both and may vary in 
content, number, intensity, and duration. The 
child’s responses involve combinations of phy- 
siological, cognitive, and overt behavioral 
events, all of which may vary in latency,
intensity, and duration and with changes in
stimulus conditions. The child's responses, 
overt or covert, may act on any of the stimu- 
lus and response variables, modify them, and 
thus occasion change in any parts of the 
process. All of these processes are immersed 
in social settings that contribute further 
sources of variation.

Despite this rich complexity, the research 
over nearly 60 years has focused almost exclusively
on the endpoints of the paradigm;
that is, normative research has tried to 
identify fear stimuli, whereas the more recent 
fear-reduction research has tried to modify 
the overt avoidance behavior. Virtually all 
of the variables between, observable or in- 
ferred, have been left unexplored, leaving one 
with little understanding of fear processes in 
children.

Future research will do well to follow the 
literature's hints and look more closely at 
the unexplored areas identified in this review, 
such as the possible adaptive value of 
children’s fears, fear-prevention strategies, 
cognitive self-control variables, and develop- 
mental factors. In short, we must recognize 
and test far more complex paradigms of the 
fear process. And finally, we clearly must cor- 
rect our continued lack of information regard- 
ing severe levels of children's fear. 


Reference Note 


1. Garcia, K. Fears and phobias in childhood. Un-


Piagetian Tests of the Similar Sequence Hypothesis


From the debate over developmental “universals” in Piagetian theory and the
controversy between developmental and difference theories of mental retarda- 
tion, an important hypothesis emerges—one that is testable via cognitive- 
developmental comparisons between retarded and nonretarded persons. This 
similar sequence hypothesis holds that retarded and nonretarded persons tra- 
verse the same stages of cognitive development in the same order, differing only 
in the rate at which they progress and in the ultimate developmental ceiling 
they attain. Current evidence relevant to this hypothesis is drawn from 3 longitudinal
and 28 cross-sectional studies of developmental phenomena described
by Piaget. The great preponderance of this evidence supports the hypothesis 
with respect to every subject group, with the possible exception of individuals 
suffering from pronounced electroencephalogram abnormalities. The quality of 
current evidence is critically evaluated, and procedures by which more precise 
tests of the hypothesis might be fashioned are proposed. Overall, the review 
illustrates that developmental research with atypical populations can be a potent 
tool in testing general developmental theory. Conversely, it illustrates the power 
of general developmental theory to enrich our understanding of atypical development.


In recent years two important theoretical 
issues have stimulated interest in Piagetian 
research with retarded and nonretarded pop- 
ulations. One is the question of whether 
developmental “universals” exist. Many psychologists
regard the sequence of developmental
stages described by Piaget (e.g., 1970)
and elaborated by other cognitive-developmental
theorists (e.g., Kohlberg, 1969) as
one of psychology’s few current candidates
for universality (see Weisz, 1978). Piaget
(1956, 1966) took a psychological universalist
position, with qualifications, and Kohlberg
(1969, 1971) argued strongly for the invariance
of what he regarded as a cognitive-developmental
sequence rooted in an inherent logic
and in universal characteristics of both the 
nervous system and the environment. 

Of course it is impossible to know that any 
given developmental phenomenon occurs 
everywhere without exception, since one can
never test all possible exceptions (Popper, 
1959). However, if one is not to let the claims 
of cognitive-developmental theorists go un- 
challenged, it is important to evaluate the 
extent to which transcontextual validity (see
Weisz, 1978) has been demonstrated for the 
Piagetian account of development. 

One approach to assessing such validity 
across experimental contexts is to examine 
developmental sequences across various cul- 
tures (Buck-Morss, 1975; Simpson, 1974). 
Another approach, of particular interest 
because of the cognitive emphasis of Piagetian 
theory, is to compare groups of children who 
differ markedly in measured intelligence, that 
is, groups of mentally retarded and non- 
retarded children. If children at very different 
IQ levels were to show identical Piagetian 
developmental sequences, then the transcontextual
validity of the Piagetian account of
development would be substantially supported. 
If retarded and nonretarded children were to 
differ in their sequence of development, then 
universality could hardly be claimed for the 
Piagetian account. 

A second theoretical issue that has sparked 
recent interest in comparative cognitive re- 
search is reflected in the ongoing debate 
between proponents of developmental and 
difference theories of mental retardation. The 
developmental position, set forth by Zigler 
(1969), is intended to apply to retarded 
individuals not suffering from organic impair- 
ment. Zigler has maintained that the retarded 
child passes through cognitive-developmental 
stages in the same order as the nonretarded 
child, with only two differences: The retarded 
child passes through the stages more slowly 
and attains a lower upper limit relative to the 
nonretarded child.¹

A number of theorists hold what Zigler 
labeled the general difference position. One 
aspect of this position is the view that the 
cognitive development of retarded persons 
differs from that of nonretarded persons in 
ways that go beyond mere differences in rate 
and ceiling of development. Milgram (1973), 
for example, maintained that the cognitive 
levels, or stages, of retarded children differ 
from those of the nonretarded in that the 
former are more likely to contain traces of
developmentally earlier levels and are more 
likely to show regression to those earlier 
levels. (For further discussion of the develop- 
mental position, the specific difference posi- 
tions, and the rationale underlying them, see 
Weisz & Zigler, in press.) 

This theoretical conflict has generated a 
new emphasis on comparative research into 
the processes (rather than the products) of 
learning and reasoning (Weisz, 1977; Weisz & 
Achenbach, 1975) and into processes of reason- 
ing, as described in Piagetian theory (see 
Wilton & Boersma, 1974). 

The growing interest in the pursuit of 
developmental universals, and the growing 
intensity of the developmental versus difference 
debate, have thus combined to lend theoretical
force to research comparing the cognitive
development of retarded and nonretarded
persons along Piagetian lines. This body of
research has grown rapidly within the past
decade; it now appears to be substantial 
enough to serve as a resource in our efforts to 
answer the principal question raised by the 
universality issue and the developmental versus 
difference debate. This question can be stated 
in the form of a testable hypothesis. 


Similar Sequence Hypothesis 


An appropriate label seems to be the 
similar sequence hypothesis. The hypothesis 
holds that during development retarded and 
nonretarded persons traverse the same stages 
in precisely the same order and differ only in 
rate of development and in the ultimate 
ceiling they attain. To be precise, the develop- 
mental position (Zigler, 1969, 1971) generates 
this hypothesis only with respect to nonre- 
tarded and cultural-familial retarded persons 
(thus excluding, for example, brain-damaged
and genetically impaired individuals).² In


¹ An additional postulate of the developmental
position is that familial retarded and nonretarded
persons who are equivalent in developmental level
(often operationally defined as mental age) do not differ
in the formal cognitive processes they employ in
learning and reasoning. This particular proposition is 
not germane to the present review and consequently is 
not discussed here. However, Piagetian evidence 
bearing on this proposition is being reviewed (Weisz 
& Zigler, in press). 

² The reasoning underlying this qualification bears
brief explanation. The developmental position holds
that mental retardation can be viewed as a developmental
phenomenon most appropriately among persons
whose retardation does not result from specific physiological
defects. Such investigators as Benton (e.g.,
1962), Cruikshank (e.g., 1967), and Reitan (e.g., 1973)
have devoted many years to demonstrating idiosyncratic
performance characteristics that distinguish
brain-injured individuals from those with intact
nervous systems. Furthermore, a number of studies
employing the specific kinds of problem-solving tasks 
most often used in research on the developmental- 
difference controversy have revealed effects of orga- 
nicity on retarded children's performance (Balla, 
Styfco, & Zigler, 1971; Balla & Zigler, 1964; Elkind, 
Koegler, Go, & Van Doorninck, 1965; Harter, Brown, 
& Zigler, 1971). In harmony with such findings, 
proponents of the developmental position have adhered 
to the two-group approach (see Zigler, 1969), whereby 
familial retarded individuals are distinguished from 
those suffering from organic impairment (including 
genetic anomalies such as Down's syndrome). There is
some disagreement among investigators over the need 
for the two-group approach (Ellis, 1969; Milgram

cognitive-developmental theory, however, the
claims for the universality of the developmental
sequence appear to be broader. Piaget (1956)
held that “the minimum program for establishment
of stages is the recognition of a distinct
chronology, in the sense of a constant order of
succession” (p. 13). According to Kohlberg
(1969), the claim that there is an invariant
order of cognitive stages rests upon an
assumed invariance in certain features of the
environment and of the nervous system and
upon “a logical analysis of orderings inherent
in given concepts” (p. 355). These inherent
orderings are seen as logically essential and 
as independent of individual differences among 
people. Kohlberg continued, “The invariance 
of sequence in the development of a concept or 
category is not dependent upon a prepatterned 
unfolding of neural patterns; it must depend 
upon a logical analysis of the concept itself” 
(p. 355). Thus, the similar sequence hypothesis
as advanced by cognitive-developmental theorists
seems to predict a truly universal
ordering of stages—an ordering that is the 
same for retarded children of all etiologies 
(including genetic impairment, brain injury, 
and other neurological anomaly) as it is for 
all nonretarded children. There is a conserva- 
tive version of the similar sequence hypothesis 
that applies only to familial retarded and 
nonretarded persons and a liberal version
that applies to all persons. In the present 
article we present evidence bearing on both 
versions.

In contrast with both these versions, 
Milgram (1973) has argued that the retarded 
child’s cognitive stages differ from those of 
the nonretarded child. In contrast with the 
liberal version of the hypothesis, Rogers
(1977) has described a rationale for (though
she has not necessarily endorsed) the hypothesis
that profoundly (and thus nonfamilial)


1973). Moreover, there is a strong Piagetian rationale 
for applying the similar sequence hypothesis to all 
persons, retarded or nonretarded, organically impaired 
or intact (see the remainder of the paragraph for 
details). In what follows we describe this rationale, and 
we go on to review evidence in a manner that bears 
directly on the conservative, two-group-oriented 
version of the hypothesis and on the more liberal version 
in which the similar sequence hypothesis is applied to 
all retarded groups regardless of etiology.

retarded children have abnormal developmental
patterns.


Material Excluded From the Present Review 


The present article is an attempt to synthe- 
size studies relating to the similar sequence 
hypothesis. In selecting studies to be reviewed, 
we excluded studies of reading per se and of 
language development per se. Although both 
areas can be viewed from the perspective of 
Piagetian theory, neither is central to the 
theory; furthermore, the research in both areas
is now so voluminous as to warrant separate 
review. We also excluded studies designed to 
accelerate cognitive development, since it is 
not the purpose of this article to determine 
whether retarded or nonretarded children can 
be trained more readily. 

The studies we do include in this review 
vary widely in their sampling procedures, 
their experimental methodology, and their 
approaches to data analysis and reporting. 
Consequently, the studies differ in their level 
of importance vis-à-vis the hypothesis of 
particular interest here. For this reason, we 
reserve the right to vary the level of detail 
in which we describe the studies and give 
relatively greater space to those that seem to 
afford the clearest tests of the hypothesis. 


Tests of the Similar Sequence Hypothesis 
Cross-Sectional and Order-of-Difficulty Evidence 


One approach to testing the similar sequence 
hypothesis is to assess groups of mentally 
retarded children at more than one develop- 
mental level with respect to their performance 
on various Piagetian tasks. If the direction of 
the difference in performance from one 
developmental level to another is the same 
for the retarded as for the nonretarded, or if 
the direction is consistent with the develop- 
mental sequence posited by cognitive-develop- 
mental theory, then the similar sequence 
hypothesis is supported. A second general 
approach to testing the similar sequence 
hypothesis is to rely less upon the develop- 
mental levels of the groups sampled than upon 
the relative-difficulty levels of the various 
tasks or behavioral items being employed. 
Perhaps the simplest, but also the least 
informative, of the variants of this approach
is to rank order the items with respect to the
number of subjects who pass each one; if 
this rank ordering of a retarded sample matches 
either the rank ordering of a nonretarded 
sample or the developmental sequence posited 
by cognitive-developmental theory, then the 
similar sequence hypothesis is supported, 
albeit modestly. A more informative type of 
order-of-difficulty evidence is the type that 
employs scaling procedures, allowing one to 
determine, for example, how many of the 
children who grasp Concept A also grasp 
Concept B and vice versa. Such evidence, 
when combined with Guttman-type (e.g.,
Guttman, 1950) scalogram analyses, can 
provide a relatively strong test of the similar 
sequence hypothesis. The studies reviewed in 
this section all employed some type of cross- 
sectional evidence, order-of-difficulty evidence, 
or a combination of the two. 
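
The scaling logic described above can be illustrated with a minimal Python sketch that computes a coefficient of reproducibility for a set of pass/fail records. This is a sketch under stated assumptions, not the procedure of any study reviewed here: the function name, the error-counting convention (each subject's responses are compared with the ideal pattern implied by that subject's total score), and the sample data are all illustrative.

```python
from typing import List


def coefficient_of_reproducibility(responses: List[List[int]]) -> float:
    """Return 1 - (scale errors / total responses).

    responses[s][i] is 1 if subject s passed item i, else 0.
    Items are assumed to be ordered from easiest to hardest, i.e. in
    the hypothesized developmental order (an assumption of this sketch).
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    errors = 0
    for row in responses:
        score = sum(row)
        # Ideal pattern for this total score: pass the `score` easiest
        # items and fail all harder ones.
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(1 for got, want in zip(row, ideal) if got != want)
    return 1.0 - errors / (n_subjects * n_items)


# Hypothetical data: five subjects, three items (A easiest, C hardest).
# The last subject passes C without B, which counts as two scale errors.
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
]
cr = coefficient_of_reproducibility(sample)
print(round(cr, 3))  # 0.867, i.e. 1 - 2/15
```

A value near 1.0 indicates that knowing a subject's total score is nearly sufficient to reconstruct which items were passed, which is what an invariant sequence would predict. Other error-counting conventions exist and can give somewhat different values.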

Development in the sensorimotor period.
Early evidence bearing on the similar sequence 
hypothesis was provided by the research of 
Woodward (1959, 1961, 1962, 1963). The first 
of her studies (1959) focused primarily on 
a group of 65 institutionalized children and 
adolescents with a chronological age (CA) 
range of 7-16 years who were so profoundly 
retarded that they failed to attain a basal age 
of 2 years on the Terman-Merrill scale. 
Although the author maintained that this 
sample excluded cases involving motor or 
sensory disability, the cases involved a divers- 
ity of medical problems (e.g., 19 subjects 
were epileptic), and 38 of the children were 
“emotionally unstable.” Woodward used three 
means of assessing the sensorimotor stages of 
this heterogeneous group. First, she observed 
their spontaneous mannerisms and their 
manipulation of toys presented individually 
and in a standardized order. Second, each 
subject was presented with three pairs of 
tasks, each pair tapping one of Piaget’s 
(1953, 1955) last three sensorimotor stages 
(there are six stages in all). Third, Woodward 
presented each child with a series of object 
concept tasks in which a piece of candy or a 
toy was first used to attract the subject’s 
attention and was then withdrawn and con- 
cealed to varying degrees. 

All but the object concept tasks were 
analyzed in a way that sheds light on the
similar sequence hypothesis. Each task was
classified with respect to the Piagetian sensorimotor
stage it was designed to represent;
then the tasks were rank ordered with respect
to the percentage of subjects passing each.
The difficulty level rankings of these 11 items
were identical to the Piagetian stage level
order, with one exception: A task involving
coordination of vision and hearing (Sensorimotor
Stage 2) proved to be slightly more
difficult than a task involving manipulation of
objects (Stage 3); 53 subjects passed the
manipulation task, and only 49 passed the 
coordination task. Furthermore, when the 
possibly insensitive coordination task was 
removed from the analyses, 59 of the 65 
children passed all of the items at stages 
below their highest stage level response. Given 
the extreme diversity of this sample, the high 
incidence of emotional instability, and the 
apparent tendency of many not to show 
responses of which they were actually capable 
(e.g., some delayed for a half hour before 
grasping an object placed before them), these 
data lend surprisingly strong support to the
similar sequence hypothesis.

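
The rank-order comparison used in this kind of analysis can be sketched in a few lines of Python. The sketch below is illustrative only: the function names and item labels are invented, and of the pass counts only the two reported above (53 of 65 for the Stage 3 manipulation task and 49 of 65 for the Stage 2 coordination task) come from the text; the remaining counts are hypothetical placeholders.

```python
from typing import Dict, List


def rank_by_pass_rate(passes: Dict[str, int], n_subjects: int) -> List[str]:
    """Order item names from easiest (highest pass rate) to hardest."""
    return sorted(passes, key=lambda item: passes[item] / n_subjects, reverse=True)


def order_mismatches(observed: List[str], hypothesized: List[str]) -> List[str]:
    """Return the hypothesized items whose observed rank position differs."""
    return [hyp for obs, hyp in zip(observed, hypothesized) if obs != hyp]


N_SUBJECTS = 65
pass_counts = {
    "stage2_coordination": 49,  # reported in the text
    "stage3_manipulation": 53,  # reported in the text
    "stage4_item": 40,          # hypothetical placeholder
    "stage5_item": 25,          # hypothetical placeholder
    "stage6_item": 10,          # hypothetical placeholder
}
hypothesized_order = [
    "stage2_coordination",
    "stage3_manipulation",
    "stage4_item",
    "stage5_item",
    "stage6_item",
]

observed_order = rank_by_pass_rate(pass_counts, N_SUBJECTS)
mismatched = order_mismatches(observed_order, hypothesized_order)
print(observed_order)
print(mismatched)
```

With these numbers the only reversal is between the Stage 2 and Stage 3 items, mirroring the single exception described above. A simple rank comparison of this kind says nothing about individual response patterns, which is why scaling procedures are treated as the stronger test.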
Recently, Rogers (1977) undertook an
investigation similar to that of Woodward in 
several respects. The subjects, 40 profoundly 
retarded children ranging in age from 8-14 
years, with IQs below 20, were given a series
of Piagetian tasks. By means of these tasks, 
each child’s performance was classified into 
Sensorimotor Stage 3, 4, 5, or 6 in each of four 
conceptual domains: object permanence (tasks 
involving searches for a hidden object),
spatiality (tasks involving visual anticipation
and rotation of objects), causality (tasks
involving the use of physical prompts and
tools, the removal of obstacles, and inference
as to the cause of a jingling sound inside a box),
and imitation (tasks involving the reproduction 
of both self-initiated and experimenter-initiated 
movements and sounds). Performance within 
each of the four domains was analyzed using 
scaling techniques, and Guttman’s (1950) 
coefficient of reproducibility and Green's 
(1956) index of scalability were calculated 
for each scale. The object permanence and 
imitation tasks formed highly reproducible 
scales in the orders hypothesized by Piaget 
(1955, 1962, 1972). Causality tasks also
formed a highly reproducible scale, although
the item order differed from the predicted
sequence in one respect: One Stage 6 item 
preceded one Stage 5 item. The author 
attributed this irregularity to a poor choice of 
Stage 6 task (i.e., opening a box to obtain a 
bell when box opening has just been demon- 
strated to the subject), “since the task used 
might have been accomplished using imitation 
rather than problem-solving skills” (Rogers, 
1977, pp. 841-842). Finally, the individual
spatiality tasks did not all form highly repro- 
ducible scales, but when the tasks within each 
stage were combined (and subjects were 
credited with a stage level for passing one or 
more of the tasks from that level), the stages 
did form a highly reproducible scale. Rogers 
concluded convincingly that her findings 
support “the invariant sequentiality of sensorimotor
stages” (p. 841).

The preoperational-concrete operational transition:
the Inhelder study of conservation. One
of the earliest studies bearing on the similar 
sequence hypothesis was carried out by 
Piaget’s associate, Barbel Inhelder, in the 
early 1940s. This study, now published in 
English (Inhelder, 1943/1968), involved the 
assessment of conservation of substance, 
weight, and volume in 159 persons who had 
been labeled mentally retarded by Swiss
education officials. The sample was extremely 
heterogeneous (see Jordan, 1976). Ages ranged 
from 73-52 years, IQs ranged from 35-104, 
institutionalized and noninstitutionalized per- 
sons were included, and the range of etiologies 
and physical maladies included such diverse 
states as "defective environment," rickets, 
hearing defect, “abandoned,” and schizophrenia.
The procedure involved semistructured
clinical interviews with each subject.
Since the procedure was not perfectly standardized
and little in the way of formal data
analysis was presented, it is difficult to 
evaluate Inhelder's conclusions, including her 
references to “oscillations” in the reasoning 
of retarded subjects, discussed later in this 
article. However, in Piaget's (1968) description 
of the Inhelder study, he explained that in 
the entire sample,

not one [individual] understood the conservation of
weight without having the conservation of substance,
nor the conservation of volume without both weight
and substance, while the conservation of substance
was found without the other two, and the conservation
of weight was found without the conservation of
volume. (p. 11)


Given the marked heterogeneity of the sample, 
such uniform support for the similar sequence 
hypothesis is noteworthy. 

Other studies of conservation and related 
concepts using retarded samples only. Studies 
of conservation and related concepts done 
since the Inhelder investigation have a bearing 
on the similar sequence hypothesis, despite 
the fact that they sampled only retarded
subjects. Klauss and Green (1972) assessed
conservation of number and volume in 27
trainable mentally retarded subjects ranging 
in age from 13-19 years and in IQ from 29-57. 
These investigators found that volume con- 
servation presented greater difficulty than did 
number conservation, a finding consistent 
with the pattern apparent in the nonretarded. 
Marchi (1971) tested conservation of mass, 
weight, and volume in 106 educable mentally 
retarded children. Difficulty level evidence 
suggested that contrary to Marchi's prediction, 
the retarded “follow a similar sequence in the 
acquisition of mass, weight, and volume as 
postulated for normals" (p. 6442). 

Roodin, Sullivan, and Rybash (1976) assessed
qualitative identity, quantitative identity,
and equivalence conservation (see Elkind, 
1967) in 60 institutionalized retarded children 
averaging 13 years of age and about 47 in IQ. 
Dyed water was poured from a standard 100 ml 
beaker; to test qualitative identity, subjects 
were asked, “Is the water in this glass (com- 
parison) the same water that was in that glass 
(empty standard)?" To assess quantitative 
identity, subjects were asked, “Is there as
much water in this glass (comparison) as there 
was in that glass (empty standard)?" To 
assess equivalence conservation, two standard 
beakers were filled with equal levels of water, 
and the contents of one were then poured into 
a comparison beaker; the experimenter then 
asked, “Is there as much water in this glass 
(standard) as there is in this glass (comparison)?”
Previous research (e.g., Papalia &
Hooper, 1971) with nonretarded children had 
suggested that the developmental order for 
the attainment of these three concepts would 
be qualitative identity, quantitative identity,
and equivalence conservation. In the Roodin
et al. study, analyses of the number of con- 
servers on each task indicated a parallel order 
of difficulty. 

In a similar study also employing 60 institu- 
tionalized retarded children (age range of 
approximately 10-16 years and average IQ of 
approximately 57), McManis (1969c) in- 
vestigated identity and equivalence conserva- 
tion with three types of material (Styrofoam 
balls, clay, and water). Like Roodin et al. 
(1976), McManis found evidence that the 
developmental sequence of his retarded sub- 
jects replicated that of nonretarded children. 
The notion that identity conservation must 
precede equivalence conservation was sup- 
ported by the finding that no subject who 
failed to achieve identity conservation showed 
equivalence conservation, whereas 13%-18%
of the subjects (precise percentage depending 
on the particular task used) displayed identity 
conservation without equivalence conservation. 
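
The prerequisite logic in this finding (no subject shows the later concept without the earlier one) amounts to checking one cell of a two-by-two table. The Python sketch below illustrates the check; the function names and all of the counts are hypothetical and are not McManis's data.

```python
from collections import Counter
from typing import Iterable, Tuple

# Each record is (has_identity, has_equivalence) for one subject.
Record = Tuple[bool, bool]


def prerequisite_table(records: Iterable[Record]) -> Counter:
    """Count subjects in each (has_identity, has_equivalence) cell."""
    return Counter(records)


def sequence_violations(table: Counter) -> int:
    """Subjects showing equivalence conservation without identity conservation."""
    return table[(False, True)]


# Hypothetical sample of 60 subjects.
records = (
    [(True, True)] * 30      # both concepts
    + [(True, False)] * 10   # identity only: allowed by the sequence
    + [(False, False)] * 20  # neither concept
)
table = prerequisite_table(records)
n_violations = sequence_violations(table)
identity_only_share = table[(True, False)] / len(records)

print(n_violations)
print(round(identity_only_share, 3))
```

If identity conservation is a prerequisite, the violation count should be zero while the identity-only cell is populated. An empty identity-only cell would leave the ordering untested, since the two concepts could then have emerged together.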

Three studies that examined conservation 
of number, and number concepts generally, in 
mentally retarded groups yielded similar 
findings, despite some differences in methodol- 
ogy. Woodward (1961) investigated numerical 
concepts of 94 institutionalized individuals
(50 adults with average CA of 19 years and
44 children and adolescents with average 
CA of 12 years) ranging in IQ from 25-73. 
Tests given to the subjects included assess- 
ments of their understanding of (a) one-to-one 
correspondence and equivalence of correspond- 
ing sets, (b) ways of equalizing unequal 
groups, (c) seriation, and (d) conservation of 
continuous quantity (water and sand). Per- 
formance was scored as indicative of one of 
two preoperational stages or of concrete 
operational thinking. When the stage level 
assignments were plotted as a function of the 
IQs (and thus roughly of the mental ages or 
MAs) of the adult subjects sampled, the 
table reflected precisely what would be 
expected from the application of Piaget's 
stage scheme to nonretarded individuals. 

In the second of the three studies, 
Mannix (1960) administered eight of Piaget's 
(1952) number concept tasks to 48 “educa- 
tionally subnormal" individuals ranging in 
MA from 5-9 years. The tasks included two 
tests of additive composition, one test of
coordination of equivalence relations, two
tests of judgment of correspondence between 
sets of items, and two conservation tasks 
(continuous and discontinuous quantities). 
Responses to these tests were classified into 
Piagetian stage levels, a scalogram was con- 
structed, and the coefficient of reproducibility 
was .94. Mannix's brief report gave little 
information as to the precise nature of the 
scale types; but apparently the scalogram was
consistent with Piaget's stage theory, because
the author concluded that educationally
subnormal children “pass through the three 
stages of development described by Piaget" 
(Mannix, 1960, p. 181). 

The third of these studies on number 
concepts was conducted with 20 institutional- 
ized mentally retarded persons in New Zealand 
(CA range of 8-17 years; IQs of 29-65). 
Singh and Stott (1975) presented these 
subjects with a series of number conservation 
tasks designed to classify them with respect 
to three Piagetian number stages: Stage 1— 
child fails to attend to relevant cues and fails 
to conserve; Stage 2—child selectively attends 
to only certain relevant cues, can match 
perceptually, but cannot conserve; Stage 3— 
child conserves, showing understanding of 
invariance of properties despite transformation
in appearance. Data bearing on the similar
sequence hypothesis are not reported in
detail, but the authors’ conclusion is quite 
clear: “Retarded children apparently develop 
sequentially in the same order as normals but 
at a slower rate and at a later CA" (Singh & 
Stott, 1975, p. 220). 

One other study that used only a retarded 
sample deserves mention both because of its
scope and because of an important issue it
raises. In this study, Lister (1972) assessed
six types of conservation among 115 educa- 
tionally subnormal pupils in Great Britain. 
The subjects were aged 8-16 years, and their 
IQs ranged from 47-81. Both difficulty level
rankings and a scaling procedure strongly
suggested the following developmental sequence
in the emergence of these types of
conservation: number, substance, length,
weight, volume, and area. Although no scalogram
statistics were calculated, only 6 of the
115 subjects showed a scalogram response
pattern inconsistent with the preceding order.
Lister noted that the order with respect to
substance, weight, and volume was consistent
with previous Piagetian research, whereas the 
suggested order of the remaining attributes 
differed from at least some previous findings 
with nonretarded subjects. Her own interpre- 
tation of the discrepancies was that they 
resulted from experiment-to-experiment varia- 
tions in the specifics of the problems used to 
assess the various types of conservation. This 
is a very real possibility, and it is one reason
why tests of the similar sequence hypothesis 
that expose retarded and nonretarded subjects 
to the same experimental procedures must be 
regarded as stronger evidence than experiments 
that test only retarded children and compare 
the findings with those of different experi- 
ments. We now turn to six studies of the 
former type. 

Studies of conservation and related concepts
employing both retarded and nonretarded subjects. 
Four of the experiments in which the per- 
formance of retarded and nonretarded subjects 
was directly compared were conducted by 
McManis (1969b, 1969d, 1969e, 1970). In 
one of these, 90 institutionalized retarded 
subjects (IQs of 47-73) and 90 nonretarded 
elementary school children (IQs of 85-115) 
were tested for conservation of mass, weight, 
and volume of clay, using Piaget and Inhelder's
(1941) “sausage” technique. About half the
retarded subjects were organically impaired. 
Analyses of the mean scores for the conserva- 
tion tasks indicated that conservation of mass 
was easiest and conservation of volume most 
difficult for both the retarded and the non- 
retarded group, providing some support for 
the notion that the order of emergence of these 
types of conservation in groups of both
average and below-average IQ is as follows:
mass, weight, then volume.

In another article McManis (1969e) re- 
ported his assessment of conservation and 
transitivity of weight (clay) and length 
(sticks) in what appears to be the same 
sample used in his 1969d experiment. The 
study was designed to test the hypothesis, 
derived from Kooistra (1964), that for any 
given property (e.g., weight) conservation
will appear developmentally earlier than
transitivity. The results supported this hypothesis
for both weight and length in retarded
and nonretarded subjects. McManis (1970)
then explored the relations among conserva- 
tion, seriation, and transitivity (of length) 
within groups of 80 institutionalized mentally 
retarded persons (IQs of 46-72) and 80 
nonretarded elementary school children (IQs 
of 85-116). Among both retarded and non- 
retarded children who showed discrepant 
performance on the conservation and seriation 
tasks, nearly all showed conservation without 
seriation. Among both retarded and non- 
retarded children who showed discrepant 
performance on the seriation and transitivity 
tasks, nearly all showed seriation without 
transitivity. These findings indicate that 
seriation falls developmentally between con- 
servation and transitivity (at least with respect 
to the property of length, as measured in this 
experiment) for both retarded and nonretarded 
persons. 

In a fourth article based on the same sample
used in two of the preceding studies (McManis, 
1969d, 1969e), McManis (1969b) tested 
Piaget’s (1952) view that there are three 
hierarchically ordered stages in the develop- 
ment of quantitative comparison processes. 
In the first stage, children are said to consider 
only uncoordinated perceptual relations of 
gross qualitative equality or difference; in 
the second stage, intensive quantity, children 
are said to compare quantities by seriating 
them along more than one dimension (e.g., 
width and height) simultaneously; in the third
stage, extensive quantities, children are said 
to be capable of overruling apparent differences 
between two equal quantities by imposing 
equal units of measurement upon them. 
McManis tested his young subjects’ perform- 
ance of these three types of comparison, using 
sticks, colored water, and beads. The analysis 
of scores on these tasks indicated that for 
both the retarded and the nonretarded group, 
gross comparisons were the simplest (they 
were passed by nearly all subjects in both 
groups) and extensive comparisons were the 
most difficult. These findings are consistent 
with the view that for children at both IQ 
levels the developmental order is as follows: 
gross, intensive, and extensive quantity (the 
order posited by Piaget). One other study 
(McManis, 1969a) should be mentioned in this 
connection. McManis’s tests of quantity comparison
were given to 140 institutionalized
mentally retarded persons, who were divided 
into equal groups representing different IQ 
levels (IQs of 30-49 and of 50-69). The 
procedure and analyses were similar to those 
employed in the preceding study (McManis, 
1969b). The results indicated, as in the 
preceding experiment, that comparison of 
gross quantities was easiest and comparison 
of extensive quantities was most difficult, 
regardless of the IQ level of the subjects. 

Three other comparative studies were 
designed to address the problem of order of 
events across the types of conservation. Gruen 
and Vore (1972) assessed conservation of 
number (poker chips), continuous quantity
(water), and weight (clay rolled into various 
shapes) in familial retarded (IQs of 55-80) 
and nonretarded (IQs of 90-120) public school 
pupils. Both retarded and nonretarded groups 
were divided into three subgroups of MA: 5, 
7, and 9 years. Evidence on developmental 
ordering was in the form of mean scores for 
the three types of conservation, analyzed 
within each MA level. In one set of analyses, 
conservation judgments alone (i.e., disregard- 
ing the subjects’ verbal explanations) con- 
stituted the dependent variable. With this 
criterion, performance of nearly all subjects at 
MA 9 was correct; for the other two MA 
levels, both retarded and nonretarded subjects 
tended to score significantly better on the 
number task than on the quantity or weight 
tasks. For nonretarded children at MA 7, 
however, the differences were not significant. 
The quantity and weight tasks did not differ 
significantly in difficulty for retarded and 
nonretarded children. 

In a second set of analyses by Gruen and 
Vore, the dependent measure was conservation 
judgment in combination with the subject’s 
explanation for that judgment. Using this 
criterion, there was no significant task effect 
at the MA 5 level. At the MA 7 level both 
retarded and nonretarded subjects did some- 
what better at the number task than at the 
quantity and weight tasks, but the differences 
were only significant for the retarded subjects. 
At the MA 9 level both retarded and non- 
retarded subjects scored significantly higher 
on the number than on the weight task and 
significantly higher on the quantity than on
the weight task. Thus, although order-of-difficulty
patterns were similar for retarded
and nonretarded subjects, with use of judg- 
ments alone and judgments plus explanations, 
task differences tended to be statistically 
significant among the retarded more often 
than among the nonretarded. In attempting 
to account for this trend, Gruen and Vore 
(1972) made an interesting point: 


McManis (1969[e]) has suggested that there is a 
transitional period (MAs of 7-10) in which the various 
concrete operations are obtained and that retarded 
children progress through this period more slowly 
than do normal children. If this is true, it would be 
expected that the performance of normal children on 
various conservation tasks would vary less from task 
to task than that of retarded children. This also 
suggests that retarded children may be ideal subjects 
for investigating the transition process from preopera- 
tional to concrete-operational thinking. (p. 156) 


We conclude this section of the review by 
discussing two conservation studies by Achenbach. 
Building on the work of Charlesworth 
(1969) and Mermelstein and Shulman (1967), 
Achenbach (1973) inferred children's identity 
concepts with respect to color, number, 
length, and continuous quantity from their 
surprise reactions to contrived changes in 
those properties. For example, to test number 
identity concepts two toy Indians were placed 
in a box, and when the bottom was opened 
three Indians dropped out. Among nonretarded 
subjects (M IQ = 116), there were significantly 
more frequent surprise reactions to a 
change in color than to changes in the three 
quantitative properties, a finding consistent 
with Piaget’s view (see Piaget & Voyat, 1968) 
that children develop identity concepts for 
qualitative properties such as color prior to 
the emergence of identity concepts for quantitative 
properties such as number, length, and 
continuous quantity. Surprise reactions to 
change in color and number were virtually 
identical in 45 familial and 16 Down syndrome 
retarded subjects (M IQ = 47). The fre- 
quencies of surprise reactions to changes of 
the three different quantitative properties 
for both retarded and nonretarded subjects 
are consistent with the findings of others 
(including Gruen & Vore, 1972) that successful 
performance on conventional length and 
number conservation tasks is simpler than, and 
thus presumably developmentally prior to, 
success on conventional continuous quantity 
tasks. Thus, once again we see fairly strong 
support for the view that the sequence of 
developmental events for the retarded child 
is similar to that for the nonretarded child. 

A different type of similarity is illustrated 
by Achenbach's (1969) study of nonretarded 
public school children (IQs of 94-168) and 
nonorganically impaired retarded children 
(IQs of 31-78) from public schools and 
institutions. He assessed children's conservation 
concepts with respect to length, area, and 
volume* (employing 4 tasks for each of the 
three properties) by using optical illusions to 
create discrepancies between the actual and 
apparent sizes of various stimuli. To test 
conservation of length, for example, the 
experimenter presented each child with a 
barbell illusion in which a small metal rod 
that fit into a groove that just touched the 
inner edges of two circles was placed into 
another groove that passed through two 
circles and touched their outer edges. The 
effect was to make the rod appear longer in 
the second position than in the first. Subjects 
were then asked whether the rod would fit 
into the original groove. An important feature 
of the study, for our purposes, is that the 12 
tasks were designed to be free of intellectual 
demands in the areas of additivity, numeration, 
conservation of equivalence, or complex verbal 
expression—dimensions along which more 
traditional conservation tasks often vary. This 
made it possible to test the contention of 
Braine and Shanks (1965a, 1965b) that the 
attainment of conservation with respect to 
the various properties would be parallel if the 
performance criteria used were standard across 
the types of conservation. Consistent with the 
Braine-Shanks view, Achenbach (1969) found 
a “total absence of evidence for a horizontal 
décalage” (p. 677) in the three types of con- 
servation, for both the retarded group and 
the nonretarded group. In both groups, there 
were neither consistent nor significant differ- 
ences in success rates for length, area, and 
volume tasks. This finding, together with the 
reasoning of Braine and Shanks, suggests that 
some of the order-of-difficulty and scalogram- 
type evidence reviewed in the preceding 
paragraphs may be more indicative of differ- 
ences in the specific requirements of the 
contrived tasks employed than of actual 
differences in the order of emergence of the 
various types of conservation. 

Concepts of time. A number of studies have 
addressed the similar sequence hypothesis in 
content areas other than conservation. In 
one study on the concept of time, Lovell and 
Slater (1960) interviewed 50 educationally 
subnormal children (IQs not reported) aged 
8, 9, 10, 11, and 15 years and 50 “average to 
above average" children aged 5-9 years. The 
interview included tasks (some were Piaget's) 
designed to measure concepts of simultaneity 
and equality of synchronous intervals (e.g., 
asking the child to judge whether two dolls 
traveling at different rates but starting and 
stopping at the same time actually traveled 
for the same amount of time). There were also 
tasks involving chronological ordering of 
events and children's concepts of age and 
interior time. Little in the way of statistical 
analysis was reported, but nonetheless, Lovell 
and Slater concluded that the understanding 
of these five concepts of time follows roughly 
the same sequence in retarded as in normal 
children, although the stages in understanding 
are reached some years later by retarded 
children. 

Concepts of space. Two studies that in- 
cluded a retarded sample were specifically 
concerned with spatial concepts. In one of 
these, Houssiadas and Brown (1967) sampled 
40 institutionalized, mentally retarded Australians 
(M IQ = 55) who showed no evidence 
of mongolism or other specific defects and 
who ranged in age from 8-15 years. These 
subjects were presented with two perspective- 
taking tasks (one pictorial and one using 
manipulation of actual objects) in which they 
were asked to identify their own perspective 
on a perceptual array as well as the perspective 
of another person seated at a different position. 
Although no statistical analyses were reported, 
the pattern of passes and failures on the 
different items was consistent with the view 
that retarded individuals pass through the 
stages identified by Piaget; that is, first there 
is difficulty in identifying one's own perspective 


* The volume conservation task used by Achenbach 
(1969) actually tapped what Piaget (has) called con- 
servation of continuous quantity. 
and that of another, second there is only 
difficulty in identifying how a perceptual 
array might look from another position, and 
third the individual is able to "coordinate 
perspectives," identifying not only his own 
perspective but that of another person as well. 
Summing up, Houssiadas and Brown (1967) 
concluded, “It is clear that the pattern of 
predominant responses follows the same 
sequence suggested by Piaget, whose data were 
derived from normal children" (p. 213). 

In a more fine-grained analysis of spatial 
concepts, Woodward (1962) tested the same 
institutionalized retarded group of 50 adults 
and 44 children used in the study of number 
concepts described earlier (Woodward, 1961). 
In this sample, 50% of the adults and 61% 
of the children showed some type of organic 
impairment. The spatial tasks included mea- 
sures of the ability to reproduce a spatial 
order under varying degrees of transformation 
(e.g., reproducing a circular array of beads on 
a horizontal rod). Using a similar procedure 
with nonretarded children, Piaget and Inhelder 
(1956) identified seven stages through which 
their children passed as they improved on the 
tasks. Woodward (1962) constructed a table 
of scale types to assess the comparability of 
her results with those of Piaget and Inhelder. 
Although no scalogram statistics were cal- 
culated, the great majority of Woodward's 
subjects fit scale types consistent with the 
developmental sequence posited by the 
Genevans. 

A second task employed by Woodward 
involved drawing copies of 21 geometric 
figures used by Piaget and Inhelder. The 
compatibility of subjects' scores on these 
tasks with a four-stage sequence advanced 
by Piaget and Inhelder (1956) was demon- 
strated by the fact that "subjects classified 
by the features of a given stage showed the 
features of the lower stages in most cases" 
(Woodward, 1962, p. 31). However, once 
again no scalogram statistics were reported, 
and 5 of the 14 performance criteria by 
which stage assignments were made were 
outside the appropriate difficulty level for 
at least some of the subjects. The third task 
employed by Woodward was a “reference 
points" problem in which adult subjects only 
were presented with drawings of a bottle 
tilted at various angles and said to be about 
one-fourth full of water. The subjects’ task 
was to pencil in the portion of the bottle 
occupied by the water. The performance data 
presented for this task were extremely sketchy, 
but Woodward indicated that the order of 
difficulty of the tasks was the same as that 
found by Piaget and Inhelder. In her overview 
of her findings bearing on what we have called 
the similar sequence hypothesis, Woodward 
(1962) concluded that for her retarded subjects, 
“The sequence suggested by Piaget and 
Inhelder [for nonretarded children] was 
confirmed for all three spatial concepts that 
were investigated" (p. 35). 

Relative thinking. In investigating the “logic 
of relations” in children, Piaget (1928) used a 
“brothers and sisters" problem and a “right 
and left" problem. In the former problem, 
children's understanding of the relation between 
being and having a sibling was explored 
by asking such questions as “George has 
three brothers, Paul, Henry, and Charles. 
How many brothers has Paul? How many 
brothers are there in this family?" In the 
right and left problem, children were instructed, 
“Show me your right hand, your left. 
Show me my right hand, my left." Lane and 
Kinder (1939) used these two Piagetian 
problems with 50 institutionalized retarded 
individuals of unspecified etiology who were 
grouped, for purposes of data analysis, into 
four different IQ levels: 38, 51, 64, and 77. 
Instead of scalogram statistics, relative difficulty levels 
of the questions were reported for each IQ 
group. These data indicated that the rank 
ordering of difficulty for the 11 questions was 
similar across the different IQ levels—a 
parallelism consistent with the similar sequence 
hypothesis. 


Moral judgment. Abel (1941) investigated 
moral judgment in 74 institutionalized "subnormal 
adolescent white girls" (aged 15-21 
years; IQs unspecified). Subjects were questioned 
about seven brief stories concerning 
immanent punishment (the inevitability of 
punishment following a misdeed), retributive 
justice (punishment orientation, particularly 
of the “eye for an eye” variety), and judgments 
of the gravity of a misdeed (using information 
on consequences of the deed and intent of the 
transgressor). Mirroring previous findings with 
nonretarded individuals (Lerner, 1937, 1938), 
Abel's findings were that with increasing 
maturity (defined in terms of MA) subjects 
gave nonsignificantly greater weight to intent 
and less weight to consequences in judging 
the gravity of a misdeed and were significantly 
less likely to consistently advocate retributive 
punishment. Unlike the nonretarded persons 
in at least some research, Abel's more mature 
subjects (MAs of 9-11 years) did not show 
any less pronounced a belief in immanent 
punishment than did her less mature subjects 
(MAs of 6-8 years). In fact, about 82% of 
both groups showed such a belief, which Abel 
(1941) attributed to the “constraining” in- 
stitutional environment “that controls the 
girls with threats of immanent punishment" 
(p. 386). Except for this one anomaly, the 
Abel data are consistent with the similar 
sequence hypothesis. 

Studies of multiple concepts. We conclude 
this section on cross-sectional research with a 
discussion of studies that have assessed 
concepts in more than one conceptual domain. 
DeVries (1970, 1973a, 1973b, 1974) assessed 
a variety of Piagetian concepts in bright 
(M IQ ~ 130), average (M IQ ~ 105), and 
retarded (M IQ ~ 72; etiologies not reported) 
children, all enrolled in public schools. The 
tasks included the brothers and sisters and 
right and left problems described earlier; 
tests of generic and sex identity and of con- 
servation of mass, number, length, and liquid; 
interviews on magic and dream concepts; 
object sorting and class inclusion problems; 
and a guessing game (“Which hand has the 
penny?") designed to reveal the level of 
children's role-taking skills. Of all the tasks 
used, data from the guessing game task were 
presented in the most complete manner (see 
DeVries, 1970). Using an independent sample 
of 64 high-IQ children, DeVries (1970) 
classified behavior on the guessing game with 
respect to 10 characteristics (e.g., does not 
always hide penny in the same hand). These 
characteristics formed a highly reproducible 
Guttman-type scale with a reproducibility of 
.95 and an index of consistency of .66. The 
scale was then used with the bright, average, 
and retarded samples and checked against 
Kohlberg’s (1969) criteria for developmental 
sequentiality, namely, (a) mean scale scores 
should increase with age, (b) success on each 
individual scale item should increase with age, 
and (c) the sequence of items should be 
justifiable with a logical rationale based on 
Piagetian theory. DeVries (1973b) maintained 
that her scale met the third criterion, and her 
data (DeVries, 1970) indicate that the first 
two criteria were met within the bright, 
average, and retarded groups separately. 
Similar analyses were carried out with respect 
to the other 14 Piagetian tasks, with a Guttman 
scale constructed for each. Within the average 
and retarded groups each scale met Green's 
(1956) criterion of an index of consistency 
greater than .50, and the lowest coefficient of 
reproducibility was .94 (DeVries, Note 1). 
DeVries (1973b) indicated that all of the 
Kohlberg criteria for sequentiality “were 
applied to each ability group (i.e., bright, 
average, and retarded subjects) separately, 
and the order of scale items was the same for 
each ability group on all tasks” (p. 3). 
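
The coefficient of reproducibility cited for these Guttman-type scales can be illustrated with a minimal sketch. The version below assumes the Goodenough-Edwards convention for counting errors (each subject's responses are compared with the ideal pattern implied by his or her total score); the studies reviewed here may have used a different counting rule, and the function name and the data are hypothetical. Green's (1956) index of consistency is not computed.

```python
from typing import Sequence


def coefficient_of_reproducibility(responses: Sequence[Sequence[int]]) -> float:
    """Return 1 - errors / (subjects * items) for pass/fail (1/0) data.

    Errors are counted against the ideal Guttman pattern implied by each
    subject's total score (Goodenough-Edwards convention, assumed here).
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    # Order items from easiest (most passes) to hardest (fewest passes).
    pass_counts = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -pass_counts[j])
    errors = 0
    for row in responses:
        total = sum(row)
        # Ideal pattern: pass the `total` easiest items and fail the rest.
        ideal = [1] * total + [0] * (n_items - total)
        observed = [row[j] for j in order]
        errors += sum(o != i for o, i in zip(observed, ideal))
    return 1.0 - errors / (n_subjects * n_items)


# Hypothetical data: 4 subjects by 4 items (1 = pass, 0 = fail).
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],  # passes a harder item while failing an easier one
    [0, 0, 0, 0],
]
rep = coefficient_of_reproducibility(example)
print(round(rep, 3))
```

A perfectly cumulative response matrix yields a coefficient of 1.0; every departure from the ideal pattern lowers it, which is why values such as .94 or .95 are read as evidence that the items form an ordered sequence.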

Stearns and Borkowski (1969) investigated 
conservation of continuous quantity (water) 
and discontinuous quantity (blocks and 
marbles) as well as horizontal-vertical space 
perception in institutionalized retarded in- 
dividuals (IQs unspecified) ranging in age 
from 73-27 years. Consistent with Piaget's 
(e.g., 1964) view (and supporting findings; 
see Elkind, 1961) that conservation of con- 
tinuous quantity is more difficult and emerges 
developmentally later than conservation of 
discontinuous quantity, Stearns and Borkow- 
ski found performance on their test of the 
former concept to be significantly poorer 
than performance on their two tests of the 
latter concept. Scores were also highly similar 
for the tests of horizontal and vertical frames of 
reference; this finding is consistent with 
Piaget's (see Piaget & Inhelder, 1956) view 
that concepts of the vertical and of the 
horizontal are acquired at the same time. 

Finally, we turn to two studies by Lovell and 
his colleagues (Lovell, Healey, & Rowland, 
1962; Lovell, Mitchell, & Everett, 1962). 
The studies reported few relevant statistical 
analyses, but the diversity of concepts exam- 
ined makes them worthy of brief attention. 
In the Lovell, Mitchell, and Everett study, 
groups of nonretarded and educationally 
subnormal individuals (no IQs reported) 
were divided into separate age groups. The 
skills investigated included additive and 
multiplicative classification (of objects and 
pictures differing in multiple dimensions), 
seriation, multiplication of asymmetrical tran- 
sitive relations, hierarchical classification, class 
inclusion, and visual and tactile classification. 
For all tasks the tabled data indicated a 
general improvement in performance with 
increasing age level for both normal and sub- 
normal groups (no significance tests reported). 

In the study by Lovell, Healey, and Rowland, 
the subjects were again groups of nonretarded 
and educationally subnormal persons from 
special schools (IQs unreported) who were 
divided into separate age groups. The groups 
were presented with 12 of the tasks used by 
Piaget, Inhelder, and Szeminska (1960) to 
study the child’s geometric concepts. Within 
normal and subnormal groups separately, 
correlation coefficients were calculated that 
related Piagetian stage levels on the 12 tasks 
to subjects’ age levels. Of the 24 coefficients, 
23 were significant at the .01 level. In both 
Lovell et al. studies, the details of subject 
selection, experimental procedure, and statist- 
ical analyses are so skimpy that the findings 
must be regarded as only suggestive. Nonethe- 
less, although they are not by any means 
definitive, the data are in harmony with the 
similar sequence hypothesis. 


Summary of the Cross-Sectional and Order-of- 
Difficulty Evidence on the Similar Sequence 
Hypothesis 


Thus far we have reviewed 28 studies in 
which cross-sectional and order-of-difficulty 
evidence is reported in ways that have some 
bearing on the similar sequence hypothesis. 
The degree of retardation involved in the 
samples ranged from profound to mild; the 
retarded persons sampled ranged in age from 
childhood to adulthood, were both institu- 
tionalized and noninstitutionalized, and in- 
cluded both cultural-familial cases and individ- 
uals with diverse organic and emotional 
disorders. The nonretarded contrast groups, 
when employed, ranged from slightly below 
average to extremely high in IQ. The studies 
reported also varied widely in their experi- 
mental methodology and in their methods 
of data analysis. Despite this great diversity 
in methodology and in sample characteristics, 
the data reviewed show rather consistent 
support for the similar sequence hypothesis, 
both in its conservative form, which applies 
only to nonretarded and familial retarded 
persons (Zigler, 1969), and in its broader 
form, in which universality of developmental 
sequence is held to be independent of individual 
subject characteristics such as organic impair- 
ment (see Kohlberg, 1969). 

There were very few exceptions to this 
generalization: (a) Woodward (1959) found 
that among her profoundly retarded subjects 
1 sensorimotor task out of 11 proved to be 
slightly more difficult than another task that 
Piaget designated as being one sensorimotor 
stage higher. (b) Among Rogers's (1977) 
profoundly retarded subjects, a causality task 
designed to be at Stage 6 proved to be easier 
than one of the tasks at Stage 5, whereas 
individual spatiality tasks within each stage 
had to be combined to yield a highly reproducible 
scale. (c) In Achenbach's (1973) familial 
and Down's syndrome retarded sample, surprise 
reactions indicative of color and number 
identity occurred with equal frequency, whereas 
in his nonretarded sample, color surprise 
was significantly more frequent than number 
surprise. (d) Abel (1941) found no significant 
decline with increasing MA in belief in immanent 
punishment in her institutionalized 
female retarded sample, a finding that differs 
from some earlier research with nonretarded 
children. 

These four instances of no support for the 
similar sequence hypothesis are exceedingly 
minor. They may have resulted, as the authors 
of these studies generally suggested, from 
idiosyncratic (or misinterpreted) properties of 
the tasks selected or from other measurement 
errors, and in some cases (e.g., Abel, 1941) 
they may reflect the suppressive influence of 
an idiosyncratic environment that merely 
delays the shift from one level of reasoning to 
another. Furthermore, in each of the four 
studies, the findings of no support were 
outnumbered by findings supporting the similar 
sequence hypothesis. 

While noting the strong level of support that 
the reviewed evidence has yielded for the 
similar sequence hypothesis, we must 
note that such cross-sectional and level-of-difficulty 
evidence, even at its best, can support 
only indirect inference regarding the actual 
process of development. Considerably more 
direct and potent inference is possible when an 
investigator observes the same individuals at 
more than one point during the course of 
development, that is, in longitudinal fashion. 
Such research is often expensive and complex, 
and consequently it is relatively rare, particularly 
with mentally retarded persons. However, 
three longitudinal studies have some 
bearing on the similar sequence hypothesis. 
We now turn to these. 


Longitudinal Evidence 


Development in the sensorimotor period. One 
longitudinal investigation was designed by 
Wohlhueter and Sindberg (1975) as an exten- 
sion of Woodward’s (1959) cross-sectional 
study of sensorimotor development in profoundly 
retarded persons (described earlier in 
this article). These investigators conducted 
monthly assessments of institutionalized 1-6- 
year-old profoundly, severely, and moderately 
retarded children (no IQs reported). The 
Piagetian object concept tasks used by 
Décarie (1965) were employed for 1 to 1½ years 
or until a child performed at the highest of the 
10 substage levels for 2 consecutive monthly 
sessions. Of the principal sample of 49 children, 
20 had progressed to the highest substage level 
by the end of the study; of the remaining 29, 
10 showed a generally monotonic increase, and 
9 seemed to be at a plateau, with object 
concept levels the same for most of the 12 or 
more sessions. Thus, 39 of the 49 subjects 
showed patterns harmonious with the pattern 
of object concept stages posited by Piaget 
(1955) and found by subsequent investigators 
using nonretarded samples. For the remaining 
10 subjects, however, there was a variable 
developmental pattern in which substage 
levels appeared to rise and fall from session 
to session, ranging over as many as 3 or 4 
substages during the 12 or more sessions. 

In an effort to determine what character- 
istics might distinguish this group of atypical 
subjects, Wohlhueter and Sindberg (1975) 
examined medical histories and clinical findings 
for their sample; the distinguishing feature of 
the variable group was that the majority of 
subjects “were found to have EEG abnormalities, 
especially dysrhythmias or a history of 
seizures” (p. 516). This finding raises at 
least two possible interpretations with respect 
to the variable developmental pattern: (a) 
that individuals with brain anomalies asso- 
ciated with electroencephalogram (EEG) ab- 
normalities may show atypical sequences of 
development with respect to the object concept 
and (b) that behavioral and attentional 
abnormalities in individuals with anomalous 
EEG patterns make accurate assessment of 
object concept substages difficult. 

One other unusual pattern was noted by the 
investigators, namely, some children seemed 
to bypass or skip over some of the substages. 
This apparent skipping phenomenon has been 
noted in research with nonretarded children 
as well (see Uzgiris, Note 2). Although this 
is an interesting phenomenon, it is difficult 
to know how often skipped substages may 
actually have been traversed by the children 
in the intervals between experimental sessions. 
Furthermore, both the skipping phenomenon 
and the variability phenomenon might be 
better understood if we were able to rule out 
specific method effects, as could have been 
done if a nonretarded sample had been 
included in this study for comparative pur- 
poses. Nonetheless, the Wohlhueter-Sindberg 
investigation at least raises significant ques- 
tions about the validity of the similar sequence 
hypothesis with respect to certain substages 
in the development of the object concept. 

A longitudinal study reported by Cicchetti 
and Sroufe (1976), however, yields strong 
support for the similar sequence hypothesis 
within the sensorimotor period. The study 
focused on the relation between cognitive and 
affective development in home-reared Down's 
syndrome infants during the period from 
4-18 months of age. Sroufe and his colleagues 
(e.g., Sroufe & Wunsch, 1972) earlier demon- 
strated that among normal infants there is a 
developmental progression from mirth response 
to auditory and tactile stimulation that is 
physically intense or vigorous (e.g., tickling 
the baby's chin or saying “Boom!”) to mirth 
responses to social and visual stimulation 
that is progressively more subtle and complex 
(e.g., the sight of mother sucking on baby's 
bottle). At monthly intervals the infants 
sampled by Cicchetti and Sroufe were presented 
with 15 auditory and tactile items of 
the intense or vigorous type and 15 social 
and visual items of the more subtle and 
complex type. As in the earlier research with 
normal infants, the Down's syndrome infants 
laughed earliest in response to the auditory 
and tactile items and latest in response to 
“the more cognitively complicated social and 
visual items" (Cicchetti & Sroufe, 1976, 
p. 923). The responses of the Down's infants, 
of course, came months later (in CA) than did 
the corresponding responses of normal infants. 

Smile responses, a more sensitive index of 
positive affect in the Down's syndrome sample, 
showed the same pattern and revealed even 
more clearly than laughter responses the 
developmental decline in positive affect aroused 
by simpler auditory and tactile items as the 
infants matured beyond 13 months. This 
inverted-U-shaped developmental pattern also 
resembles earlier findings with normal infants. 
In stressing the similarity of their findings with 
Down's infants to those with normal infants, 
Cicchetti and Sroufe (1976) pointed out that 
the laughter items were similarly ordered for 
both groups, “category by category and, in 
the main, item by item" (p. 923). Finally, to 
assess the merits of their claim that the 
affective responses they measured were closely 
related to cognitive development, Cicchetti 
and Sroufe calculated correlations of indices 
of affective expression (e.g., earliest laugh, 
total amount of smiling to all items, etc.) 
with the Bayley mental and motor scales and 
the Uzgiris-Hunt object permanence and 
operational causality scales. All 44 correlations 
were statistically significant. 

Therefore, the Cicchetti-Sroufe investigation, 
unlike the Wohlhueter-Sindberg study, 
provides uniform support for the liberal 
version of the similar sequence hypothesis 
in which the hypothesis is applied to all 
retarded children regardless of etiology. The 
Cicchetti-Sroufe research deserves special 
attention because of its unusually careful 
methodology and its emphasis on the integrity 
of the developing infant. The research demon- 
strates a thoughtful means of assessing the 
Piagetian hypothesis that affective develop- 
ment and cognitive development are inter- 
dependent. In addition, it may help to point 
the way to tests of the similar sequence 
hypothesis in behavioral domains other than 
cognitive development as it has traditionally 
been construed. 

Research on the preoperational-concrete operational 
transition—the Temple longitudinal study. 
The third longitudinal study reviewed here 
is by far the broadest in scope. In this ongoing 
investigation, Stephens and her colleagues 
(Stephens, 1974; Stephens, Mahaney, & 
McLaughlin, 1972; Stephens et al., 1974) 
have conducted biennial assessments of the 
performance of retarded and nonretarded 
persons on a variety of Piagetian tasks. The 
sample included 75 retarded subjects (IQs of 
50-75) from special education classes and 
75 nonretarded subjects (IQs of 90-110) from 
the same Philadelphia schools. In the first 
wave of testing, the age range in both subject 
groups was 6-18 years. Results from the first 
two waves of testing have now been reported 
and are discussed here within two content 
categories: 

Moral judgment. The Temple battery in- 
cluded 11 measures designed to assess three 
aspects of moral judgment: (a) the relative 
weight assigned to intent versus consequences 
in judging the seriousness of a misdeed, (b) 
awareness of the injustice of punishing an 
entire group for the acts of only one or a few 
members, and (c) the ability to judge the 
relative fairness of various types of punishment 
including retributive and reciprocal justice. 
To determine whether judgment along these 
three dimensions follows the same develop- 
mental course in the retarded as in the non- 
retarded, Mahaney and Stephens (1974) 
examined changes in scores on the 11 component 
measures over the 2-year period from 
the first to the second wave of testing. They 
found that on 1 of the intent-versus-consequences 
measures the retarded group showed 
a nonsignificant decline in score (i.e., they 
made slightly less mature moral judgments 
according to the scoring criteria adopted by 
the authors) and that on 1 of the group 
punishment items nonretarded subjects showed 
a nonsignificant decline. On the 9 remaining 
items the direction of change was the same for 
both the retarded subjects and the nonretarded 
subjects; this similarity extended to 2 items 
on which both groups showed significant declines 
in score, raising questions about the 
scoring of these particular items. 

Inhelder (1943/1968), in a study described 
earlier, referred to certain “oscillations” in the 
reasoning of the retarded. It is of some interest 
to note that Mahaney and Stephens (who was 
the translator of the Inhelder book) reported 
oscillations, that is, instances when “the 
improvement which occurred in one area of 
moral judgment was not maintained when 
... opinions were solicited on another, but similar, 
situation” (Mahaney & Stephens, p. 137), 
in both retarded and nonretarded subjects. 
There is some indication that such oscillations 
may have been somewhat more frequent in 
the retarded group. One line of evidence that 
suggests this possibility comes from Mahaney 
and Stephens's analysis of change scores for 
three separate age levels within both the 
retarded and the nonretarded samples. Of the 
29 change scores reported for the nonretarded 
groups, 20 were increases (11 significant), 7 
were decreases (3 significant), and 2 showed 
no change. Among the retarded subjects, 17 
change scores were increases (7 significant), 
11 were decreases (4 significant), and 1 
involved no change. However, when retarded 
comparison groups were staggered in order to 
broaden the age difference involved in age 
group comparisons, for example, by comparing 

Phase 1 6-10-year-olds with Phase 2 12-16-year-olds, 
the only items showing a decrease 
with age were the two that showed a decrease 
in nonretarded subjects as well. Overall, the 
report by Mahaney and Stephens (1974) 
suggests that although growth in moral 
judgment concepts among the retarded may 
be "torporific and sporadic" (p. 141), the 
direction of development is the same for both 
retarded and nonretarded persons.‘ 

Conservation, classification, symbolic imagery,
and formal operations in the Temple study.
The Temple investigation (Stephens, 1974;
Stephens et al., 1972; Stephens et al., 1974)
also included 29 measures of cognitive develop-
ment across four broad conceptual domains:
(a) conservation (of substance, length, weight,
continuous quantity, and volume, as well as
term-to-term correspondence), (b) logic classi-
fication (class inclusion and class intersection,
and relative thinking measured by the broth-
ers-sisters and right-left tests), (c) operativity
and symbolic imagery (tests involving imag-
ined rotations of objects through space,
transferring from two to three dimensions, and
changing one's perspective on a stimulus),
and (d) combinatory logic (Piaget & Inhelder's,
1956, combination of liquids task). Explana-
tions given by subjects on each of the 29
items were scored on a 9-point scale that took 
into account, among other things, the degree 
to which the subject wavered between a correct 
and an incorrect answer and the degree to 
which reversibility was shown. Stephens and 
McLaughlin (1974) reported on changes in 
scores on the 29 measures over the 2-year
period separating the two waves of testing.
They found that the nonretarded group showed
improvement on all 29 measures, with 25 
statistically significant; the retarded group 
also improved on all 29 measures, with 26 
statistically significant. This finding indicates 
that the direction of development on these 
Piagetian reasoning tasks was similar in the 
retarded and nonretarded groups. In another 
report following the second wave of the 
Temple study (Stephens et al., 1972), the
Piagetian reasoning tasks were rank ordered 
with respect to the MAs at which 50% of the 
subjects (in the retarded and nonretarded 
groups separately) made correct responses. 
As Stephens et al. indicated, the order of diffi- 
culty for both subject groups was generally con- 
sistent with previous findings that conservation 
of substance precedes conservation of weight 


4 Whether this apparent similarity in the direction of 
development is actually a function of an invariant, 
stagelike progression is thus far an open question 
because of the questionable nature of the measures 
themselves. In a critique of this portion of the Temple 
longitudinal study, Kohlberg (1974) maintained that 
Piagetian moral judgment measures used in the 
Temple study do not even warrant detailed longitudinal 
analysis, because 


Piaget himself does not consider that his moral 
judgment measures yield genuine stages, nor do they 
pair up with his logical stages in ways compatible 
with his current thinking about cognitive stages . . .. 
Empirical research confirms the fact that Piaget’s 
moral stage measures do not meet the criteria of 
structural stages which his logical stages do meet. 
(p. 142) 


This being the case, it is appropriate to be cautious 
about what one concludes with respect to the moral 
judgment portion of the Temple investigation.
and that conservation of weight precedes
conservation of volume. In addition, although 
the ranks were rather crude because MA levels 
were listed in whole years and because not even 
the most ardent Piagetian would expect all 
29 items to form an orderly developmental 
scale, it is interesting to note that our own 
calculations yielded a Spearman rho of .634 
between the rank order given for the retarded 
group and that given for the nonretarded 
group. 
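
The rank-order comparison described above can be illustrated with a minimal sketch. The MA values below are hypothetical placeholders, not the Temple data, and the function names are ours. Because MA levels listed in whole years produce ties, the sketch assigns tied values their mean rank and then correlates the two sets of ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical MA levels (whole years) at which 50% of each group passed
# each of six tasks; illustrative values only.
ma_retarded = [6, 7, 7, 9, 10, 12]
ma_nonretarded = [6, 6, 8, 9, 11, 10]
print(round(spearman_rho(ma_retarded, ma_nonretarded), 3))
```

A coefficient near 1 would indicate that the tasks fall in nearly the same order of difficulty in both groups; with coarse, heavily tied ranks the value should be read as approximate.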
The preceding data are consistent with the 
similar sequence hypothesis, as far as they 
have been taken, but they could be taken
considerably further. With the one exception 
mentioned in the preceding paragraph, there 
has been no apparent effort thus far by the 
Temple investigators to check their findings 
against specific developmental stage sequences 
such as the horizontal decalages reported for 
nonretarded subjects in previous Piagetian 
research (Kohlberg & DeVries, 1971; Nassafat, 
1963; Siegelman & Block, 1969; Smedslund, 
1964; Uzgiris, 1968). Moreover, there has been 
a persistent inclination to report data in terms 
of group means, rather than in terms of the 
number of individuals (retarded and non- 
retarded) who show specific developmental 
patterns. This latter type of analysis is the 
unique province of longitudinal research and 
can only be approximated indirectly by scaling 
procedures in research of the nonlongitudinal 
variety. There is some indication (see Stephens, 
1974) that efforts to profile individual 
performance changes over time and to cross- 
validate specific vertical and horizontal de- 
calages will be forthcoming from the Temple 
investigators. Such efforts are needed if the 
investigators are to fully capitalize on the 
power of their longitudinal design. 


Status of Evidence on the Similar Sequence 
Hypothesis 


Only 1 of the 3 longitudinal studies reviewed 
—the Wohlhueter and Sindberg (1975) investi- 
gation of object concept substages—produced 
findings inconsistent with the similar sequence 
hypothesis. In that investigation a distinct 
subgroup of 10 (out of 49) children showed 
apparent atypical developmental sequences, 
and most of these children showed anomalous
EEG patterns. This finding may indicate that
brain wave anomalies can be associated with
atypical developmental patterns. Alterna-
tively, the EEG abnormalities may simply have
been associated with attentional and other 
deficits that interfered with accurate assess- 
ment of substage levels in children whose actual 
development was consistent with the similar 
sequence hypothesis. The latter interpretation 
has special credence in the area of object 
concept assessment, in which procedures
demand that the subject sustain attention to
an object long enough to seek after it once
it has been removed from the perceptual field.
Of the 28 nonlongitudinal studies reviewed,
only 4 contained a finding inconsistent with
the similar sequence hypothesis, and in each 
of these studies the inconsistent finding was 
relatively minor and was alone among a 
number of findings supporting the hypothesis. 
Furthermore, the questions raised generally
concerned rather fine-grained steps or sub-
stages within horizontal decalages on which
studies of nonretarded subjects alone have
not always agreed.

These facts, plus the measurement problems
inherent in these experimental procedures, 
make the degree of consistency in the findings 
of these 31 studies rather striking. Positive 
findings have now been reported in conceptual
areas that include sensorimotor spatial con- 
cepts, object permanence, causality, imitation, 
affective responding, identity and equivalence 
conservation (of many properties), seriation, 
transitivity, moral reasoning, comparison proc- 
esses (or gross, intensive, and extensive 
quantities), time, space, relative thinking, 
role taking, mental imagery, geometric con-
cepts, and classification and class inclusion.
For the 31 studies spanning this list of concep-
tual areas, the great preponderance of the
evidence is consistent with the hypothesis 
that retarded and nonretarded persons traverse 
the same stages of development in the same 
order, differing only in the rate at which they 
progress and in the ultimate ceiling they 
attain. The hypothesis seems to be generally 
supported in studies of retarded individuals, 
regardless of etiology, with the possible 
exception of individuals suffering from pro-
nounced EEG abnormalities.


Quality of the Evidence


Having said this, we believe it is important
to comment on the quality of the available 
evidence and to offer suggestions for improving 
it. Cross-sectional data relevant to the similar 
sequence hypothesis have most often been 
presented in ways that provide only the 
weakest inferential power. A table displaying 
the percentage of subjects at each age level 
who pass each Piagetian item can yield only
sequence compared with the information
generated when each child is classified with 
respect to specific pass-fail patterns, that is, 
with respect to response scale types. When such 
a scaling analysis is combined with calculation 
of scalogram summary statistics (e.g., Green, 
1956; Guttman, 1950), the potential power 
of the nonlongitudinal design is more fully 
utilized. 
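
As a rough illustration of the scaling analysis described above, the sketch below classifies each child by pass-fail pattern and computes a coefficient of reproducibility. It assumes the simple error-counting convention in which each observed pattern is compared with the ideal scale type for the child's total score; the function name and the response data are hypothetical, and Green's (1956) index of consistency is not computed here.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a matrix of 0/1 item responses.

    responses: one list per subject, one 0/1 entry per item.
    Items are ordered from easiest to hardest by pass count; a subject with
    total score s is expected to pass exactly the s easiest items, and every
    cell that departs from that ideal pattern counts as one error.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    passes = [sum(row[i] for row in responses) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: -passes[i])
    errors = 0
    for row in responses:
        score = sum(row)
        for position, item in enumerate(order):
            ideal = 1 if position < score else 0
            errors += int(row[item] != ideal)
    return 1 - errors / (n_subjects * n_items)


# Hypothetical pass-fail data for five children on three items.
# The last child passes only the hardest item, a non-scale pattern.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
print(round(reproducibility(data), 3))
```

Running the same computation separately for retarded and nonretarded groups, on the same items, gives two coefficients that can be compared directly, which a table of percentage passing at each age level does not permit.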

Similarly, the bulk of the longitudinal data 
we found was presented only in terms of mean 
difference between experimental groups or 
changes in group means or percentages over 
time. As Hunt (1974) has noted, reporting 
only group summary statistics at Time 1 and 
Time 2 can mask the fact that some individuals 
progressed while others regressed over time. 
Thus, although it is useful to know that the 
Time 1 and Time 2 means differed in the same 
direction for retarded and nonretarded groups, 
such information is no substitute for an 
analysis of the number of individuals in each 
group showing specific developmental patterns 
over time. In both longitudinal and non- 
longitudinal research aimed at testing the 
similar sequence hypothesis, it makes little 
sense to invest the time and energy necessary 
to gather potentially relevant data and then
analyze the data in ways that fail to capitalize
on their full potential.
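
The point that group means can mask individual regression is easy to demonstrate. In the sketch below, which uses hypothetical scores and our own labels for the three patterns, two groups show the same mean gain from Time 1 to Time 2, yet the tally of individual patterns differs sharply.

```python
from collections import Counter

PATTERNS = ("increase", "stable", "some decline")


def classify(scores):
    """Classify one subject's scores across successive waves of testing."""
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    if any(d < 0 for d in diffs):
        return "some decline"
    if all(d == 0 for d in diffs):
        return "stable"
    return "increase"  # never declines and rises at least once


def pattern_percentages(group):
    """Percentage of subjects in a group showing each pattern."""
    counts = Counter(classify(scores) for scores in group)
    return {p: 100 * counts[p] / len(group) for p in PATTERNS}


def mean_change(group):
    """Mean difference between last and first wave, the usual group summary."""
    return sum(s[-1] - s[0] for s in group) / len(group)


# Hypothetical Time 1 and Time 2 scores for four subjects per group.
group_a = [[4, 5], [6, 7], [5, 6], [7, 8]]   # everyone gains 1 point
group_b = [[4, 7], [6, 5], [5, 8], [7, 6]]   # half gain 3, half lose 1

for name, group in (("A", group_a), ("B", group_b)):
    print(name, mean_change(group), pattern_percentages(group))
```

Both groups have the same mean change, so a report limited to group means would treat them as equivalent; only the individual tally shows that half of the second group regressed.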


Suggestions Toward Improved Research 


These problems, and others to which we 
referred earlier in the text, suggest three 
principles that if widely adopted would sub- 
stantially improve the quality of evidence on 
the similar sequence hypothesis. 


Structuring Direct Comparisons 
A problem with many of the studies reviewed
in earlier sections is that their samples included
only mentally retarded subjects. In those few
instances in which findings of these studies 
disagree with findings of other studies sampling 
only nonretarded subjects, the discrepancies 
are difficult to interpret. This is because of 
uncertainty over whether the discrepancies 
reflect actual process differences between the 
retarded and the nonretarded or whether the 
differences in experimental methodology across 
studies are responsible. An obvious way to 
prevent such difficulties is to expose retarded 
and nonretarded children within a similar 
cognitive developmental range to precisely 
the same procedure by including both groups in 
the same study. To fail to do so is to risk 
uninterpretable findings. 


Attending to Etiology


The marked heterogeneity of many of the 
mentally retarded samples described earlier 
suggests a somewhat opportunistic approach 
to subject selection or perhaps an approach in 
which the etiology of retardation is simply not 
regarded as an important factor. Yet, theoret- 
ical considerations discussed early in this 
article (see also, Weisz, 1976; Zigler, 1969, 
1971) point to the need to give special attention 
to familial retarded children as opposed to 
those suffering from organic impairment or 
genetic disorder. Furthermore, Wohlhueter 
and Sindberg’s (1975) report of atypical 
development in a group of children with a 
high incidence of EEG anomalies suggests 
the potential importance of efforts to identify 
developmentally distinct subgroups within the 
nonfamilial population. Their analysis illus- 
trates that subgroup analyses can be useful 
even when they are post hoc. 


Promoting Uniformity 


Finally, there is a clear need for increased 
uniformity across studies in the kinds of 
statistical analyses carried out and in the 
way statistics are reported. Toward this end, 
we suggest that every cross-sectional study 
addressing the similar sequence hypothesis 
should yield data bearing on the following 
threefold question:

1. Within the retarded and nonretarded 
groups do the task items form the same scale, 
and does this scale show high reproducibility
(à la Guttman, 1950) and a high index of
consistency (à la Green, 1956)? 

2. Do mean scale scores increase with level 
of cognitive maturity in each separate subject 
group (see Kohlberg, 1969)? 

3. Do mean levels of success on each 

individual item increase with levels of cognitive 
maturity in each separate subject group (see 
Kohlberg, 1969)? 
In longitudinal research, Questions 2 and 3 
should also be asked in a way that only 
longitudinal investigation permits: Over the 
period spanned by the longitudinal study, in 
what percentage of individual subjects (from 
retarded and nonretarded groups) do scale 
scores and individual item scores (a) increase 
either smoothly or monotonically, (b) remain 
stable throughout, and (c) show at least 
some declines?

These recommended questions are designed
to promote greater uniformity, and thus 
greater comparability, among studies address- 
ing the similar sequence hypothesis. In 
opposition to such uniformity, one might argue 
that the degree of consistency in the findings 
of the numerous studies reviewed here is all 
the more impressive precisely because of the 
methodological diversity of the studies. There 
is some truth to this argument, but only in 
those cases in which findings support the 
similar sequence hypothesis. However, we 
have argued that even in those cases apparent 
group similarities in developmental sequence 
may result from a failure to ask the most 
probing questions of one's data. It seems clear 
from our review that evidence from the 31
studies currently available offers rather con-
sistent support for the similar sequence
hypothesis; it also seems likely that the best 
evidence has yet to be gathered. 


Stability of Aggressive Reaction Patterns in Males: A Review
Considered in the review are 16 studies on the stability of aggressive behavior
and reaction patterns. There is great variation among the studies in sample 
composition, in definition of variables, in method of data collection, and in the 
ages and intervals studied. Generally, the size of a (disattenuated) stability 
coefficient tends to decrease linearly as the interval between the two times of 
measurement ($T_2 - T_1$) increases. Furthermore, the degree of stability can be
broadly described as a positive linear function of the interval covered and the
subject's age at the time of first measurement, expressed in the age ratio $T_1/T_2$.
The degree of stability that exists in the area of aggression was found to be 
quite substantial; it was, in fact, not much lower than the stability typically 
found in the domain of intelligence testing. Marked individual differences in 
habitual aggression level manifest themselves early in life, certainly by the age 
of 3. It was generally concluded that (a) the degree of longitudinal consistency 
in aggressive behavior patterns is much greater than has been maintained by 
proponents of a behavioral specificity position, and (b) important determinants 
of the observed longitudinal consistency are to be found in relatively stable, 
individual-differentiating reaction tendencies or motive systems (personality
variables) within individuals.


What has become known as the consist- 
ency issue has been the subject of recent 
lively discussions in the professional litera- 
ture of psychology. This issue, which has 
often been presented in an overly simplified 
way, concerns a whole complex of problems, 
and it is obvious that different forms of con- 
sistency can be conceived of (cf. Olweus, 
1974). In the recent discussions, the main 
emphasis has been on the question of cross- 
situational consistency (see Endler & Mag- 
nusson, 1976), which primarily concerns the 
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extent to which individuals in a group retain 
their relative positions on a certain dimen- 
sion or characteristic across various situa- 
tions, conditions, or sources of data at ap-
proximately the same point in time. This
issue seems to be far from settled, and in
many ways the debate has been confusing,
characterized by strong emotional reactions, 
stereotyped presentations of different theo- 
retical positions, and methodological mis- 
takes (for critical analyses, see, e.g., Block, 
1977; Epstein, 1977; Golding, 1975; Olweus, 
1977b). In the general debate, the issue of 
longitudinal consistency or stability—which 
concerns the extent to which individuals in
a group retain their relative positions on a
certain dimension or characteristic (or simi-
lar dimensions or characteristics) for mea-
surements at different periods of time—has 
attracted much less attention. Mischel (1968,
1969), however, dealt with the issue at some
length and concluded that there is generally
little longitudinal (as well as cross-situational)
consistency in noncognitive personality di-
mensions (with the exception of self-descrip-
tions on trait dimensions). But the empirical
material presented by Mischel in support of
his position appears both meagre and selec-
tive (see also, Block, 1977). Considering the
general importance of the question of longi- 
tudinal consistency for personality psychol- 
ogy, it seems valuable to take a closer look 
at the empirical evidence available. Here this 
is done for one particular area of research, 
aggression, which no doubt represents an im- 
portant behavior system in psychology. Ag- 
gression was included among the areas re- 
viewed by Mischel. 

Furthermore, the general usefulness of per- 
sonality concepts involving relatively endur- 
ing internal factors or properties of individ- 
uals has recently been questioned. For in- 
stance, Krasner and Ullmann (1973) tried 
to demonstrate that the concept of person- 
ality is superfluous as a descriptive or ex- 
planatory term. They wrote, “The more we 
know about antecedent, current, and conse- 
quent conditions, the less likely we are to 
use the concept of personality” (p. 489). 
Similarly, but from a partly different point 
of view, Shweder (1975) attempted to show 
that a personality concept in its individual 
difference sense is of little value and rele- 
vance. Also in consideration of these and 
similar views, it is important to examine the 
evidence on the stability or continuity of 
aggressive behavior over time. Such data no 
doubt will help to shed light on the question 
of the general utility of assuming a rela- 
tively enduring personality system or at least 
certain more stable subsystems within the in- 
dividual. 

The general purpose of this article is two- 
fold. One goal is to get a picture of the 
degree of stability obtaining in the area of 
aggression, as manifested in longitudinal 
studies of aggressive reaction patterns in 
males (with the exclusion of self-reports; see 
below). The main conclusions on this point 
are presented under the headings of Gen- 
eral Description of Results and More Spe- 
cific, Descriptive Conclusions. A second pur- 
pose is to interpret these stability data with 
a particular view to the possibility of con- 
sidering them as partly reflecting relatively 
stable, individual-differentiating reaction ten-
dencies within individuals. Furthermore, the
empirical stability data and the suggested
interpretations are briefly related to the
views expressed by proponents of a behav-
ioral specificity position such as Mischel
(1968, 1969) and by critics of personality
concepts such as Krasner and Ullmann
(1973). This goal is pursued in the sections
entitled Interpretation of the Stability Data
and Conclusions.

With regard to the second purpose, it 
should be pointed out that I am generally 
sympathetic to the view that personality 
variables in terms of relatively stable, in- 
dividual-differentiating reaction tendencies 
may be important (but by no means the 
sole) determinants of an individual’s aggres- 
sive behavior (see, e.g., Olweus, 1969, 1973b, 
1978). This point of departure is not likely 
to have influenced the present data gather- 
ing and data analysis procedure. However, 
it seems fair to state at the outset that the
analysis in the interpretative section, which 
leaves somewhat greater room for judgment, 
has been approached from this perspective. 


Coverage of Review 


This article is a shortened version of a
review presented elsewhere (Olweus, Note 
1). In the more complete report relatively 
detailed descriptions of all studies considered 
are given. Because of space limitations, such 
descriptions are provided here for only three 
reports—one with preschool subjects, the sec- 
ond with subjects of school age (two studies 
in one article), and the third with adults. 
However, summary data on method of ob-
servation, age and number of subjects, in-
terval between measurements, and so on are
presented in Table 1 for all studies reviewed.1

The focus of the present review is on 
longitudinal studies of aggressive behavior 
and reaction patterns, as observed or inferred 
by individuals other than the subjects them- 
selves. It is important to note that the pres- 
ent review thus does not include studies 
centering on the stability of self-descriptions 


1 Readers who are interested in detailed informa- 
tion on the studies not described in the present 
article can obtain a copy of the more complete 
report by writing to the author.
on personality questionnaires, self-ratings on
trait scales, and similar self-report devices. 
The stability of this kind of data has often 
been considered quite substantial (although 
the self-descriptions are often considered 
tenuously related to the actual behavior to 
which they refer; see, e.g., Mischel, 1968, 
1969). Some additional comments on the 
stated guidelines for inclusion in the review 
are in order. 

In most cases the term aggressive was used 
by the original investigator in his or her 
specification of the variable under study. By 
and large, these specifications seem to be in 
accordance with the definition of an aggres- 
sive response, given elsewhere (Olweus, 
1973b) as 


any act or behavior that involves, might involve, 
and/or to some extent can be considered as aiming 
at, the infliction of injury or discomfort; also mani- 
festations of inner reactions such as feelings or 
thoughts that can be considered to have such an 
aim are regarded as aggressive responses. (p. 270) 


Although most studies reviewed concern data 
on aggressive interpersonal behavior, some 
variables deal primarily with aggressive re- 
activity (e.g., "over-reactive to minor frus- 
trations; irritable"; Block, 1971). Occasion- 
ally an author has assigned a variable to the 
aggressive behavior system that is not in-
cluded here. This is the case with variables
such as competitiveness, dominance, and re- 
pression of aggressive thoughts, which are 
more indirect manifestations of aggressive 
(and other) tendencies or may be assumed 
to reflect conflict over, or inhibitions against, 
aggressive tendencies rather than the aggres- 
sive tendencies themselves. 

As mentioned, the stated guidelines also 
imply that studies concerned with the longi- 
tudinal stability of inventory responses, self- 
ratings, and so on are not considered in the 
review. In one of the included studies 
(Block, 1971), however, such self-report data 
were used in combination with other sets 
of data as a basis for clinical ratings. Longi-
tudinal studies in which projective instru-
ments constituted the only source of data
(if such studies exist in the aggressive-mo-
tive area) are also excluded from considera-
tion. The exclusion of stability data derived
from the latter two data sources makes the
material for review somewhat more homo-
geneous. This, however, does not preclude
substantial variation in a number of respects
among the studies considered, as becomes
evident.

A further criterion for inclusion in the
review is that the degree of stability of the
data has been expressed in the form of a cor-
relation coefficient or that such a coefficient
can be derived from the reported data in a
meaningful way. Accordingly, studies in
which only significance tests between dis-
crete groups have been presented are excluded.


on the degree of stability/change in behavior 
rather than on the direction of possible 


changes, no attention is paid to differences in 
mean levels between different periods 


able in the reports. 
It has been possible to locate 16 studies
comprising 16 independent male or mixed
samples of subjects for which stability data
have been collected. (The 4 studies on mixed
nursery school groups are considered here to
consist of 1 sample each.) These 16 studies
have been described in 14 publications. One
of the studies is English (Farrington, 1978),
two are based on Swedish samples (Olweus,
1977a), and the rest used American sub-
jects.2 In some studies, however, the same
sample was employed for the determination
of several stability correlations covering dif-
ferent periods of time. A total of 24 stability
coefficients are available.
In a limited number of studies (six) there
was also an independent female sample from
which stability data were collected. The fe-
male samples are not considered in the pres-
ent context (except for a brief mention in
Footnote 6).
The discussion of the criteria for exclu-
sion of studies from this review may give
the impression that there exist a large num-
ber of studies on the stability of aggressive
behavior and reactions that have not been
considered. It is important to emphasize that
this is definitely not the case.


2 It should be mentioned that the search of litera-
ture was mainly restricted to books and professional
journals written in English.


Correction for Attenuation 


As is well-known, a stability correlation 
between two sets of measurements on a par- 
ticular variable is systematically lowered 
(attenuated) as the result of errors of mea-
surement. Accordingly, the correlation be-
tween true or error-free scores on the same
variable is higher than that between fallible
scores. If the reliabilities of the two sets of
measurements are known, the formula for
the correction for attenuation can be used to
compute the disattenuated correlation, that
is, an estimate of the correlation between
corresponding true or perfectly reliable
scores. The attenuation formula is as follows:
$r_{T_x T_y} = r_{xy} / \sqrt{r_{xx} \cdot r_{yy}}$, where $r_{T_x T_y}$ is the corre-
lation between true scores of x and y, $r_{xy}$ is
the obtained correlation between x and y,
and $r_{xx}$ and $r_{yy}$ are the reliability coefficients
of x and y (see, e.g., Lord & Novick, 1968).
It should be noted that the sampling error 
of coefficients corrected for attenuation is 
greater than that of uncorrected coefficients 
of the same size (Thouless, 1939). Accord-
ingly, it is reasonable to expect disattenuated
coefficients to occasionally exceed unity as a
function of sampling fluctuations, particu-
larly if the size of the sample is relatively
small. Correlation coefficients corrected for
attenuation should always be considered only
approximate in character.
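
A minimal sketch of the correction may make the arithmetic concrete. The function name and the numerical values are hypothetical and are not taken from any of the studies reviewed; the computation itself is the standard correction for attenuation, the obtained correlation divided by the square root of the product of the two reliabilities.

```python
from math import sqrt


def disattenuate(r_xy, r_xx, r_yy):
    """Estimate the true-score correlation from an obtained correlation.

    r_xy: obtained correlation between the two sets of measurements
    r_xx, r_yy: reliability coefficients of the two sets of measurements
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical values: an obtained stability correlation of .56 with
# reliabilities of .80 and .70 at the two times of measurement.
print(round(disattenuate(0.56, 0.80, 0.70), 3))

# With a high obtained correlation and modest reliabilities, the corrected
# value can exceed unity; no clamping is applied here.
print(round(disattenuate(0.80, 0.70, 0.80), 3))
```

The second call shows why a corrected coefficient slightly above 1.00 need not signal a computational mistake: the function simply rescales the obtained correlation, and sampling error in any of the three inputs carries through to the result.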

In spite of the latter circumstances, dis-
attenuated coefficients can be very useful and
under certain conditions are the most ap-
propriate measures to employ. If, for in-
stance, a number of stability correlations are
to be compared and they are based on data
of varying reliability, attenuation correction
will make the coefficients more directly com-
parable. Furthermore, if the researcher's pri-
mary interest is in the relationship between
the true rather than the obtained scores,
for example, in the stability of the under-
lying function(s) in contrast with the actual
predictive power of the fallible measure-
ments, a disattenuated coefficient is the cor-
rect measure to use (see, e.g., Block, 1963,
1971; Lord & Novick, 1968; Thouless,
1939). For both these reasons, attenuation-
corrected coefficients are reported in the fol- 
lowing review in addition to the uncorrected 
raw correlations. Most of the theoretical 
analyses and conclusions are based on dis- 
attenuated coefficients. 

Relatively little has been written in the 
psychometric literature with respect to the 
type of reliability coefficient to be used in 
the denominator of the correction formula. 
However, the general principle to follow 
seems clear and is stated as follows by Lord 
and Novick (1968): 


If we are to use the correction for attenuation 
precisely, we must use it only in conjunction with 
an experimental design that assures that essentially 
no more or less error variation is introduced into 
the estimate of $r_{xx}$ and $r_{yy}$ than is introduced into
the estimate of $r_{xy}$ [notation changed in accordance
with usage here]. (p. 138) 


Since many sources contribute to varia- 
tions among observations (Lord & Novick, 
1968, p. 140), the application of the stated 
principle to a concrete situation is not always 
simple and straightforward. In general, the 
most serious error in the present context 
would occur if the disattenuated coefficients 
were overcorrected or inflated as a result of 
too low, deflated reliability estimates. In 
some of the studies reported here, several 
reliability estimates were available for pos- 
sible use in the attenuation formula. In addi- 
tion, some studies presented either incom- 
plete reliability information or none at all, 
and so “best guesses” about the reliability 
estimates had to be made. In such decisions 
care was taken to settle on reliability esti- 
mates that seemed too high rather than too 
low, in order to avoid the risk of having in- 
flated, disattenuated coefficients. Exact in- 
formation about the reliability of the mea- 
sures was not available in some studies, of 
course, resulting in a somewhat greater un- 
certainty as to the correct size of the disat- 
tenuated coefficients in these studies. It 
should be noted, however, that the guesses 
made were of an informed character, as re- 
liability information from similar studies was 
used. The number of stability coefficients for 
which best guesses about reliability estimates 
were made amounted to 5 out of 24 (Em-
merich, 1966; Farrington, 1978; Jersild &
Markey, 1935; Martin, 1964; Patterson, 
Littman, & Bricker, 1967). In all probabil- 
ity, the potential error introduced by this 
procedure is not great and in any event, be- 
cause of the strategy adopted, is not likely 
to have resulted in inflated coefficients. 


Aspects of Studies to Be Considered 
in the Review 


To arrive at relatively specific conclusions, 
it was considered necessary to present the 
studies under review in some detail. A de- 
scription of the samples and of the proce- 
dures employed in collecting the data is 
provided, including an at least approximate 
indication of the definition of the variables 
studied or of possible interest. In the few 
cases when a variable of potential relevance 
has been excluded from the review, the rea- 
sons for the decision are given (in the un- 
abridged report, Olweus, Note 1). The ages 
of the subjects at the time of the first (T1)
and later (T2, T3, etc.) measurements are
also reported. In some situations, however, a 
particular problem arises with respect to the 
subjects’ age. This occurs when the material 
on which the judges’ assessments were based 
does not refer to a specific point in time but 
covers a period of several years. To relate 
the size of the stability coefficients to the 
interval between the two times of measure- 
ments (or more precisely, the times to which 
the measurements refer), the “exact” age of 
the subjects must be determined. In estab- 
lishing a rule for the determination of the 
exact age in such cases the following line 
of reasoning was applied. It was assumed 
that a judge who was to assess, for example, 
archival material or a retrospective interview 
embracing several years, would have given 
relatively more weight to information per- 
taining to the end of the period covered. Ac- 
cordingly, it seemed reasonable to fix the
exact age as the subject’s age at the end of
the period in question minus 1 year. For ex-
ample, when the archival material in a par- 
ticular study referred to the period between 
ages 3 and 6, the exact age of the subjects 
was taken to be 5 years. The stated rule 
was applied in 4 out of 16 studies, or to 9
of the 24 periods for which stability corre-
lations were reported.
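
The rule can be written out as a short sketch. It follows the worked example in the text (material covering ages 3 to 6 gives an exact age of 5); the function names are hypothetical and are introduced here only for illustration.

```python
def exact_age(period_end_age: float) -> float:
    """Age assigned to an assessment based on material spanning several years.

    Follows the worked example in the text: the subject's age at the end of
    the period covered, minus 1 year.
    """
    return period_end_age - 1


def interval_between(t1_end_age: float, t2_end_age: float) -> float:
    """Interval in years between two assessments that both use the rule."""
    return exact_age(t2_end_age) - exact_age(t1_end_age)


# Archival material referring to the period between ages 3 and 6.
print(exact_age(6))
```

When both assessments are dated by the same rule, the interval between them is unaffected; the rule matters mainly when only one of the two measurements covers an extended period.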

Furthermore, the studies reviewed were 
scrutinized for information regarding envi- 
ronmental changes during the interval from 
the first to the later time of measurement.
Such data are of relevance when it comes 
to interpreting the stability results obtained. 
Although information on this point is seldom 
detailed or individualized, it may give a 
rough idea of the degree of environmental
change characterizing the periods covered.

The reliabilities of the measures or best 
estimates of such measures are, of course, 
presented in addition to obtained and atten- 
uation-corrected stability coefficients. 

In two of the studies (Block, 1971; Kagan 
& Moss, 1962) it was natural and possible 
to compute stability correlations for com-
posite variables, consisting of two compo-
nents. This was done by means of Spear-
man’s (1913) formula for the correlation of 
sums (see Olweus, Note 1). 
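
A minimal sketch of the correlation-of-sums idea follows. It assumes the special case of unweighted sums of two standardized components at each time of measurement, for which the correlation between the sums equals the sum of the four cross-time correlations divided by the square root of (2 + 2r) at each time, where r is the correlation between the two components. The data and function names are hypothetical; the exact computations used in the review are given only in the unabridged report.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscores(x):
    """Standardize a sequence to mean 0 and (population) variance 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def composite_correlation(cross_rs, r_within_t1, r_within_t2):
    """Correlation between two unweighted two-component sums of z-scores.

    cross_rs: the four correlations between a T1 component and a T2 component.
    r_within_t1, r_within_t2: correlation between the two components at each time.
    """
    numerator = sum(cross_rs)
    denominator = math.sqrt((2 + 2 * r_within_t1) * (2 + 2 * r_within_t2))
    return numerator / denominator


# Hypothetical scores for six subjects: components a and b at T1 and at T2.
a1, b1 = [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]
a2, b2 = [1, 3, 2, 5, 4, 6], [2, 2, 3, 5, 6, 4]

via_formula = composite_correlation(
    [pearson(a1, a2), pearson(a1, b2), pearson(b1, a2), pearson(b1, b2)],
    pearson(a1, b1),
    pearson(a2, b2),
)

# Direct check: build the composites from z-scores and correlate them.
sum_t1 = [p + q for p, q in zip(zscores(a1), zscores(b1))]
sum_t2 = [p + q for p, q in zip(zscores(a2), zscores(b2))]
direct = pearson(sum_t1, sum_t2)

print(round(via_formula, 4), round(direct, 4))
```

The practical value of the formula is that the composite stability can be obtained from published correlation tables alone, without access to the individual scores.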

It should be noted that these composites 
represent aggression variables of greater 
scope and generality than their components. 
No doubt it is of considerable theoretical 
interest to assess the stability of such more 
generalized variables. They may be of par- 
ticular value and relevance if the interval
separating the two times of measurement is
long and there is some uncertainty about the 
conceptual equivalence of the different vari- 
ables being used at the different periods of 
time. 

The review briefly presents data, where 
available, about relationships between the 
aggression variables studied and information 
from other, independent modes of measure-
ment. Although incomplete, this information
provides some idea of the concurrent, pre- 
dictive, and construct validity of the data 
and concepts under study. By this procedure, 
the reader obtains a better basis for evaluat- 
ing the adequacy of the conclusions reached. 
The procedure also furnishes some evidence 
on the degree of cross-situational consistency, 
in the sense of the amount of correspond- 
ence between aggression data from different, 
independent sources or modes of measure-
ment.






Description of Selected Studies
Jersild and Markey (1935) 


In this early study the behavior of each 
of 54 children aged 2-4 years (M age =
3 years) was recorded during 10 distributed
15-minute periods of free play. The children, 
30 boys and 24 girls, were enrolled in three 
different nursery school groups. Approxi- 
mately 9 months later 24 of these children
were observed again with the same method
and during the same number of sessions. The 
subjects were still divided in three different 
groups, a total of some 54 children. Rela- 
tively marked changes in the peer group 
composition took place from the first to the 
second time of observation, particularly for 
two of the groups (B and C). 

The reliability of the observations, as a
measure of the adequacy of the sampling of
behavior (generalizability across sessions
and, to some extent, observers), was deter- 
mined in several different ways. For the vari- 
able of primary interest in the present con- 
text, frequency of being the aggressor, the 
Spearman-Brown corrected coefficients varied 
between .31 and .90 at the first period of 
observation, although the majority of them
were above .70. At the second period, the
two stepped-up coefficients reported were .81
and .55. As a best estimate of the reliability 
of the observations, a value of .80 was used 
for both periods of time and was inserted in 
the formula for attenuation correction of the 
stability correlation. It can also be men-
tioned that the percentage of agreement be-
tween independent observers of the same be- 
havior sessions was generally high, amount-
ing to 96% in the case of frequency of be-
ing the aggressor. Five different observers
collected data in this study, although one of
them obtained more than half of all the
records.
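
The "stepped-up" coefficients mentioned above rest on the Spearman-Brown formula, which estimates the reliability of a measure lengthened by a factor k (more sessions, or more raters whose judgments are averaged). A minimal sketch follows; the input values are hypothetical and are not taken from the studies reviewed.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Reliability of a measure lengthened k times: k*r / (1 + (k - 1)*r)."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical split-half correlation, stepped up to the full set of sessions.
print(round(spearman_brown(0.67, 2), 2))

# Hypothetical single-rater reliability, stepped up to the average of 3 raters.
print(round(spearman_brown(0.57, 3), 2))
```

The same formula explains why averaged ratings from four or five raters can reach reliabilities in the .80s even when agreement between any two individual raters is only moderate.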

The rank-order correlation (rho) for sta- 
bility over the 9-month interval was .70 
when calculated for all 24 children. When 
correlations were determined separately for 
the different groups (n varying from 7 to
15), the coefficients were even higher
(.71-.88). After correction for attenuation the
stability correlation of .70 amounted to .88.
Teacher ratings on a selection of the be-
havior categories employed were also secured
for 35 of the children, permitting a study
of the relationship between the behavioral
observation variables and independent rating
data. Although the ratings and the behavior
observations were separated by an interval
of several months, the average correlation 
(for two groups) was .54 for frequency of 
being the aggressor. The corresponding cor- 
relation for the related variable, frequency 
of physical acts of combat, was .63. If these 
two variables were combined into a more 
general composite aggression variable and a 
similar composite measure were formed for 
the behavioral observations, the correlation 
very likely would amount to .75 or more (all 
the data necessary for carrying out the cal- 
culations were not available in the report). 


Olweus (1977a) 


Two short-term longitudinal studies con- 
cerning a 1-year and a 3-year interval, re-
spectively, were conducted by Olweus
(1977a) on two samples of Swedish adoles- 
cent boys. In both studies the same two 7- 
point peer-rating scales were used; they 
concerned unprovoked physical aggression 
against peers (“He starts fights with other 
boys at school,” abbreviated start fights) 
and verbal aggression against a mildly criti- 
cizing teacher (“When a teacher criticizes 
him, he tends to answer back and protest,” 
abbreviated verbal protest). Each boy who 
served as a rater assessed all the boys in his 
class by placing cards with the names of his 
classmates below the points of the scale that 
referred to different frequencies of occurrence 
(from very seldom to very often). The rating 
procedure was individually administered. 

In Study 1 the number of raters in each 
class was three on both occasions, at Grades 
6 and 7. In the second study the number of 
raters in different classes varied somewhat, 
the average number being four at Grade 6 
and five at Grade 9. In general, the raters 
were chosen on the basis of random selection 
from each class. Approximately a third of 
the raters on the second occasion had also 
served as raters in Grade 6. To examine if 
memory effects affected the ratings of iden-
tical raters, a special analysis was conducted
in Study 1. This analysis gave no evidence 
that partial use of the same raters on both 
occasions inflated stability correlations. Also, 
the disattenuated coefficients for completely 
independent rater groups at Grades 6 and 
7 were almost identical with the results for 
rater groups who had one rater in common 
(see Olweus, 1977a, Footnote 3). Since no 
memory effects were detected for a 1-year 
interval, such effects can safely be disre- 
garded in Study 2, which covered a 3-year 
interval. 

The subjects of the first study consisted 
of 85 boys from 7 classes who were rated at 
the end of Grade 6, when their median age 
was 13 years, and also 1 year later. In this 
study only small changes in the composition 
of the peer groups took place between Grade 
6 and Grade 7. All classes, however, had new 
teachers at Grade 7. Study 2 comprised 201 
boys from 18 classes who were rated at the 
end of Grade 6 and also 3 years later at the 
end of Grade 9, when their median age was 
16 years. The subjects constituted roughly
75% of the whole population of school boys 
in the community at these grades. They 
represented a good deal of variation with 
respect to socioeconomic factors (relatively 
representative of greater Stockholm). At 
Grade 6, the 18 classes comprised a total of 
214 boys. Three years later, 13 of these 
boys had disappeared from these schools and
27 new boys had entered the classes. The
new boys constituted roughly 12% of the 
boys then in the classes at Grade 9. Two 
classes mentioned later underwent marked 
changes in the composition of the peer group. 
Furthermore, all classes had new teachers 
at Grade 9, and 11 of the classes had moved 
to other school buildings. A certain amount
of environmental change thus occurred for 
this sample from Grade 6 to Grade 9. 

The Spearman-Brown corrected reliability 
coefficients were estimated as .80 for start 
fights and as .82 for verbal protest in Study
1. In the second study the corresponding co-
efficients were .83 and .86, respectively, for
average ratings of four raters (Grade 6) and
.86 and .88, respectively, for five raters
(Grade 9). In Table 1, average reliability
estimates for the two variables are given.
The stability correlations in Study 1 were
.81 for start fights and .79 for verbal pro-
test. After correction for attenuation these
coefficients amounted to 1.01 (rounded to
1.00) and .96, respectively. In Study 2, cov-
ering a 3-year interval, the uncorrected sta-
bility correlations were .65 and .70, respec-
tively. The disattenuated coefficients were .77
and .81, respectively. It should be noted
that to make the ratings from different
classes more comparable, the average ratings
were converted within each class to standard
scores. It can also be mentioned that scatter-
plots of the stability correlations in both
studies showed the relationships to be regular
and clearly linear in form.
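
As a check on the arithmetic, the corrected coefficients can be recomputed from the raw correlations and reliabilities quoted above. The sketch assumes the classical correction for attenuation, with r_12 the raw stability correlation and r_11, r_22 the reliabilities at the two occasions (in Study 1 the single reliability estimate is assumed to apply to both occasions).

```latex
\[
  r_{\text{corrected}} = \frac{r_{12}}{\sqrt{r_{11}\, r_{22}}}
\]
\begin{align*}
  \text{Study 1, start fights:}    \quad & \frac{.81}{\sqrt{.80 \times .80}} \approx 1.01 \\
  \text{Study 1, verbal protest:}  \quad & \frac{.79}{\sqrt{.82 \times .82}} \approx .96 \\
  \text{Study 2, start fights:}    \quad & \frac{.65}{\sqrt{.83 \times .86}} \approx .77 \\
  \text{Study 2, verbal protest:}  \quad & \frac{.70}{\sqrt{.86 \times .88}} \approx .80
\end{align*}
```

The last value recomputed from the rounded inputs comes out at about .80 rather than the .81 reported, a difference that is presumably due to rounding of the published reliabilities and correlations.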

As mentioned, two classes in the second
study are of particular interest from a sta-
bility/change point of view. In one of these,
the original Grade-6 class, consisting of 10
boys, was split into two at the beginning of
Grade 8 (5 boys were transferred to another
class for unknown reasons). At the second
period of rating, the original class had been
augmented by 8 new boys (with no previous
connections with one another), which thus
represented a very marked change in the
composition of the class. This change not-
withstanding, the stability correlations for
the original 5 boys were very high and were,
in fact, even higher than the corresponding
correlations for the total sample. Also, the
transfer of the 5 boys to a new class con-
sisting of 9 boys did not seem to reduce the
stability of behavior of the latter: The
across-time correlations for the core of 9 boys
in this class were for both variables higher
than the coefficients for the whole sample
(for details, see Olweus, 1977a).

It should also be mentioned that change
of school did not seem to appreciably affect
the degree of stability over time. There were
small and inconsistent differences between 
the across-time correlations for the 11 classes 
who moved to other school buildings and 
the 7 classes who did not move. 

As regards relationship to other data, it
can be reported that peer ratings on start
fights and verbal protest were used as cri-
terion variables for two factorially derived
inventory scales of a newly developed multi- 
faceted aggression inventory for boys (Ol- 
weus, 1973b; Olweus, Note 2). In several 
independent samples, substantial correlations 
have been obtained between these two scales, 
the Physical Aggression scale and the Verbal 
Aggression scale, and peer ratings of the 
overt aggressive behavior. For instance, the 
average correlation in two samples of boys 
(n = 98 and 86, respectively) between the
Verbal Aggression scale and verbal protest
was .49 (.63 after correction for attenua-
tion; see Olweus, Note 2, p. 39). Correla- 
tions of approximately the same magnitude 
(.45) were obtained between the Physical 
Aggression scale and its natural counterpart, 
start fights (the correlation was .58 after 
correction for attenuation). Both of the in- 
ventory scales obtained clearly higher corre-
lations with their matching than with their 
nonmatching rating dimensions, thereby giv- 
ing evidence of discriminant validity. Still 
higher correlations were found when these 
two scales were linearly combined to predict 
the general aggressive behavior dimension: 
start fights(Z) + verbal protest(Z). The 
mean of the coefficients for the two samples 
mentioned amounted to .53, the highest 
value being .58 (.67 after correction for at-
tenuation). 

The peer-rating data in Study 2 were col- 
lected within the framework of a large-scale 
project concerning bully and whipping boy
problems in the school that has been de- 
scribed in detail elsewhere (Olweus, 1973a, 
1974, 1978). In this context, the form master 
or form mistress of each class was requested 
to nominate possible bullies and whipping
boys according to specific criteria. When
these teacher nominations were related to
independent peer ratings, a very convincing 
picture emerged. The bullies (21 boys from 
the entire Grade-6 population) were rated 
as much more aggressive, both physically 
and verbally, than randomly selected control 
boys (60 boys) and the whipping boys (21 
boys). For start fights and verbal protest, 
F(2, 99) = 24.30, p < .0001, ω = .56, and
F = 32.39, p < .0001, ω = .62, respectively.
Essentially the same findings were obtained
in two additional, somewhat smaller samples
of boys. These results are clearly consistent 
with theoretical expectations. It can thus be 
concluded that a substantial degree of cor- 
respondence has been demonstrated between 
peer ratings on one hand and teacher nomi- 
nations and self-report (inventory) data on 
the other. 
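
The strength-of-association index printed beside the F ratios above is garbled in some copies of the text. A minimal sketch, assuming it is the square root of the usual omega-squared estimate for a one-way analysis of variance, shows that the F ratios and group sizes given in the text are consistent with that reading. The function name is hypothetical.

```python
import math


def omega_from_f(f_ratio: float, df_between: int, n_total: int) -> float:
    """Square root of the omega-squared estimate for a one-way ANOVA.

    omega^2 = df_b * (F - 1) / (df_b * (F - 1) + N)
    """
    effect = df_between * (f_ratio - 1)
    omega_sq = effect / (effect + n_total)
    return math.sqrt(max(omega_sq, 0.0))


# Group sizes from the text: 21 bullies, 60 control boys, 21 whipping boys.
n_total = 21 + 60 + 21

print(round(omega_from_f(24.30, 2, n_total), 2))  # start fights: 0.56
print(round(omega_from_f(32.39, 2, n_total), 2))  # verbal protest: 0.62
```

Both values reproduce the coefficients reported with the F ratios, which supports, without proving, the assumption about which index was used.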

In the article by Olweus (1977a), the 
rating data (four different variables) were 
also scrutinized for the possible presence of 
rater biases, irrelevant method variance, and 
so on. An adaptation of multimethod-multi-
variable analysis showed strong evidence for
discriminant validity according to all three 
criteria proposed by Campbell and Fiske 
(1959). All in all, it was concluded that the 
results obtained consistently attested to the 
validity and general adequacy of the ratings 
employed in these studies. 


Block (1971) 


The subjects of this study, discussed in 
Lives Through Time (Block, 1971), were 84
adolescent boys and men and 87 girls and 
women, participants in the well-known Oak- 
land growth (Jones, 1938) and Berkeley 
guidance (Macfarlane, 1938) longitudinal 
studies. By means of the Q-sort method the 
subjects were assessed for three different 
periods of time: the junior high school years, 
the senior high school years, and when they 
were in their middle 30s (the average age 
was approximately 34 years). For the two 
adolescent periods extensive archival data 
such as school grades, comments and ratings 
by teachers, ratings of social or interview 
behavior by staff members, performance on 
intelligence and projective tests, self-reports 
of areas of agreement or disagreement with 
parents, and so on were available. From 
this material, for each subject “case assem-
blies” were developed separately for the
junior high school and senior high school 
periods. The information collected during
intensive interviews (an average length of
12 hours) when the subjects were in their
mid-30s constituted the third data set. It
should be emphasized that these three data
sets were completely independent.

8 In Olweus (1973b, Table 3, p. 312), the cor-
relation between verbal aggression(Z) + physical
aggression(Z) and start fights(Z) + verbal pro-
test(Z) in Sample B was erroneously printed as
.42. The correct value is .47.

The material for a particular subject at a 
particular period was assessed (as a rule) by 
three clinical psychologists, each functioning 
independently. No psychologist evaluated a 
subject at more than one age, and to mini- 
mize the influence of judge biases, the psy-
chologists were assigned to cases in sys- 
tematically permuted combinations. The 
judges expressed their characterizations of 
each subject by means of the California Q- 
set procedure (Block, 1961, 1971). The Cali- 
fornia Q set consists of some 100 items or 
variables for psychodynamic descriptions of 
personality that are sorted by the judge into 
a forced-choice (approximately normal) dis- 
tribution. In the Block (1971) study the 
three independent CQ sorts for each subject 
were averaged. Only the two variables most 
directly concerned with aggressive reactions 
and behavior are considered in the present 
context. These variables are Variable 34 
(“over-reactive to minor frustrations; irri- 
table") and Variable 62 (“tends to be re- 
bellious and nonconforming”; see Appendix 
A in Block, 1971; certain other variables, 
e.g., Variables 38 and 94, are of some,
though less direct relevance). In accordance 
with the previously stated rule, the “exact” 
age of the subjects is taken to be 12 years 
for the junior high school (JHS) period, 15 
years for the senior high school (SHS) pe-
riod, and 33 years at the time of adult fol-
low-up. 

Some social data and life events of po- 
tential significance from a stability/change
point of view can be briefly mentioned. These 
data, however, are not individual and spe- 
cific but broadly characterize the subjects 
as a group. The families of the subjects came 
from predominantly middle and upper classes 
(mainly from Classes 1-4 on a 6-step scale). 
The majority of the subjects (81%) ex-
perienced intact families, with both the
mother and the father present through ado-
lescence. On the whole, the sample appears
representative of the stable, relatively pros-
perous Berkeley community at the time of
the study. The social status of the subjects
as adults was similar to that of their par-
ents, but the overrepresentation of the
highest social classes was more pronounced
in the subject group. By the time of the
adult interview, 95% of the subjects had
been married and 19% had been divorced.
The majority had also become parents, with
an average production of 2.5 children.
Roughly half the subjects had served in
the armed forces during World War II; the
other half were adolescents during this pe-
riod. Although the Depression hit the fami-
lies of the subjects when the subjects were
relatively young (at the time of or before the
junior high school period), this crisis is
likely to have been of significance for the per-
sonality development of certain subjects.
Particularly for the older subjects (from the
Oakland growth study), the occurrence of
the Depression may have reduced the degree
of stability of some personality character-
istics from the high school years to the time
of adult assessment. All in all, these data
suggest that “although typicality cannot be
claimed for the subjects, they nevertheless
have lived recognizable American lives”
(Block, 1971, p. 24) and have experienced
the adaptational tasks that face most adult
persons in connection with marriage, parent-
hood, and occupational career. In sum, it
seems reasonable to assume that in the lives
of the subjects under study, there was a good
deal of environmental pressure for change
during the 20 or so years from high school
to the time of the adult follow-up.

For the male sample, the stepped-up in-
terrater reliabilities (Block, 1971, Appendix
G) were .49 (JHS), .68 (SHS), and * T 
(adult) for Variable 34 and .82 (JHS), .80
(SHS), and .78 (adult) for Variable 62. The 
raw stability correlations for Variable 34
(“over-reactive . . .”) were .45 for the JHS-
SHS period (an interval of 3 years) and .2?
for the SHS-adult period (an interval of
18 years; the results for the JHS-adult
period were not given in Block, 1971). After
correction for attenuation, these coefficients
became .78 and .40, respectively. For Vari-
able 62 (“tends to be rebellious . . .”), the
stability correlations for the JHS-SHS and
SHS-adult periods were .58 and .29, respec-
tively, and after attenuation correction were
.72 and .37, respectively.

To calculate the stability correlations for 
the (unweighted) composite of the stan- 
dardized variables, 34(Z) +62(Z), some 
additional information was needed (see Ol- 
weus, Note 1). By means of the formula 
for the correlation of two unweighted com-
posites previously mentioned, the uncorrected
stability correlation for the composite vari-
able, 34(Z) + 62(Z), was found to be .54 
for the JHS-SHS and .44 for the SHS-adult 
periods. The corresponding disattenuated co- 
efficients were .69 and .53, respectively. 

The values for the composite variables are 
reported in Table 1. The reliabilities in this 
table for the Block (1971) study are average 
values for Variable 34 and Variable 62 (e.g.,
the reliability estimate of .66 for JHS is the 
average of .49 and .82). 

It should be noted that in the present 
study, great care was taken to secure ade- 
quate and valid information by using inde- 
pendent data sets and sophisticated judges 
working independently in permuted combi- 
nations and by a number of additional checks 
on potential artifacts such as rater biases
and stereotypes. The longitudinal data were
also analyzed separately for a number of 
homogeneous personality types derived via 
inverse factor analysis. For information on 
these results, the reader should consult 
Block (1971). 


Summary Data on Studies Reviewed 


An overview of the studies can be gained
from examining Table 1 and Figure 1. In
Table 1 the studies are divided into two
groups on the basis of the age of the sub-
jects at the time of first measurement (T1).
Within each group the studies are ordered
according to the length of the interval sepa-
rating the two times of measurement (T2 -
T1). Figure 1 presents in diagrammatic form
the disattenuated coefficients as a joint func-
tion of the subject’s age at the time of first
measurement and the interval between the
two times of measurement. It should be ob-
served that because of space considerations,
the reference axes are broken and the units
of measurement are different below and
above the breaks.

As mentioned, the number of studies with 
independent samples of subjects was 16. In 
4 studies (Block, 1971; Eron, Huesmann, 
Lefkowitz, & Walder, 1972; Kagan & Moss, 
1962; Kohn & Rosman, 1972) the same or 
partly the same subjects were used in the 
determination of more than one stability 
correlation (but for different intervals); a 
total of 24 stability coefficients are reported 
in Table 1 and Figure 1. The average raw 
and disattenuated correlations were .63 and 
.79, respectively, for the 12 studies with only 
1 coefficient per sample, as compared with 
average values of .55 and .68, respectively, 
when all 24 stability determinations were 
used. Although these two materials are not 
equivalent with regard to the interval cov- 
ered and so on, the use of several stability 
coefficients from a limited number of studies 
does not seem to have resulted in an over- 
representation of high coefficients. In the 
following discussion the focus is on the char- 
acteristics of and the results for the total 
material. 

The average number of subjects on which 
the stability coefficients were based amounted 
to 116. The age of the subjects at the time 
of first measurement varied from 2 to 18 
years, and the subjects were followed for 
intervals varying from half a year to 21 
years. The average interval covered was 5.7 
years. The highest (average) age of a sub- 
ject group at the time of follow-up assess- 
ment was 33 years. 

As is evident from Figure 1, there has 
been a concentration on short-term longi- 
tudinal studies covering intervals up to 1½
years for subjects below school age. Only two
stability coefficients are reported for inter-
vals greater than 1½ years, and these coef-
ficients are based on the same, relatively
small sample (N = 36). For subjects of
higher ages, the intervals covered are more 
evenly distributed. 

The methods of data collection or integra- 
tion were quite varied (see Table 1). In 
three studies using nursery school groups, 
the behavior of the subjects was directly
observed. Seven stability coefficients were
based on teacher ratings, seven on clinical
ratings, two on peer ratings, and five on
peer nominations.

Figure 1. Summary of disattenuated stability correlations for different intervals in years (T2 - T1)
and different ages at time of first measurement (T1).

The variables assessed refer to reactions 
and behaviors of the subjects in their natu- 
ral settings, such as nursery school and 
elementary school. These settings, in which 
the subjects spent a considerable part of 
their days at the time of the investigation, 
can be assumed to represent important sec- 
tors of the subjects’ lives. The studies by 
Block (1971), Kagan and Moss (1962), 
and Tuddenham (1959), which were partly 
based on archival material collected over 
several years, provide an unusually broad 
basis for assessment. In these studies, clinical 
ratings were used to integrate the rich and 
diverse material. 

Overall, these data suggest a good deal of 
variation among the studies in a number of 
important respects. Although a better cover- 


age of certain age periods and intervals 
would have been desirable, the number of 
studies is considerable and the average size
of the samples quite respectable according
to usual standards in psychological research.


Some Methodological Comments 
Potential Correlation-Increasing Factors 


In four studies using mixed nursery school
groups (Emmerich, 1966; Jersild & Markey,
1935; Martin, 1964; Patterson et al., 1967),
the stability correlations were not computed
separately for boys and girls. On statistical
grounds this procedure might be expected to
lead to somewhat higher coefficients than
if the data for the boys had been analyzed
separately. On the other hand, in the Block,
Block, and Harrington (1974) and Kohn
and Rosman (1972) studies, small and neg-
ligible differences were found when the sta-
bility correlations were obtained separately
for boys and girls and for the sexes com- 
bined. The children in these studies were 
also 3-4-year-olds. The results of the latter 
two studies thus suggest that combining the 
sexes in calculating a stability correlation 
for children of these ages may not have great 
effect on the size of the correlation co- 
efficient. 
Furthermore, there was consanguinity 
within the sample in one study (Kagan &
Moss, 1962), and this fact may have to some 
extent inflated the raw stability correlations. 
However, as described in the unabridged re- 
port (Olweus, Note 1), certain countermea- 
sures were taken in the present analyses to 
reduce or eliminate the risks of obtaining too 
high attenuation-corrected coefficients.
In some studies, partly overlapping, partly
different sets of judges/raters were used for
the different periods of time (Emmerich,
1966; Eron et al., 1972; Olweus, 1977a;
Wiggins & Winder, 1961). However, the 
evidence presented by Olweus (1977a and 
p. 860 in the present article) suggests that 
the inflating effects may be negligible, at 
least when the time interval between the 
rating occasions is a year or more.


Potential Correlation-Lowering Factors 


Several of the samples studied were prob- 
ably relatively homogeneous as regards ag-
gressive reactions and behaviors. Further-
more, in longitudinal studies there is usually 
sample shrinkage over time, and this shrink- 
age is often systematic. In particular, rela- 
tively aggressive individuals are most likely 
to disappear from the sample studied, as 
was found in the Eron et al. (1972) study.
Such factors tend to reduce the size of the
stability coefficients.

In the majority of studies, the conven- 
tional product-moment correlation coefficient 
was employed to express the degree of sta- 
bility over time. A few studies (Jersild & 
Markey, 1935; Martin, 1964; Patterson et 
al., 1967), however, used Spearman’s rho,
and this index is likely to be slightly lower 
than the product-moment coefficient (Guil- 
ford, 1956). Furthermore, use of a some-
what different technique in forming total
scores out of individual items very likely
would have given higher stability correla- 
tions in at least one study (Wiggins & 
Winder, 1961). 

A central problem in many longitudinal 
studies is that the conceptual equivalence 
of the variables or measurement instruments 
used at different periods of assessment may 
be considerably less than perfect. The impli- 
cation is that if more equivalent variables or 
instruments had been employed, the stability 
correlations would have been higher. This 
problem applies to some of the studies con- 
sidered in this article, for instance, to those 
in which different variables or instruments 
had to be used at different periods of time 
(Kagan & Moss, 1962; Kohn & Rosman, 
1973). Generally, this problem seems to be 
particularly salient when the interval sepa- 
rating the times of assessment is long or the 
periods assessed represent very different de- 
velopmental stages. 

Finally, and perhaps most importantly, in 
several studies involving different school and 
nursery school classes (Block et al., 1974; 
Emmerich, 1966; Eron et al., 1972; Farring- 
ton, 1978; Kohn & Rosman, 1972, 1973; 
Wiggins & Winder, 1961), the stability cor- 
relations were calculated on the basis of the 
total samples and not as a (weighted) aver- 
age of the within-class correlations. It can be 
shown that if the correlations between the 
means of the classes at the two times of 
measurement are lower than the (weighted) 
average of the within-class correlation, the 
result is a total correlation that is less than 
the average within-class correlation (cf. 
Lindquist, 1940, although no proof is given). 
Often there are good reasons to expect the 
correlation between the class means to be 
lower than the average within-class correla- 
tion (different raters or sets of raters may 
develop different, class-relative rating norms 
and so on), and accordingly, the total cor- 
relation will be an underestimate (for theo- 
retical as well as predictive purposes the 
real interest is in a stability correlation un- 
affected by differences in class means). There 
are few reasons for expecting the correlation 
between the class means to be higher than 
the average within-class correlation in studies 
of the present type. In the second (as well
as the first) study by Olweus (1977a), re- 
sults supporting the above argument were 
found. The raw correlations for the total 
sample (N = 201) were .55 for start fights
and .63 for verbal protest, as compared with
.65 and .70, respectively, when weighted
average within-class correlations were com- 
puted (equivalent to total correlations based 
on within-class standardized variables). In 
other studies, the effects of eliminating class 
differences in mean level (and variability) 
may be less marked, but nevertheless, such a 
procedure is most likely to systematically 
increase the size of the stability coefficients. 
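
The direction of this effect can be illustrated with a small numerical sketch. The two classes below are invented for illustration and are not taken from any of the studies surveyed; the function names are likewise hypothetical. Rank order within each class is perfectly stable, but the class means shift in opposite directions between the two occasions, as might happen with class-relative rating norms. The within-class figure is computed as a total correlation on within-class standardized scores, the equivalence mentioned above.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(scores):
    """Convert the scores of one class to z scores (class mean 0, SD 1)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]


def total_correlation(classes):
    """Correlate raw scores pooled over classes, ignoring class membership."""
    time1 = [s for t1, _ in classes for s in t1]
    time2 = [s for _, t2 in classes for s in t2]
    return pearson(time1, time2)


def within_class_correlation(classes):
    """Correlate scores after standardizing them within each class."""
    z1, z2 = [], []
    for t1, t2 in classes:
        z1.extend(standardize(t1))
        z2.extend(standardize(t2))
    return pearson(z1, z2)


# Hypothetical ratings: (time-1 scores, time-2 scores) for each class.
CLASSES = [
    ([1, 2, 3, 4], [3, 4, 5, 6]),  # class mean rises between occasions
    ([3, 4, 5, 6], [1, 2, 3, 4]),  # class mean falls between occasions
]

if __name__ == "__main__":
    print("total:", round(total_correlation(CLASSES), 2))
    print("within-class:", round(within_class_correlation(CLASSES), 2))
```

With these invented scores the within-class correlation is 1.0, whereas the total-sample correlation is only about .11, because the class means correlate negatively across occasions.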
All in all, the above considerations sug- 
gest that a preponderance of correlation- 
lowering factors were operative in the studies 
surveyed, and very likely, many of the 
stability coefficients reported are underesti- 
mates. In addition, when correcting for at- 
tenuation, a deliberate strategy was adopted 
to settle on reliability estimates that were not 
too low in cases of incomplete reliability 
information. Accordingly, the stability co-
efficients on which the following analyses are
based are likely to be underestimates (rather 
than overestimates or “correct” values). 
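
Why a reliability estimate that is "not too low" works in a conservative direction can be seen from the correction itself. The sketch below assumes the standard Spearman correction for attenuation, which is not spelled out in this section; the function name and the numerical values are hypothetical.

```python
import math


def disattenuate(r_observed, reliability_t1, reliability_t2):
    """Spearman correction: observed r divided by the geometric mean
    of the reliabilities of the two measurements."""
    return r_observed / math.sqrt(reliability_t1 * reliability_t2)


if __name__ == "__main__":
    r_raw = 0.60  # hypothetical raw stability correlation
    # Assuming a higher reliability gives a smaller corrected coefficient.
    print(round(disattenuate(r_raw, 0.80, 0.80), 2))
    print(round(disattenuate(r_raw, 0.90, 0.90), 2))
```

For a fixed raw correlation, the higher the assumed reliability, the smaller the corrected coefficient, so erring on the high side when reliability information is incomplete tends to understate the disattenuated stability.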


General Description of Results 


If the disattenuated stability coefficients 
are plotted as a function of the interval in 
years between the two times of measurement 
(T2 - T1), a relatively regular picture is
obtained (Figure 2). The size of the stability
coefficient tends to decrease as the interval
covered increases. The trend can be de-
scribed by the following regression equation:
y = .78 - .018x, where y is the disattenuated
correlation and x is the interval (T2 - T1) in
years. The spread around the regression line
(the standard error of estimate, Sy.x) is .13,
and the correlation between the two variables
amounts to -.66. The decrease of the re-
gression line is relatively slow (although
significant). For an interval of 5 years, the
estimated disattenuated stability correlation
is .69 and for an interval of 10 years is .60.⁴

For comparison, the linear regression line 
for data on intelligence test measurements 
(of the Stanford-Binet type) compiled by 
Thorndike (1933) is shown in Figure 2
(Thorndike's findings were later corroborated
by other researchers and are widely ac-
cepted; see, e.g., Anastasi, 1958). In this
case the regression line is based on 36 sta-
bility coefficients covering intervals up to 5
years. These coefficients were derived from
13 different samples, mainly school-age chil-
dren (the ages were not given). The average
sample size was 111. The regression equa-
tion describing the trend in Thorndike's
data is y = .92 - .022x.⁵ The standard error
of estimate, Sy.x, is .08, and the correlation
between the variables amounts to -.32 (all
measures were recomputed from Thorndike's
data and corrected for attenuation; the as-
sumed reliability was .95, in accordance with
the first two entries in Thorndike's table).
As evident from the figure and the regression
line, the stability of intelligence measures is
generally somewhat higher than the stability
of aggression variables, but the decrease
of the regression line is slightly steeper for
intelligence than for aggression. Extrapolat-
ing from the regression line for intelligence,
the difference between the estimated coef-
ficients would be only .10 for an interval of
10 years (.70 for intelligence and .60 for
aggression). Although it may not be feasible
to institute very detailed comparisons be-
tween the two sets of data, since the ages
of the subjects were not given in Thorn-
dike's article, it can be generally asserted
that there is a substantial degree of stability
over time for aggression⁶ as well as for in-
telligence and that the difference in stability
⁴ In calculating the regression equation each sta-
bility coefficient was given equal weight. Weighting
of the stability coefficients according to the number
of subjects on which they are based makes little
difference. The same is true for the use of z values
instead of r values. For the raw correlations (un-
weighted) the regression equation is y = .63 - .016x.
⁵ The regression equation for the (unweighted)
raw correlations is y = .87 - .020x.
⁶ It can be noted in this context that the stability
of aggressive reaction patterns in females also
seems to be substantial (if two studies with rela-
tively small samples are excepted), in contrast with
what is generally assumed (manuscript in prepara-
tion).

Figure 2. Regression line showing relationship between attenuation-corrected stability coefficients
and time interval (T2 - T1) in years (unbroken line). (The regression line is based on 24 sta-
bility coefficients [plotted]. For comparison, the regression line for attenuation-corrected stability
coefficients in the area of intelligence is shown [broken line]. This regression line is based on 36
stability coefficients [not plotted]; Thorndike = Thorndike, 1933.)


in the two areas does not appear to be very
great.⁷
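
The estimates quoted for 5-year and 10-year intervals follow directly from the two regression equations reported above. The short sketch below simply evaluates those equations; the function and constant names are illustrative and not part of the original analysis.

```python
def predicted_stability(interval_years, intercept, yearly_decrease):
    """Evaluate a linear regression line y = intercept - yearly_decrease * x."""
    return intercept - yearly_decrease * interval_years


# Coefficients as reported in the text (attenuation-corrected values).
AGGRESSION = {"intercept": 0.78, "yearly_decrease": 0.018}
INTELLIGENCE = {"intercept": 0.92, "yearly_decrease": 0.022}

if __name__ == "__main__":
    for years in (5, 10):
        aggression = predicted_stability(years, **AGGRESSION)
        intelligence = predicted_stability(years, **INTELLIGENCE)
        print(years, round(aggression, 2), round(intelligence, 2),
              round(intelligence - aggression, 2))
```

Because the intelligence line falls slightly more steeply, the gap between the two estimates narrows as the interval grows, from .14 at an interval of zero to .10 at 10 years.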

Although the goodness of fit of the re-
gression line to the data in Figure 2 appears
quite acceptable, theoretical considerations
and inspection of Figure 1 suggest that the
subjects' age at the time of first measure-
ment may be an important parameter in
addition to the interval T2 - T1. One way
of expressing the stability correlations as a
joint function of the subjects' age and the
interval is to form an age ratio T1/T2. For
a constant age, the age ratio decreases as
the interval T2 - T1 increases. And for a
particular interval, the age ratio is lower at
a low age than at higher ages. It is theo-
retically reasonable to expect the stability
coefficients to show the same trend as the
age ratio, and accordingly, a positive rela-
tionship can be anticipated between the age
ratio and the disattenuated stability corre-
lations.
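
The two properties of the age ratio described above can be checked with a few lines of arithmetic. In this sketch T1 is taken as the age at first measurement and T2 as T1 plus the interval; the function name and the example ages are hypothetical.

```python
def age_ratio(age_at_t1, interval_years):
    """Age ratio T1/T2, with T2 = T1 + interval (both in years)."""
    return age_at_t1 / (age_at_t1 + interval_years)


if __name__ == "__main__":
    # Constant age (8 years): the ratio falls as the interval grows.
    print([round(age_ratio(8, interval), 2) for interval in (1, 5, 10)])
    # Constant interval (5 years): the ratio is lower at younger ages.
    print([round(age_ratio(age, 5), 2) for age in (3, 8, 13)])
```

A single index of this kind therefore carries information about both the length of the interval and the developmental level at which it begins.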


The relationship between the two variables
shown in Figure 3 is clearly positive, as ex-
pected, and the goodness of fit of the regres-
sion line (y = .26 + .617x) to the data is
even better than in Figure 2. This is mani-
fested in a somewhat lower standard error
of estimate (.11 compared with .13) or,


⁷ In another overview by Thorndike (1940), the
subjects in his 1933 article are referred to as school-
age children. If, in order to make the present data
and the intelligence test data as comparable as pos-
sible, the regression equation is calculated only for
aggression studies on children of school age and for
intervals up to 5 years (the first eight studies in the
second section of Table 1), the equation is found to
be quite similar to the regression equation for the
total set of studies: y = .80 - .027x. It can also be
noted that if comparisons are made on the pre-
school level, that is, comparisons concerning the
stability of aggressive and intelligence test be-
havior (Thorndike, 1940, Table 1) over intervals
up to a year or so for 3-4-year-old children, the
stability of aggressive behavior is found to be as
great as or slightly greater than the stability of
intelligence test behavior (attenuation-corrected co-
efficients).

Figure 3. Relationship between attenuation-corrected stability coefficients (number of coefficients
= 24) and age ratio (T1/T2).


alternatively, in a somewhat higher correla-
tion coefficient (.77 compared with .66).⁸
The picture obtained in Figure 3 indicates a
good deal of regularity in the relationship
between the age ratio T1/T2 and the dis-
attenuated stability coefficients (similar re-
sults were obtained for the raw correlations).
More specifically, the figure shows that the
degree of stability in the individuals’ relative
positions on aggression variables of the type
considered here can be broadly described as
a linear positive function of the interval be-
tween the times of measurement and the sub-
jects’ age at the time of the first measure-
ment, expressed in the ratio T1/T2.

There is thus less stability or more change
in the individuals’ relative positions the
longer the interval (T2 - T1) covered is
(particularly if the subjects’ age at T1 is held
constant). And for a particular interval (T2
- T1), there is less stability or more change
the younger the subjects are (although the
relation between the stability correlation and
the subjects’ age is less marked than that
between the stability correlation and the in-
terval). The effect, so to speak, of a certain
interval is greater for younger than for older
subjects. These conclusions appear quite rea-
sonable from a developmental as well as from
a common sense point of view.

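The statement that a given interval has a greater effect for younger subjects can be made concrete by inserting age ratios into the regression line of Figure 3 (y = .26 + .617x). The following is only an illustrative sketch: the function names and the example ages are assumptions, and the outputs are values read off the fitted line, not observed coefficients.

```python
def age_ratio(age_at_t1, interval_years):
    """Age ratio T1/T2, with T2 = T1 + interval (both in years)."""
    return age_at_t1 / (age_at_t1 + interval_years)


def predicted_stability(age_at_t1, interval_years,
                        intercept=0.26, slope=0.617):
    """Disattenuated stability estimated from the Figure 3 regression line."""
    return intercept + slope * age_ratio(age_at_t1, interval_years)


if __name__ == "__main__":
    # The same 5-year interval, starting at three hypothetical ages.
    for age in (3, 8, 13):
        print(age, round(predicted_stability(age, 5), 2))
```

For the same 5-year interval the fitted line gives a clearly lower estimate when measurement starts at age 3 than when it starts at age 13, which is the pattern described in the preceding paragraph.
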

Since the ages of the subjects were not 
given in Thorndike’s (1933) article, it is not 
possible to compare the fit of the regression 
lines for interval versus stability coefficients 
with the fit of the regression lines for age 
ratio versus stability coefficients. However, 
data available from other sources (Anastasi, 
1958; Honzik, 1938) suggest that use of the
age ratio T1/T2 in the intelligence domain
leads to better predictions than the use of
the interval T2 - T1 alone.


More Specific, Descriptive Conclusions 


The substantial degree of regularity mani-
fested in Figure 2 and in Figure 3 is par-
ticularly impressive considering the great
variation among the studies in sample com-
position, definition of variables, research set-
ting, method of data collection and integra-
tion, and the researcher's theoretical orien-
tation. There was also a very great range
in the ages and intervals studied. After hav-
ing emphasized the regularity of the data


⁸ The correlation of .77 is in fact slightly higher
than the multiple correlation between the stability
coefficients on one hand and age and interval on
the other; besides, use of the age ratio is more
meaningful from a theoretical point of view.
as a general finding, it is appropriate to
examine the results more closely for a num- 
ber of more specific conclusions. 

It is obvious that marked individual dif-
ferences in habitual aggression level mani-
fest themselves early in life (certainly by
age 3) and may show (see Figure 1) a
high or very high degree of stability for
periods of at least 1½ years at this develop-
mental level (in nursery school and school
settings). Data from one study (Kagan & 
Moss, 1962) suggest that ratings of aggres- 
sion variables that refer to the period from 
0 to 3 years may have some predictive value 
of aggression variables assessed as long as 
20 years later. However, to what extent ag- 
gressive reaction patterns observable dur- 
ing the preschool years can predict related 
patterns 5 or 10 years later must for the time
being remain an open question, since data
for such an assessment are not available.

Furthermore, in contrast with the com- 
mon belief that the method of direct ob- 
servation gives evidence of much more be- 
havioral specificity and less stability than 
ratings of different kinds, no such tendencies 
were found in the present material. The aver-
age stability correlation for the three studies
using direct observation (Jersild & Markey,
1935; Martin, 1964; Patterson et al., 1967)
was .81, which can be compared with the
average value of .79 for the three comparable
studies by Block et al. (1974), Emmerich
(1966), and Kohn and Rosman (1972; first
study in Table 1) employing teacher rat-
ings (the average stability correlations for
the two sets of studies, using uncorrected co-
efficients, were .65 and .64, respectively).
The comparability of these two sets of stud-
ies is manifested in equal average age ratios:
both were .83; all studies were based on 
nursery school or similar groups. Judging 
from these studies, there seems to be no dif- 
ference in degree of stability over relatively 
limited periods of time (up to a year) for 
aggression data collected by means of direct 
behavioral observation and teacher ratings. 

Passing on to the school years, it is obvi-
ous that aggressive reaction patterns ob-
servable at ages 8 or 9 can be substantially
correlated with similar patterns observed
10 to 14 years later (some 25% of the vari-
ance accounted for). It should also be noted 
that such patterns can with some success 
predict certain forms of antisocial violent 
behavior (violent delinquency; Farrington, 
1978) that occur 10 to 12 years later. 

Aggressive behavior at ages 12 and 13 
may show a high or very high degree of sta-
bility for periods of 1 to 5 years (from 50%
to more than 90% of the variance accounted
for). Also, for periods as long as 10 years
the stability is high (some 45% of the vari-
ance accounted for). Furthermore, aggressive
reaction patterns at these ages have con- 
siderable predictive capacity for later anti- 
social aggression, as evidenced by the studies 
of Eron et al. (1972) and Farrington (1978). 
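
The percentages of variance accounted for in the preceding paragraphs correspond to squared correlation coefficients. The sketch below converts in both directions; the function names are illustrative, and the correlations printed are approximate equivalents of the percentages quoted, not coefficients taken from the individual studies.

```python
import math


def variance_accounted_for(r):
    """Proportion of variance accounted for by a correlation r (r squared)."""
    return r * r


def correlation_for_variance(proportion):
    """Size of the correlation implied by a proportion of variance."""
    return math.sqrt(proportion)


if __name__ == "__main__":
    # Percentages quoted in the text: 25%, 45%, 50%, and 90%.
    for proportion in (0.25, 0.45, 0.50, 0.90):
        print(proportion, round(correlation_for_variance(proportion), 2))
```

On this conversion, some 25% of the variance corresponds to a correlation of about .50, 45% to about .67, and the range from 50% to more than 90% to correlations from about .71 to above .95.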

Finally, aggressive behavior (chiefly ver- 
bal) and reactivity in the mid-30s are sub- 
stantially correlated with similar patterns 
observed some 15 to 18 years earlier, when 
the subjects were teenagers. Considering the 
general trend of the stability coefficients and 
the fact that 52 of the 72 subjects used by 
Tuddenham (1959) were included in the 
sample of 171 subjects studied by Block 
(1971), it is likely that the disattenuated 
coefficient of .91 reported by Tuddenham 
is somewhat high because of chance factors. 
It should be noted, however, that the follow- 
up data in Tuddenham's study were com- 
pletely independent of the data on which the 
adult evaluations in Block's study were 
based. 

When evaluating these results, the gen- 
eral adequacy and validity of the data should 
also be considered. One should recall that in 
several investigations a considerable degree 
of correspondence was found between the 
aggression variables studied and teacher 
ratings of the same or related behaviors. 
This was true for teacher ratings and nomi-
nations versus peer ratings (Olweus, 1974,
1978; Walder, Abelson, Eron, Banta, & Lau- 
licht, 1961; Wiggins & Winder, 1961) as well 
as for teacher ratings versus direct behavioral 
observation (Jersild & Markey, 1935). If 
these sets of data were corrected for attenua- 
tion, the correlation between them very likely 
would exceed .75, indicating a quite sub- 
stantial relationship. In some investigations 
the aggression variables studied also mani-
fested relationships of considerable magni-
tude with self-report data on similar pat-
terns (Olweus, 1973b, 1978; Olweus, Note
2) and related, but more antisocial forms
of behavior (Eron et al., 1972; Farrington,
1978). In addition, clear associations were 
obtained between two of the peer nomination 
instruments used in the stability studies and 
overt aggressive behavior in a contrived, 
naturalistic setting (Winder & Wiggins, 
1964) and in a controlled, experimental situ- 
ation, respectively (Williams, Meyerson, 
Eron, & Selmer, 1967). Finally, the possible 
existence of rater biases and stereotypes was 
carefully examined in some studies, in par- 
ticular those by Block (1971) and Olweus 
(1977a). In the latter study it was con- 
cluded on the basis of several different anal- 
yses that “the rating data to an overwhelm- 
ing degree reflect characteristics of the boys 
under study, rather than the biases and cog- 
nitive schemas of the raters irrespective of 
ratee characteristics” (Olweus, 1977a, p. 
1310). 

All in all, the above results derived by 
different methods and under a wide variety 
of conditions constitute strong evidence for 
the validity and general adequacy of the 
aggression data on which the stability cor-
relations were based. They also attest to a
substantial degree of cross-situational con- 
sistency in the sense that there is a con- 
siderable correspondence between aggression 
data obtained from independent sources or 
modes of measurement at about the same 
point in time. (The issue of cross-situational 
consistency in the area of aggression is not 
pursued further in the present article.) 

It should also be noted that the finding 
of a considerable degree of stability of ag-
gressive reaction patterns over time seems
to be in general agreement with what has 
been observed in a number of studies of re- 
lated, but more clearly antisocial forms of 
behavior (e.g., Conger & Miller, 1966; Mc-
Cord & McCord, 1959; Robins, 1966; Roff,
1961; Rutter, 1972; Tait & Hodges, 1962).


Interpretation of the Stability Data 


The descriptive conclusion that there is a 
substantial degree of stability in aggressive
behavior cannot, however, without further
analyses be taken as evidence for the cor-
responding stability of some reaction ten-
dencies or motive systems within individuals.
It might be argued from a situationist point
of view (see Bowers, 1973), for instance,
that the observed consistency primarily re-
flects stably different conditions for different
individuals in the settings studied. Thus, in
the first place, the stated conclusion can be
said to apply under typical conditions, that
is, under a degree of environmental varia- 
tion (or stability) and pressure for change 
(or nonchange) typically found in the set- 
tings of the subjects for the periods studied 
(cf. Olweus, 1977a). Accordingly, it is im- 
portant to examine the conditions charac- 
terizing the settings and periods under study, 
maybe particularly for the highly aggressive 
individuals, since their relative lack of 
change is a prerequisite to high stability co-
efficients.

In the studies on preschool children little 
detailed knowledge of changes in the set- 
tings is available. However, a good deal of 
change in the composition of the peer group 
took place in some studies (e.g., Jersild & 
Markey, 1935; Kohn & Rosman, 1972, 
1973). Furthermore, in the three studies 
using the method of direct observation (see 
Table 1), the behavior was observed during 
periods of free play involving a minimum
of situational structure and, in all proba-
bility, interaction with a number of different
peers. Even if evidence has been presented
that a nursery school setting can provide
reinforcement of aggressive behavior (Pat-
terson et al., 1967), the same authors have
also reported (p. 32) that the highly ag-
gressive children were the most likely to be
the target of other children’s aggression, that
is, to get punished. Furthermore, Jersild and
Markey (1935, p. 163) reported that the 
nursery school teachers interfered with the 
children’s conflicts in about a third of the 
cases and predominantly in a way that was 
unfavorable to the aggressor. It should also 
be recalled that a small number of children 
accounted for a large percentage of the ag-
gression episodes. It thus seems difficult to
explain highly aggressive behavior in these
settings as a consequence of situational pull
or primarily as a function of reinforcement.
It appears rather that the highly aggressive
children in particular were exposed to a cer-
tain pressure for change in nonaggressive di-
rections from the teachers.

As regards the subject groups studied in 
the school setting it can be mentioned that 
in most cases the classes had new teachers 
and had moved to other school buildings at 
the second time of measurement. A relatively 
large percentage (10% to about 25%) of 
the original classmates had also been re- 
placed by new peers. In one of the studies 
(Olweus, 1977a), somewhat more detailed 
information about changes in the composi- 
tion of the peer group was available. As pre- 
viously shown, even very marked changes in 
two classes did not affect the stability of 
aggressive behavior of those boys who were
in the class at both times of measurement.
Furthermore, it can generally be assumed 
that there was at least a certain amount of 
pressure from the teachers and the admin- 
istrative staff in the direction of modifying 
the behavior of the habitually aggressive 
pupils. As evidenced by the substantial sta- 
bility correlations, such environmental pres- 
sure did not seem to be very effective. This 
finding is in good agreement with the general 
experience that it is difficult to reduce ag- 
gressive and antisocial behavior in preado- 
lescent and adolescent males (see, e.g., Ol- 
weus, 1978, chap. 9; Burchard & Harig, 
1976). It thus appears that the behavior of 
highly aggressive boys of these ages is often 
maintained irrespective of considerable en- 
vironmental variation and in opposition to 
forces acting to change this same behavior. 
It may be questioned, however, if there 
are not particular aversive situations or con-
ditions in the school environment of the ha-
bitually aggressive boys that might explain
their behavior. This question was analyzed in
some detail by Olweus (1977a), who drew
on the extensive findings regarding a particu-
lar group of highly aggressive boys, the
bullies previously mentioned (Olweus, 1974,
1978). On the basis of several lines of evi-
dence concerning the possible existence of
frustrations, failures, and rejections in the
school as well as the presence of other psy-
chological, physical, and socioeconomic con-
ditions of the bullies, it was concluded that
“it is very difficult to explain the behavior 
of the highly aggressive boys as consequence 
of their being exposed to unusually aversive 
situations or conditions in the school set- 
ting” (Olweus, 1978, p. 136). 

With regard to the subjects followed up in 
their adult years (Block, 1971; Kagan & 
Moss, 1962; Tuddenham, 1959), the ma- 
jority went through experiences of poten- 
tially great impact on their lives during the 
interval from earlier to later measurements. 
Most of the subjects had married, and a 
certain percentage had divorced; in addi- 
tion, they had started on and also covered 
part of their professional careers. A majority 
of the subjects in Block’s study (and prob- 
ably also in that of Tuddenham) had become 
parents. Most of them had been in military 
service, and roughly half of Block’s (and 
the majority of Tuddenham’s) subjects had 
served in the armed forces during World 
War II. Some of the situations or life events 
mentioned can be primarily regarded as 
forced upon the subjects, at least in some 
respects; others can be mainly considered 
to be a result of active selection on the part 
of the subjects. In sum, it is very likely that 
a good deal of environmental change and 
also pressure for change of highly aggressive 
reaction patterns were imposed on the sub- 
jects in these studies during the 10 to 20 
years separating the earlier and later assess- 
ments. In addition, considerable maturational 
changes can be expected to occur during 
such a long period for subjects who are only 
about 10 to 15 years old at the time of the
early measurement. 

When making an overall evaluation of 
actual and presumed environmental changes 
and pressures for change during the periods 
studied, one is, in fact, even more surprised 
at the degree of stability manifested. As 
previously concluded, changes in the individ- 
uals’ relative positions had certainly oc- 
curred, both as a function of the interval 
covered and of the individuals’ age at the 
time of first measurement. And if more de- 
tailed knowledge of the conditions and life 
events facing the individual subjects had 
been available, maybe more exact predictions 
about changers and nonchangers could have 
been made. In an overall appraisal, however,
the primary task confronting the researcher 
seems to be one of explaining the substantial 
stability or lack of change in aggressive be- 
havior found to prevail in spite of consider- 
able environmental variation and in opposi- 
tion to a number of influences acting to 
change this same behavior. 

The relative lack of change is all the more 
remarkable because highly aggressive behav- 
ior often leads to aversive consequences from 
the environment. Even if psychological and 
physical advantages can be gained by ag- 
gressive behavior in a number of situations, 
negative effects (such as punishment from 
the environment) often seem to be equally 
likely. It can also be argued that many ag- 
gressive behaviors, such as bullying or ag- 
gressive attacks in a free-play situation, are 
self-initiated behaviors (cf. Olweus, 1977a). 
As previously pointed out, it is often diffi- 
cult to explain the behavior of the highly 
aggressive individuals as a function of par- 
ticular aversive conditions or strong situa- 
tional pull in the immediate, proximal situa- 
tion in which the aggressive behavior is 
displayed. In the studies surveyed, there is 
little evidence supporting a view that stable 
differences in aggression level are primarily
a consequence of consistently different en-
vironmental conditions for different individ-
uals in the nursery school, the elementary
school and so on. Overall, the above results
and analyses strongly suggest that the ob-
served stability over time of aggressive
reaction patterns is, to a considerable mea-
sure, determined by relatively stable, individ-
ual-differentiating reaction tendencies or mo-
tive systems within individuals.


Conclusions 


In addition to the previous descriptive 
generalizations, the following conclusions are 
warranted. They pertain directly to the issues 
raised in the Introduction. 

1. The degree of consistency over time in 
aggressive behavior is much greater than has 
been maintained by proponents of a be-
havioral (situational) specificity position in
the personality field (e.g., Mischel, 1968,
1969). It should be noted that the aggressive
behavior and reaction patterns studied were
observed or inferred by individuals other
than the subjects themselves and that sev-
eral studies (Jersild & Markey, 1935; Mar-
tin, 1964; Patterson et al., 1967) used the
method of direct behavioral observation. It 
should also be emphasized that, generally, 
the substantial degree of stability found can 
hardly be interpreted as mainly reflecting 
consistency constructed in the minds of the 
observers, irrespective of the actual behavior 
of the subjects. 

The across-time stability of aggressive be- 
havior was not much lower than that typi- 
cally found in the intelligence domain. This 
finding is worthy of particular emphasis,
since the stability of behaviors associated
with intelligence and cognitive processes has
been generally regarded as impressive and
indicative of “genuine continuity” also by
proponents of a behavioral specificity posi-
tion (e.g., Mischel, 1968, pp. 35-36). To
avoid misunderstanding, however, I want to
make clear that when pointing to similarities
between results from the intelligence domain
and those from the aggression area, I re-
strict my comparison to the degree of sta-
bility over time. I am in no way implying
assumptions about similar developmental and
operating mechanisms or, for instance, that
the degree of genetic influence is the same
in the two areas (see Olweus, 1978, chap. 8).

2. As previously spelled out, the results 
and analyses strongly suggest that important 
determinants of the observed consistency in 
aggressive behavior over time are to be found 
in relatively stable, individual-differentiating 
reaction tendencies or motive systems, how- 
ever conceptualized, within individuals. This 
conclusion should not be taken to imply that 
situational factors are considered unimpor- 
tant for the evocation of aggressive behavior 
(see, e.g., Olweus, 1969, 1973b). Nor does
it imply that aggressive behavior is inde-
pendent of rewarding and maintaining con-
ditions in the immediate, proximal environ-
ment. However, it is contended here that the
explanatory and predictive value of such fac-
tors has been exaggerated in the last decade;
the analyses of the present article clearly
suggest that relatively stable, internal reac-
tion tendencies are important determinants
of behavior in the aggressive-motive area and
should be given considerably greater weight 
than has been done recently. (For an over- 
view of how such internal reaction tendencies 
may develop in the area of aggression, see 
Olweus, 1978, chap. 8.) In line with this 
argument, it also seems quite reasonable to 
assume that the inferred, internal reaction 
tendencies or motive systems within an in- 
dividual are essential codeterminants of what 
the individual will perceive as reinforcing. 
In fact, the analyses presented suggest that 
highly aggressive individuals to a consider- 
able degree actively select and create the 
kind of situations in which they are often 
observed (cf. Bowers, 1973; Wachtel, 1973). 

The stated conclusion can also be inter- 
preted as providing support for some form 
of trait position,⁹ at least in the following
sense: The results indicate that the proba- 
bility of giving an aggressive response in 
potentially aggression-activating situations, 
more or less separated in time, or the 
strength of such responses differs greatly 
among individuals. In my view, however, 
the results do not imply that knowledge of 
an individual's habitual aggression level 
necessarily leads to good predictions of the 
behavior of the individual in a particular 
concrete situation. It has been empirically 
found, for instance, that the relationship be- 
tween aggressive responses in different situ- 
ations (data sources) for certain groups of 
individuals (with high aggression-inhibitory 
tendencies) may even be negative (but in a 
predictable way; see Olweus, 1969, 1973b). 
To make more accurate predictions for par- 
ticular situations, it seems necessary to take 
into account, among other things, the in- 
dividual's cognitive appraisal of the situa- 
tion, the aggressive activation value of the 
situation, the aggression-inhibitory activa- 
tion value of the situation, the strength of 
the individual's habitual aggressive tenden-
cies, as well as the strength of the individ-
ual's aggression-inhibitory tendencies (Ol-
weus, 1969). Accordingly, I prefer not to
interpret the consistency results obtained in 
terms of a (simple) trait formulation of ag- 
gressiveness (see also, Olweus, 1973b). 

The preceding analyses and conclusions 
thus indicate that what are known as per-
sonality concepts involving relatively stable,
internal reaction tendencies or properties of 
individuals are useful in predicting and ex- 
plaining aggressive behavior. Data on longi- 
tudinal consistency in other motive systems 
(e.g., Block, 1971) suggest that this is true 
also in areas of psychology other than ag- 
gression. It appears, then, in contrast with 
some recent proposals (e.g., Krasner & Ull- 
mann, 1973; Shweder, 1975), that person- 
ality concepts and variables referring to rela- 
tively stable, individual-differentiating reac- 
tion tendencies or properties may be of great 
value in psychology for many years to come. 


⁹ It should be noted that it is difficult to speak
of trait theory in general, without reference to a 
particular theorist or a particular motive area. There 
are obviously many differences and nuances among 
different theorists.


Evolutionary Scales Lack Utility:
A Reply to Yarczower and Hazlett 


John P. Capitanio and Daniel W. Leger 


University of California, Davis 


Yarczower and Hazlett have proposed that evolutionary scaling based on
anagenesis (biological improvement) is an acceptable—even desirable—facet of 
contemporary comparative psychology. We strongly disagree with that thesis. 
Our criticisms are based on (a) their misconception of anagenesis, (b) incon- 
sistencies in the use of the term evolutionary grades, (c) typological thinking, 
and (d) the lack of utility of evolutionary scales. Reversion to evolutionary 
scaling by comparative psychologists would disrupt the ongoing synthesis of 
comparative psychology with other evolutionary sciences. 


In a recent Psychological Bulletin article, 
Yarczower and Hazlett (1977) attempted to 
legitimize the construction of evolutionary 
scales, suggesting that “evolutionary scales 
have a place within comparative psychology 
and do not violate principles of evolution 
when done properly” (p. 1096). Although 
evolutionary scaling may have a limited place 
in analyses of structural features, we believe 
the concept is inappropriate and potentially 
misleading in analyses of behavior. Further, 
it is our contention that the arguments and 
evidence promulgated by Yarczower and 
Hazlett are not only internally inconsistent 
but reflect a serious lack of understanding 
of the principles of evolutionary biology. 

Our criticisms of Yarczower and Hazlett 
revolve about four main issues: (a) the con- 
cept of anagenesis, or biological improvement, 
(b) the appropriate definition of evolutionary 
grade, (c) typological thinking, and (d) the 
utility of evolutionary scales in modern comparative
psychology.


Anagenesis 


Although most evolutionary biologists agree
in principle that anagenesis refers to the
evolution of increased complexity in some
trait, the term has become diluted since its
inception and “has come to be applied to
nearly any kind of evolutionary change,
whether leading to a marked advance or not”
(Dobzhansky, Ayala, Stebbins, & Valentine,
1977, p. 236). Dobzhansky et al. feel that the
vast majority of evolutionary events are
actually adaptive changes at the same level
of complexity.

Yarczower and Hazlett apparently view
increased complexity and specialization as
corollaries of evolution; they cited Darwin
and some older works of Simpson and of


must not include the notion that evolution is always
progressive, leading inevitably from simpler to more
complex forms of life. To be sure, some of the most
important evolutionary events have been increases in
structural complexity, e.g., ... the complex sense
organs of vertebrates . . . and animal societies. . . .
Furthermore, both older and modern theories [of
evolution] recognize even degenerations of structure
as evolution, provided they are products of adaptive
alterations in population-environment interactions.
(p. 8)


The evolution of blindness in cave-dwelling
fish and the degeneration of structure and
behavior in parasites are examples of the
adaptiveness of decreased complexity. Wilson
(1975) has provided a detailed account of
the degeneration of the central nervous sys- 
tem and of behavior in parasitic ants com- 
pared with their ancestral free-living relatives. 

Thus evolutionary change is by no means 
unidirectional. Because selection pressures 
often fluctuate, a certain amount of evolu- 
tionary backtracking is to be expected. Natu- 
ral selection can result not only in anagenesis 
but also in degeneration, in stasigenesis (no 
change whatsoever), or, more likely, in change 
without alteration of complexity. It should 
be remembered that the things of value for 
organisms are survival and reproduction; 
evolutionary success is clearly not dependent 
on complexity per se. 


Evolutionary Grades 


The term evolutionary grade, as used by 
Yarczower and Hazlett, differs markedly from 
the definitions given in current major works 
(e.g., Mayr, 1970; Wilson, 1975). Although
all the above authors have agreed that grade 
refers to the stage of a behavior, anatomical 
structure, or physiological process, Yarczower 
and Hazlett (1977) stipulated that to be 
included in the same grade, animals must be
“related by parallel evolution” (p. 1091),
that is, must be closely related. But according
to Wilson (1975), “Phylogenetically remote
lines can reach and pass through the same
grades, in which case we speak of the species
making up these lines as being convergent
with respect to the trait” (pp. 25-26).

The concepts of parallel evolution and con- 
vergent evolution, with their implications of 
closely and distantly related species, respectively,
have been a common source of confusion.
Kaster and Berger (1977) correctly
pointed out that parallelism and convergence
can occur in taxa of any degree of phylogenetic
affinity; that is, the process is of
primary importance, and the degree of relatedness
is of secondary concern.

But even if one refrains from judging the 
relative value of the two definitions, Yarczower
and Hazlett were at least repeatedly
inconsistent in using the term grade as they 
originally defined it. For example, they wrote 
of birds, mammals, and reptiles as having 
attained the grade Amniota. By no stretch 
of the imagination can these classes be viewed
as being closely related; there is no doubt,
however, that they have converged on a common
strategy. Later, Yarczower and Hazlett
diluted their original statements by following 
up Huxley's (1958) suggestion that the 
“breadth” of the grade be designated by the 
lowest common taxon rank of the least 
related species included in the comparison, 
for example, grade-order or grade-class (see
Kaster & Berger, 1977, for a similar sug- 
gestion). Thus, Yarczower and Hazlett de- 
fined evolutionary grade differently from 
Wilson (1975), Mayr (1970), and Dobzhansky 
et al. (1977); but their use of the term is 
consistent with the latter authors' definition. 

If, for the moment, one ignores Yarczower
and Hazlett's (1977) examples and takes their 
definition of grade as what they actually 
meant, then they are faced with two major 
problems. First, their notion of parallelism 
must be clarified, because the possibility 
exists that even phylogenetically remote taxa 
may evolve in parallel. Second, if parallelism 
is taken to refer only to “closely” related 
species, they must then consider the distinct 
probability that they were dealing with ho- 
mologous traits—a task they explicitly wished 
to circumvent (p. 1090). 

But if one ignores their definition and 
views their examples as indicative of their 
position, it is quite clear that Yarczower and 
Hazlett were dealing with evolutionary con- 
vergence, that is, similar phenotypic response 
to similar selection pressures. Again, however, 
convergence was explicitly excluded from 
their thesis.

The failure of Yarczower and Hazlett to 
unambiguously state their position concerning 
evolutionary grades can only be taken as an 
indication that they have failed to establish 
“a third approach to the study of behavioral 
evolution” (p. 1088); their approach cannot 
be distinguished from the other two ap- 
proaches, namely, the studies of homology 
and analogy. 


Typology 


In discussing the level of a particular grade, 
Yarczower and Hazlett (1977) asked us to 
keep in mind the representativeness of a 
particular animal with respect to its higher 
taxon. Is the cat representative of Carnivora,
the squirrel of Rodentia, and so on? One
might well ask, is any species truly repre- 
sentative of a particular higher taxon? The 
question of representativeness is surely a rela- 
tive one, and disagreement is certain to exist. 
Consider an example (from Tinbergen, 1958). 
If one wished to construct grades based on 
nesting behavior, one might very well choose 
the black-headed gull to represent the gull 
family (Laridae) and then place the gulls in 
the grade colonial ground nesters. Alternatively, 
one might choose the kittiwake as the repre- 
sentative and therefore place the gulls in the 
grade colonial cliff nesters. To do either would 
be an injustice, for one would lose the sense 
of variability the other exemplifies. One 
could eliminate the problem by renaming the 
grade colonial nesters, but by doing so one 
would lose more of the appreciation for di- 
versity within the family. 

The example may be an extreme one, but 
it does not depict a rare occurrence. Closely 
related species can have wide variations in 
any trait; consider the greatly different social
systems of the baboon species Papio hama- 
dryas (one-male harem) and Papio anubis 
(multimale troop). And depending on one’s 
definition of relatedness, one finds that within 
the primate superfamily Hominoidea, a wide 
variety of social organizations exist—from 
monogamous pairs to troop-living species 
(Eisenberg, Muckenhirn, & Rudran, 1972). 
Picking a representative would certainly be an 
impossible task. The only mechanism in the 
grade scheme for dealing with typologies of 
this sort is to make the grade ever more 
general. Far from being useful, grades then 
become arbitrary, broad, and meaningless. 


Utility of Evolutionary Scales in 
Comparative Psychology 


Another serious flaw in the Yarczower- 
Hazlett formulation is their almost complete 
inattention to behavior, even though their
stated purpose was to examine behavioral
evolution. Virtually all their examples come
from anatomy and physiology, areas in which
complexity can be measured relatively simply.
But where behavior is concerned, complexity
is an intractable concept. To continue a previous
example, on the basis of sheer numbers
it might be concluded that a multimale
troop's social structure is more complex than
that of a one-male harem. Alternatively, one
might argue that the loose aggregation of
P. hamadryas baboons is actually more
complex in that it is composed of highly
discrete one-male units that manage to
cooperate in foraging, traveling, and so on,
with few overt interactions. Similarly, which
is more complex, polygyny or polyandry?
Is scent marking a territory more complex
than vocally advertising that territory? In
short, when considering behavior one is usually
hard pressed to decide exactly how to
rank order the candidates in terms of increasing
complexity.

When faced with the difficult task of 
ranking several behaviors, one should be 
aware of the temptation to simply call the 
behavior found in the more recently evolved 
taxon the more complex. In our opinion,
Yarczower and Hazlett may have fallen into 
this trap when they claimed that probability- 
maximizing, a strategy adopted by the rats 
in Bitterman's (1965) classic studies, is more 
complex and advanced than probability- 
matching, the strategy used by fish. We feel 
that a strong case can also be made for 
probability-matching being the more complex 
behavior. Maximization requires only a greater-than/less-than
comparison; but in matching,
the degree of difference between the two
manipulanda is taken into account. Also, as
Bitterman's data suggest, matchers adjust
their response distributions very quickly following
changes in reinforcement distribution,
reflecting a sensitivity to environmental
change exceeding that of the maximization 
strategy. The lesson of this example should 
be clear: One must not assume a priori that 
the more recently evolved taxon will neces- 
sarily exhibit the more complex form of the 
behavior. 
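
The contrast between the two strategies can be sketched in code. The following is only an illustrative sketch under stated assumptions, not a model taken from Bitterman's studies: the function names and the 70:30 reinforcement schedule are hypothetical, and each trial is assumed to offer two manipulanda with known reinforcement probabilities.

```python
import random


def maximizing_choice(p_left: float, p_right: float) -> str:
    """Maximizing: only the sign of the difference matters."""
    return "left" if p_left > p_right else "right"


def matching_choice(p_left: float, p_right: float, rng: random.Random) -> str:
    """Matching: choose each side in proportion to its share of reinforcement,
    so the size of the difference enters the decision."""
    share_left = p_left / (p_left + p_right)
    return "left" if rng.random() < share_left else "right"


def expected_reinforcement_rate(p_left: float, p_right: float,
                                prob_choose_left: float) -> float:
    """Long-run proportion of reinforced responses for a given choice rule."""
    return prob_choose_left * p_left + (1 - prob_choose_left) * p_right


# Hypothetical 70:30 schedule.
rate_maximizer = expected_reinforcement_rate(0.7, 0.3, prob_choose_left=1.0)
rate_matcher = expected_reinforcement_rate(0.7, 0.3, prob_choose_left=0.7)
```

Under these assumptions the maximizer is reinforced on 0.70 of trials and the matcher on 0.58. The sketch says nothing about which rule is more advanced; it shows only that the matching rule uses the magnitude of the difference, whereas the maximizing rule uses only its direction.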

Given that one assumes Yarczower and 
Hazlett’s rationale for the development of 
grades and then constructs a series of grades 
based on some trait, one is left with the 
final question, What can be said about the 
results? Unfortunately, we feel the final 
answer is, Nothing. At best one arrives at
the description of behavior in related taxa
(after arbitrarily defining related and toiling
over choosing representative examples). At
worst, however, one does something more
nefarious: One gives the impression that the
grades represent a scala naturae. This impression
arises from two points: first, that
there is a rough correlation between com- 
plexity and evolutionary recency, which tends 
to clump “higher” animals in the more 
"improved" grades, and second, that the 
actual presentation of description in this form 
must have some underlying reason for exist- 
ence other than description. In short, little 
is accomplished, except perhaps the genera- 
tion of a lively debate over which grade is 
really the most improved. Indeed, once a 
scale has been developed, the only possible 
use that can be made of it (beyond the 
aforementioned descriptive function) is to 
make inferences that must employ adaptation 
of the particular organisms to their respective 
ecological niches. 

We vigorously disagree with the statement 
(Yarczower & Hazlett, 1977, p. 1092) that 
the construction of a hierarchical progression 
of grades can have evolutionary significance 
beyond that of the study of adaptations. 
The source of our disagreement stems from 
the fact that characters exist as adaptations. 
To speak only of characters in vacuo, divorced 
of their adaptive significance, is meaningless 
in terms of evolution. Thus, to say that 
reptiles, although having "attained" the grade 
Amniota, have not attained the grade Ho- 
meothermy, “which reflects an improvement 
in the mechanism responsible for the regula- 
tion of body temperature" (Yarczower & 
Hazlett, 1977, p. 1096), is to say nothing of 
significance. The concept of not attaining 
this grade is meaningless, since selection pressures
on reptiles are such that poikilothermy
is generally favored.

In conclusion, we believe that evolutionary
scales have no place in a modern comparative 
science of behavior. The idea of improvement 
smacks of anthropocentrism and is certainly 
arbitrary. These concepts are without value 
and may be seriously misleading. Scaling can 
only hinder interpretation of comparative 
psychologists' efforts by more traditional,
evolution-oriented behaviorists at a time when 
these disciplines are reaching common grounds 
for interaction after having been polarized 
for so long at the extremes of the learned- 
instinct continuum.


We show that the first three criticisms by Capitanio and Leger of the Yarczower
and Hazlett article are unfounded and that the fourth is premature: (a) Con- 
temporary leaders in the study of evolution define anagenesis in the same way as 
did Yarczower and Hazlett; (b) their use of the term evolutionary grade was 
internally consistent and consistent with usage by Mayr, whom Capitanio and 
Leger cited as having used it differently; (c) classification of species into higher 
taxa does not represent "typological thinking"; and (d) although analyses by 
grades of social behaviors are more difficult than those of sensory systems, as 
was noted by Yarczower and Hazlett, it is premature to conclude that the effort
will not bear fruit.


We answer in turn each of the four criti- 
cisms leveled at the Yarczower and Hazlett 
(1977) article and show that the first three 
are unfounded and that the fourth is pre- 
mature. 

1. “Misconception of anagenesis.” Cap- 
itanio and Leger (1979) claimed that al- 
though at one time anagenesis meant what 
Yarczower and Hazlett claimed it to mean, 
it currently refers to any evolutionary 
change. 

Gould (1976), a paleobiologist at Har- 
vard's Museum of Comparative Zoology, 
has written about the concepts of anagenesis,
evolutionary progress, and grades in a
book that was received too late for the ideas
contained within it to be incorporated into
the Yarczower and Hazlett article. He wrote,
“The standard evolution tree . . . leaves out
evolutionary progress (or anagenesis) entirely
. . . . Grades . . . are successive levels
of organization defined as stages in the improvement
of an organic design for some
specified function” (p. 117; italics added).
Or read what Jerison, author of Evolution 
of the Brain and Intelligence (1973), had to 
say about the terms anagenesis and grades 
in his address entitled “Smart Dinosaurs and
Comparative Psychology” (Note 1). He said,
“For an evolutionary analysis of intelligence,
we might seek evidence of ‘intellectual’ progress
from earlier to later species. This analysis
is called ‘anagenetic’ and is about progressive
evolution” (pp. 1-2; italics added).
And again, “In anagenetic analysis the objective
is to identify grades in evolution,
recognizing the possibility of higher, or more
advanced, grades on a particular dimension”
(p. 2).


In fact, if one reads further in the source
used by Capitanio and Leger to define anagenesis
(Dobzhansky, Ayala, Stebbins, &
Valentine, 1977), one reads, “Anagenetic episodes
commonly create organisms with novel
characters and abilities beyond those of their
ancestors” (p. 236; italics added). And
again, in describing the relationship among
anagenesis, cladogenesis, and stasigenesis,
anagenesis is defined as “evolutionary advance
or change” (p. 236; italics added).
Thus, the definition of anagenesis in the
Yarczower and Hazlett article is one shared
by leading contemporary students of evolution.

2. “Inconsistencies in the use of the term 
evolutionary grades.” Capitanio and Leger’s 
(1979) first criticism, although incorrect, at 
least is clear; the second is unclear. We
believe there are two points they wished to
make. They wrote that if “parallelism is
taken to refer only to ‘closely’ related species,
they must then consider the distinct proba- 
bility that they were dealing with homolo- 
gous traits—a task they explicitly wished to 
circumvent” (p. 877). What Yarczower and 
Hazlett (1977) wrote, in fact, was that “in 
the study of anagenesis it is important that 
the animals be related at least by parallel 
evolution” (p. 1090). It is puzzling that 
Capitanio and Leger read this and concluded 
that homologous traits were meant to be ex- 
cluded from an anagenetic analysis. Animals 
must be related at least by parallel evolu- 
tion, and obviously, direct descendancy or 
relationships underlying behavioral homolo- 
gies satisfy the criterion of being related “at 
least by parallel evolution.” 

Capitanio and Leger also suggested that 
Yarczower and Hazlett’s use of the term evo- 
lutionary grade differed from that of Mayr 
(1970) and Wilson (1975). Mayr (1963) 
wrote, “The felicitous term ‘grade’ was in- 
troduced into the evolutionary literature by 
Huxley . . . following Simpson . . . to desig- 
nate ‘a step of anagenetic advance, or unit 
of biological improvement’” (pp. 608-609). 
In a later abridged version of the same 
work, Mayr (1970) offered exactly the same 
sentence, but omitted Simpson’s name. In 
any case, in both versions, Mayr then went 
on to note that “several related lines may 
reach the same adaptive or structural grade 
independently” (1963, p. 609; 1970, p. 365). 
Further, Mayr (1970) wrote that “Simpson 
in particular . . . has pointed out how rapidly 
a new type may reach a new phylogenetic 
‘grade,’ but once this grade is reached the 
type remains essentially stable” (pp. 370-371;
the 1963 version on p. 617 is almost
identical). Mayr’s discussion of grades relied
heavily on Simpson’s (1961) usage, as did
Yarczower and Hazlett’s (1977, pp. 1091-1092),
and thus it should come as no surprise
that usage of the term evolutionary
grade by Mayr (1963, 1970) and by Yarczower
and Hazlett does not differ and that
no evidence to the contrary was presented by
Capitanio and Leger. Wilson’s (1975) suggestion
“that phylogenetically remote lines
can reach and pass through the same grades”
(p. 25) does differ in principle from the suggestion
of Yarczower and Hazlett (1977), as
well as from that of Simpson (1961) and
that of Mayr (1970).

3. “Typological thinking.” Capitanio and 
Leger applied the concept of typology, or 
typological thinking, inappropriately. For its 
correct use, see Mayr (1963, 1969, 1970, 
1976). We answer, however, the criticism 
raised by them. They seemed to object to 
the placing of two instances into a single 
class. They claimed that the uniqueness and 
individuality of each instance is lost when 
one notes the similarity between the two in- 
stances. Yet, surely statements about com- 
monalities among instances are a goal of 
science. Indeed, when Capitanio and Leger 
labeled the gulls in their example as kitti- 
wakes, they could very well have been ac- 
cused of having lost an appreciation of the 
differences between Judy and Fred Kitti- 
wake. Yarczower and Hazlett noted that in- 
clusion in one grade did not mean that the 
same groups of animals would be placed to- 
gether in other grades. The diversity among 
groups of animals indeed was recognized. 
But Capitanio and Leger appear to object 
to classification of species into higher taxa. 
If they do, then surely they must object to 
an important goal of the field of systematics. 
Mayr (1969) wrote that “each species may 
exist in numerous forms (sexes, ages, classes, 
seasonal forms, morphs, and other phena).
It would be impossible to deal with this 
enormous diversity if it were not ordered 
and classified" (p. 1). And again, “One of 
the major preoccupations of systematics is 
to determine . . . what unique properties of 
every species and higher taxon are. Another 
is to determine what properties certain taxa 
have in common with each other" (p. 3; 
italics added). 

4. “Lack of utility of evolutionary scales.”
This final charge reflects a number of mis- 
understandings by Capitanio and Leger as 
well as a legitimate challenge, one that Yarc- 
zower and Hazlett issued themselves. Capi- 
tanio and Leger (1979), in discussing social 
behavior, suggested that “one is usually hard 
pressed to decide exactly how to rank order 
the candidates in terms of increasing com- 
plexity” (p. 878). Yarczower and Hazlett 
(1977) stated explicitly that “it is more 
difficult to obtain agreement about what constitutes
improvement in social systems than
about sensory systems. However, difficulty 
alone is not sufficient grounds for rejecting 
the notion of evolutionary scales" (p. 1096). 
Incidentally, Capitanio and Leger appear to 
treat increased complexity as synonymous 
with progressive improvement, but it should 
be noted that in their discussion of an example
of progressive improvement in color
vision, Yarczower and Hazlett never used
the word complexity. This word was used in
a brief review of the history of the concept
of anagenesis. It is difficult to define improvement
but not impossible. Consider an
analysis of facial behavior. If facial move- 
ment is treated as the behavioral character 
to be subjected to an analysis by grades, 
then it is not unreasonable to assume that 
improvement is reflected in the increased 
ability to engage in greater varieties and intensities
of facial behaviors. It is clear that
the evolution of facial musculature is of
prime importance in understanding the progressive
improvement that seems to be reflected
in the evolution of facial behavior
(Chevalier-Skolnikoff, 1973; Huber, 1931/ 
1972). The social significance of the evolu- 
tion of facial musculature and the concomi- 
tant improvement in facial movements and 
facial behavior are considerable. The impor- 
tance of research on facial behavior and fa- 
cial expression (e.g., Ekman, 1973; Ekman 
& Friesen, 1976; Izard, 1971) for an un- 
derstanding of a rich variety of social be- 
havior provides some optimism that it may 
be possible to define improvement in systems 
relevant to social phenomena.

Capitanio and Leger (1979) concluded
that Yarczower and Hazlett discussed “characters
in vacuo, divorced of their adaptive
significance” (p. 879). They came to this
conclusion from a statement by Yarczower
and Hazlett (1977) that “a hierarchical pro- 
gression of grades . . . has evolution signifi- 
cance beyond that of the study of adapta- 
tions” (p. 1092). Even a casual reading of 
this statement does not lead to the conclusion 
that the adaptive significance of the grades 
is to be ignored. Yarczower and Hazlett dis- 
cussed the differences among the goals of the 
study of behavioral homologies, adaptations,
and grades, and they need not be repeated
here. 

A legitimate question raised by Capitanio
and Leger asks whether analyses by grades 
can provide any interesting or important 
answers. They concluded that they cannot. 
This question cannot be treated in isolation 
from other attempts to understand the evo- 
lution of behavior. For example, there is a 
lively controversy about whether the search 
for behavioral homologies is likely to be a 
profitable one (e.g., Atz, 1970; Hailman, 
1976; Hodos, 1976). Will the analyses of 
grades be profitable? Mayr (1976) stated,
“The existence of minor and major grades is
one of the most interesting phylogenetic phenomena,
even though it is a phenomenon
which we are still unable to understand adequately”
(p. 450). And again, “To the evolutionary
taxonomist the existence of grades
seems often more significant and more meaningful
biologically than the mere splitting
of phyletic lines” (p. 451). Gould (1976)
noted that


lemur-monkey-ape-man is a caricature of primate
phylogeny; once we are sure that the species we
study are representative of their grade, this same
sequence may well unravel the mysteries of neocortical
function in successive levels of organization
of the primate brain. . . . I do not think
that analysis by grades is likely to be abandoned
even in a pipedream world where phylogenies are
laid out upon laboratory tables. (p. 121)


It may well turn out that analysis by 
grades will not bear fruit, but better that 
this be the result of recognizing, testing, and 
rejecting the potential value of analysis by 
grades than the result of ignorance about 
grades' existence.


Reference Note 


1. Jerison, H. J. Smart dinosaurs and comparative
psychology. Paper presented at the meeting of
the American Psychological Association, Toronto,


Two procedures for protecting the number of false rejections for a set of all
possible pairwise comparisons were compared. The two-stage strategy of com- 
puting pairwise comparisons, conditional on a significant omnibus test, was 
compared with the multiple comparison strategy that sets a "familywise" critical
value directly. The analysis of variance test, the Brown and Forsythe test,
and the Welch omnibus test, as well as three procedures for assessing the sig- 
nificance of pairwise comparisons, were combined into nine two-stage testing 
strategies. The data from this study establish that the common strategy of 
following a significant analysis of variance F with Student's t tests on pairs of
means results in a substantially inflated rate of Type I error when variances 
are heterogeneous. Type I error control, however, can be obtained with other 
two-stage procedures, and the authors tentatively consider the Welch F′/Welch
t′ combination desirable. In addition, the two techniques for controlling Type
I error do not substantially differ as much as might be expected; some two-stage
procedures are comparable to simultaneous techniques.


Given $K$ independent samples of size $n_k$,
many experimenters are interested in testing
the equality of the pairs of means. One can
consider the complete set of means a family,
in that if $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k = \cdots = \mu_K$
is true, then it is also true that $\mu_k = \mu_{k'}$ for
all possible pairs. The risk of making one or
more Type I errors on the pairs is identified
as the "familywise" risk of Type I error
(FWI). This is contrasted with the risk of
making a Type I error on a single contrast,
which is labeled the per-comparison Type I 
error rate (PCI). Two general procedures can 
be used in this situation. A simultaneous 
multiple comparison procedure such as Tukey's 
(Note 1) wholly significant difference test 
compares all pairs using a critical value (CV) 
that controls FWI. 
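
The gap between the per-comparison and familywise risks can be illustrated numerically. The sketch below is an illustration only, with hypothetical function names; it treats the pairwise tests as if they were independent, which they are not (they share group means), so the first figure is a rough guide rather than the exact FWI. The Bonferroni figure is an upper bound that holds regardless of dependence.

```python
def n_pairs(k: int) -> int:
    """Number of pairwise comparisons among k means: K(K - 1)/2."""
    return k * (k - 1) // 2


def familywise_if_independent(alpha: float, c: int) -> float:
    """Chance of at least one Type I error in c tests, each run at level
    alpha, under the simplifying assumption that the tests are independent."""
    return 1 - (1 - alpha) ** c


def bonferroni_bound(alpha: float, c: int) -> float:
    """Upper bound on the familywise risk, valid under any dependence."""
    return min(1.0, c * alpha)


# Hypothetical case: K = 4 groups, each pair tested at PCI = .05.
c = n_pairs(4)
approx_fwi = familywise_if_independent(0.05, c)
upper_fwi = bonferroni_bound(0.05, c)
```

With four groups there are six pairs; testing each at .05 gives a familywise risk of roughly .26 under the independence assumption and no more than .30 in any case.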


This research was supported in part by
Council Grant 451-7068 and in part by the Pennsylvania
State University Computation Center.
Requests for reprints should be sent to H. J. Kesel- 
man, Department of Psychology, University of Mani- 
toba, Winnipeg, Manitoba, Canada R3T 2N2. 


The second, more likely used procedure is a
two-stage strategy. The control of FWI is
accomplished by an initial omnibus test such
as the analysis of variance (ANOVA) F test.
Only if this first stage is significant does the
user proceed to test the pairs. Then the pairs
can be tested by using a critical value that
controls only PCI. Such a critical value is
always smaller than the above FWI CV.
Consequently, it automatically follows that if
one reaches the second stage, the tests on pairs
will have greater power than the tests on pairs
that directly control FWI via a larger CV.
However, since this second stage is equivalent
to doing K(K - 1)/2 pairs of t tests,
the total risk of Type I error rises with K,
so that only the first stage provides FWI
control in this process. This procedure,
commonly referred to as the protected least
significant difference (LSD) technique, was
introduced by Fisher (1949) and was recommended
by Carmer and Swanson (1973) over
most multiple comparison procedures.
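
The two-stage logic just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used in the study: the function names are hypothetical, the critical values `f_crit` and `t_crit` are assumed to be supplied by the user from standard tables, and the second stage uses the pooled within-groups mean square, as in the conventional ANOVA F followed by Student's t sequence.

```python
from itertools import combinations
from statistics import mean


def anova_f(groups):
    """Return the omnibus ANOVA F statistic and the mean square within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within


def protected_lsd(groups, f_crit, t_crit):
    """Stage 1: omnibus F. Stage 2 (only if F is significant): pooled t on
    every pair, each judged against a per-comparison critical value."""
    f_stat, msw = anova_f(groups)
    if f_stat <= f_crit:
        return f_stat, {}  # omnibus test not significant: stop here
    results = {}
    for (i, a), (j, b) in combinations(enumerate(groups), 2):
        se = (msw * (1 / len(a) + 1 / len(b))) ** 0.5
        t_stat = (mean(a) - mean(b)) / se
        results[(i, j)] = (t_stat, abs(t_stat) > t_crit)
    return f_stat, results


# Hypothetical data; 5.14 and 2.447 are the tabled .05 critical values
# for F(2, 6) and two-tailed t(6).
data = [[1, 2, 3], [2, 3, 4], [7, 8, 9]]
f_stat, pairs = protected_lsd(data, f_crit=5.14, t_crit=2.447)
```

Because the second stage pools the error variance across all groups, this sketch inherits the sensitivity to heterogeneous variances that the study examines.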

Unfortunately, Carmer and Swanson used
only the omnibus ANOVA and investigated only
the equal sample size and homogeneous
variance condition in their study. The ANOVA
is known to be sensitive when unequal sample
sizes are combined with unequal population
variances (Box, 1954), so that this omnibus
test often does not provide acceptable control
of FWI when sample sizes are not equal.

However, two alternate omnibus tests for
mean equality have been shown to be more
robust than the ANOVA F (Brown & Forsythe,
1974; Kohr & Games, 1974).

The Brown and Forsythe test, like the 
ANOVA, uses just sample sizes in the numerator 
to weight means, whereas the multisample 
Welch (1951) test weights the means by 
sample variances as well as by sample sizes. 
However, both tests, rather than obtaining 
a CV based on the usual ANOVA error degrees
of freedom (df), obtain a modified CV that is a
function of sample variances and sizes. Conse- 
quently, either of these tests may provide the 
stable FWI control needed to make the two- 
stage procedure robust. However, Type I error 
control still may not be achieved when 
Student's / tests are used to assess the pairwise 
comparisons following a significant first-stage 
omnibus test, particularly when the omnibus 
test is the anova; that is, prior literature (e.g. 
Box, 1954; Boneau, 1960) suggests that two- 
stage procedures using a pooled within-cell 
estimate of error variation in either stage 
should not maintain the Type I error rate at 
the significance level when variances are 
heterogeneous. 

Type I error control may be achieved by adopting follow-up procedures
that are intended to counteract the effects of variance heterogeneity.
The contributions by Welch (1947) and Hochberg (1976) are applicable. The
former technique uses the sample variances and sizes to obtain a modified
CV, whereas by adopting Hochberg's work one uses a follow-up test
statistic that has a nonpooled estimate of the standard error of the mean
difference.

The present study, therefore, compared the rates of Type I error for nine
LSD two-stage procedures to empirically verify in particular that the
ANOVA F followed by Student's t sequence does not provide control of FWI
and to find an improved two-stage strategy. To provide recommendations
for controlling the overall rate of Type I error for a set of pairwise
comparisons, we compare our results with simultaneous multiple comparison
procedures.


Definition of Test Statistics 


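Let $X_{ik}$ represent the $i$th observation in the $k$th group, where
$i = 1, \ldots, n_k$, $k = 1, \ldots, K$, and $\sum_k n_k = N$. The
$X_{ik}$s are independent normal variates with expected values $\mu_k$
and variances $\sigma_k^2$. The best linear unbiased estimates of
$\mu_k$, $\sigma_k^2$, and $\sigma^2$ are

$$\bar{X}_{.k} = \sum_i X_{ik} / n_k,$$
$$s_k^2 = \sum_i (X_{ik} - \bar{X}_{.k})^2 / (n_k - 1),$$
$$\mathrm{MSW} = \sum_k \sum_i (X_{ik} - \bar{X}_{.k})^2 / (N - K),$$

respectively (MSW = mean square within groups).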

The omnibus test statistics are as follows: For ANOVA $F$,

$$F = \frac{\sum_k n_k (\bar{X}_{.k} - \bar{X}_{..})^2 / (K - 1)}{\sum_k \sum_i (X_{ik} - \bar{X}_{.k})^2 / (N - K)},$$

where $\bar{X}_{..} = \sum_k n_k \bar{X}_{.k} / N$. When the population
variances are homogeneous, $F$ is distributed as an $F$ variable with
$K - 1$ and $N - K$ degrees of freedom. For Brown and Forsythe's (1974)
$F^*$,

$$F^* = \frac{\sum_k n_k (\bar{X}_{.k} - \bar{X}_{..})^2}{\sum_k (1 - n_k/N)\, s_k^2},$$

where $F^*$ is approximately distributed as $F$ with $K - 1$ and $f$
degrees of freedom and $f$ is obtained with the Satterthwaite (1941)
approximation,

$$1/f = \sum_k c_k^2 / (n_k - 1);$$
$$c_k = (1 - n_k/N)\, s_k^2 \Big/ \Bigl[\sum_k (1 - n_k/N)\, s_k^2\Bigr].$$

For Welch's (1951) $F''$,

$$F'' = \frac{\sum_k w_k (\bar{X}_{.k} - \tilde{X}_{..})^2 / (K - 1)}{1 + [2(K - 2)/(K^2 - 1)] \sum_k [1/(n_k - 1)] (1 - w_k / \sum_k w_k)^2},$$

where $w_k = n_k / s_k^2$ and

$$\tilde{X}_{..} = \sum_k w_k \bar{X}_{.k} \Big/ \sum_k w_k.$$

The Welch statistic is approximately distributed as an $F$ variable with
$K - 1$ and

$$\nu = (K^2 - 1) \Big/ \Bigl\{3 \sum_k [1/(n_k - 1)] (1 - w_k / \sum_k w_k)^2\Bigr\}$$

degrees of freedom.
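
The three omnibus statistics defined above can be computed directly from the group sizes, means, and variances. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, each function returns the statistic with its numerator and denominator degrees of freedom, and critical values are deliberately left out because they require an F distribution routine that the standard library does not provide. Groups with zero sample variance are not handled.

```python
from statistics import mean, variance


def _summaries(groups):
    """Per-group sizes, means, and unbiased sample variances."""
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    v = [variance(g) for g in groups]  # divisor n_k - 1
    return n, m, v


def anova_f(groups):
    """ANOVA F with K - 1 and N - K degrees of freedom."""
    n, m, v = _summaries(groups)
    k, big_n = len(groups), sum(n)
    grand = sum(nk * mk for nk, mk in zip(n, m)) / big_n
    between = sum(nk * (mk - grand) ** 2 for nk, mk in zip(n, m)) / (k - 1)
    msw = sum((nk - 1) * vk for nk, vk in zip(n, v)) / (big_n - k)
    return between / msw, k - 1, big_n - k


def brown_forsythe_f(groups):
    """Brown and Forsythe's F* with K - 1 and f degrees of freedom."""
    n, m, v = _summaries(groups)
    k, big_n = len(groups), sum(n)
    grand = sum(nk * mk for nk, mk in zip(n, m)) / big_n
    num = sum(nk * (mk - grand) ** 2 for nk, mk in zip(n, m))
    terms = [(1 - nk / big_n) * vk for nk, vk in zip(n, v)]
    den = sum(terms)
    # Satterthwaite approximation: 1/f = sum of c_k^2 / (n_k - 1)
    inv_f = sum((t / den) ** 2 / (nk - 1) for t, nk in zip(terms, n))
    return num / den, k - 1, 1 / inv_f


def welch_f(groups):
    """Welch's F'' with K - 1 and nu degrees of freedom."""
    n, m, v = _summaries(groups)
    k = len(groups)
    w = [nk / vk for nk, vk in zip(n, v)]
    sw = sum(w)
    wmean = sum(wk * mk for wk, mk in zip(w, m)) / sw
    num = sum(wk * (mk - wmean) ** 2 for wk, mk in zip(w, m)) / (k - 1)
    lam = sum((1 - wk / sw) ** 2 / (nk - 1) for wk, nk in zip(w, n))
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return num / den, k - 1, (k ** 2 - 1) / (3 * lam)


data = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # toy data, invented for illustration
for stat in (anova_f, brown_forsythe_f, welch_f):
    print(stat.__name__, stat(data))
```

With equal sample sizes and equal sample variances, as in the toy data, F and F* coincide and only the error degrees of freedom differ.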
The pairwise comparisons can be assessed for statistical significance
with the statistic

$$t = (\bar{X}_{.k} - \bar{X}_{.k'}) \Big/ [\mathrm{est}(\sigma_k^2/n_k + \sigma_{k'}^2/n_{k'})]^{1/2}$$

to test the null hypothesis, $\mu_k - \mu_{k'} = 0$ ($k \neq k'$). The
$t$ designation is not meant to imply that this statistic fits Student's
$t$ distribution, particularly under heterogeneous variances.

The estimated standard errors of the mean differences for the pairwise
tests are (a) Student's pooled denominator,

$$(\mathrm{MSW}/n_k + \mathrm{MSW}/n_{k'})^{1/2};$$

(b) Hochberg's (1976) nonpooled denominator,

$$[2 \max(s_k^2/n_k,\; s_{k'}^2/n_{k'})]^{1/2};$$

and (c) the Behrens (1929)-Fisher (1935) denominator (used with a
modified CV),

$$(s_k^2/n_k + s_{k'}^2/n_{k'})^{1/2}.$$


The three forms of $t$ use a $t_{\alpha/2,\nu}$ CV. The error df for
Student's and Hochberg's $t$s equals $N - K$. The variable error df
($\nu_w$) for the third denominator is Welch's (1947) solution for
$\nu_w$, where

$$\nu_w = \frac{(s_k^2/n_k + s_{k'}^2/n_{k'})^2}{\dfrac{(s_k^2/n_k)^2}{n_k - 1} + \dfrac{(s_{k'}^2/n_{k'})^2}{n_{k'} - 1}}.$$
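
To make the three denominators and Welch's variable df concrete, here is a small sketch with hypothetical function names. The example call uses the most extreme pair from the heterogeneous condition described in the next section (population variance 2.296 with n = 29 against .104 with n = 89), and the MSW value of 0.711 is an assumption: it is the size-weighted average of the four population variances, not a value reported in the article.

```python
from math import sqrt


def se_pooled(msw, n1, n2):
    """(a) Student's pooled denominator, based on MSW."""
    return sqrt(msw / n1 + msw / n2)


def se_hochberg(v1, n1, v2, n2):
    """(b) Hochberg's nonpooled denominator."""
    return sqrt(2 * max(v1 / n1, v2 / n2))


def se_behrens_fisher(v1, n1, v2, n2):
    """(c) Behrens-Fisher denominator, used with Welch's modified CV."""
    return sqrt(v1 / n1 + v2 / n2)


def welch_df(v1, n1, v2, n2):
    """Welch's (1947) variable error df for the Behrens-Fisher denominator."""
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Population values used in place of sample estimates, for illustration only.
print(se_pooled(0.711, 29, 89))
print(se_hochberg(2.296, 29, 0.104, 89))
print(se_behrens_fisher(2.296, 29, 0.104, 89))
print(welch_df(2.296, 29, 0.104, 89))
```

For this pair the pooled denominator is much smaller than either nonpooled one, which is the mechanism behind the inflated Type I error rates for tests using MSW. Hochberg's denominator is never smaller than the Behrens-Fisher one, because twice the larger of two terms is at least their sum.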


Methods of the Simulation Study 


Pseudorandom normal observations of sizes 29, 41, 65, and 89 were
obtained from the Marsaglia, MacLaren, and Bray (1964) random number
generator.¹ An omnibus test statistic was then computed on the data. If
the observed value of the omnibus test exceeded a 5% critical value based
on 3 and error df, the FORTRAN program was used to compute the six
pairwise comparisons. This was repeated until 1,000 different significant
results had been obtained on the omnibus tests. The average of the six
per-comparison rates of Type I error (the average PCI) was then obtained
by dividing the total number of false rejections by 6,000 (6 comparisons
× 1,000 simulations).

If the observed value of the omnibus test did not exceed its critical
value, the computer program was set up to return to the random number
generator. For normally distributed observations with means of zero and a
common variance of one, approximately 20,000 calls to the generator are
necessary to obtain 1,000 significant omnibus tests for a 5% level of
significance. To optimize programming efficiency, each
omnibus-to-pairwise-comparisons combination was run separately. However,
the starting numbers for the random number generator were kept the same
for each combination, and consequently each of the omnibus tests started
with the identical set of data.

A heterogeneous variance condition was also investigated. The unequal
variances (.104, .790, .810, and 2.296) were inversely paired with the
unequal sample sizes (i.e., smallest σ² with largest n and largest σ²
with smallest n). This particular type of pairing was chosen because it
delineates the case in which the rates of Type I error are inflated in
tests using MSW.
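
The simulation loop described above can be sketched as follows for one of the nine combinations, the ANOVA F followed by Student's pooled t. This is an illustration under stated assumptions, not the original FORTRAN program: Python's `random.gauss` stands in for the Marsaglia, MacLaren, and Bray generator, the two critical values are approximate hard-coded constants for 3 and 220 df, and the function and variable names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt

SIZES = (29, 41, 65, 89)   # group sizes reported in the text
F_CRIT = 2.65              # assumed: approximate .05 CV of F with 3 and 220 df
T_CRIT = 1.97              # assumed: approximate two-tailed .05 CV of t, 220 df


def mean_var(xs):
    """Sample mean and unbiased variance of one group."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate(variances=(1.0, 1.0, 1.0, 1.0), n_significant=1000, seed=1):
    """ANOVA F followed by Student's pooled t, all population means zero."""
    rng = random.Random(seed)
    k, big_n = len(SIZES), sum(SIZES)
    pairs = list(combinations(range(k), 2))
    experiments = significant = rejections = families = 0
    while significant < n_significant:
        experiments += 1
        stats = [mean_var([rng.gauss(0.0, sqrt(v)) for _ in range(n)])
                 for n, v in zip(SIZES, variances)]
        means = [m for m, _ in stats]
        msw = sum((n - 1) * s2 for n, (_, s2) in zip(SIZES, stats)) / (big_n - k)
        grand = sum(n * m for n, m in zip(SIZES, means)) / big_n
        msb = sum(n * (m - grand) ** 2 for n, m in zip(SIZES, means)) / (k - 1)
        if msb / msw <= F_CRIT:
            continue  # first stage not significant: generate a new data set
        significant += 1
        hits = sum(
            abs(means[i] - means[j]) / sqrt(msw / SIZES[i] + msw / SIZES[j]) > T_CRIT
            for i, j in pairs
        )
        rejections += hits
        families += hits > 0
    return {
        "experiments": experiments,
        "average_pci": rejections / (len(pairs) * significant),
        "fwi": families / experiments,
    }


if __name__ == "__main__":
    print(simulate(n_significant=100))
    print(simulate(variances=(2.296, 0.810, 0.790, 0.104), n_significant=100))
```

The second call pairs the largest variance with the smallest sample, as in the heterogeneous condition. A run this short only indicates the direction of the effect; the article's figures rest on 1,000 significant omnibus tests per combination.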


Results and Discussion 


Table 1 presents the results of the simulation 
study. The FWI values associated with each 
omnibus test are the probabilities that the 
omnibus test will yield a significant result and 
that the pairwise tests using PCI CVs will 
result in at least one significant pair. Since the 
latter usually happens whenever the omnibus 
test is significant, the FWI values in each of 
these columns are basically the same. Clearly, 
the major determinants of these FWI values 
are the characteristics of the omnibus tests 
themselves. The average PCIs are the average 
values of the per-comparison rates of Type I 
errors on the six comparisons of each experi- 
ment, given that a significant omnibus test 
was obtained; this is a conditional probability. 

Looking at the top set of values in Table 1 under the homogeneous
variance condition, one sees that all of the tests are very similar and
perform about as expected when this major assumption has been met. The
PCI averages are higher than .05 because they are conditional
probabilities (which were computed here only when the sample means were
sufficiently divergent to yield a significant omnibus test). In summary,
the values in the upper section of the table suggest that any of the
omnibus tests can be used as the first step in the two-stage procedure
when one has homogeneous variances.

¹ See Golder and Settle (1976) and Payne (1977) for a description and
evaluation of the Marsaglia, MacLaren, and Bray (1964) random number
generator.


Table 1
Type I Error Rates for Various Two-Stage Procedures

                                           Brown and
                                           Forsythe's       Welch's
                           ANOVA F         (1974) F*        (1951) F″
                           Average         Average          Average
Procedure used on pairs    PCI     FWIᵃ    PCI     FWIᵇ     PCI     FWIᶜ

        n₁ ≠ n₂ ≠ n₃ ≠ n₄; σ₁² = σ₂² = σ₃² = σ₄²
Student's pooled t         .379    .049    .375    .049     .367    .048
Welch's (1947) t″          .365    .049    .370    .049     .372    .048
Hochberg's (1976) t        .288    .047    .295    .048     .299    .047

        n₁ < n₂ < n₃ < n₄; σ₁² > σ₂² > σ₃² > σ₄²
Student's pooled t         .460    .177ᵈ   .521    .064ᵉ    .462    .049ᶠ
Welch's (1947) t″          .214    .119    .396    .064     .380    .050
Hochberg's (1976) t        .072    .052    .141    .033     .211    .039

Note. PCI = per-comparison Type I error rate; FWI = familywise risk of
Type I error.
ᵃ FWIs based on 20,476 simulations. ᵇ FWIs based on 20,508 simulations.
ᶜ FWIs based on 20,912 simulations. ᵈ FWIs based on 5,660 simulations.
ᵉ FWIs based on 15,487 simulations. ᶠ FWIs based on 19,896 simulations.


The values in the lower section of Table 1 indicate that the ANOVA F is
not to be trusted as the first stage when variances are heterogeneous and
sample sizes are unequal. The FWI values are substantially above .05,
except when the ANOVA is followed by Hochberg's (1976) conservative test.
Probably the most commonly used technique is to calculate an ANOVA F as
the first step and then follow this by Student's t tests if the F is
significant. It is disturbing to find that under the heterogeneous
variances condition with nominal alphas of .05, the FWI value is about
.18 and that given a significant F, the mean probability of a Type I
error in the comparisons that follow is .46.

For the unequal sample size-unequal variance case, the other two omnibus
tests provide reasonable control of FWI, with the Welch (1951) F″ being
slightly superior to the Brown and Forsythe (1974) F*. However, although
either is an improvement over the ANOVA F, it would be a mistake to
follow either of these omnibus tests with a follow-up test that made use
of MSW, since the standard errors based on this value would be
inappropriate for various pairwise comparisons of the means (Games &
Howell, 1976, p. 119). Thus, the


choice for a follow-up test is limited to the Welch t″ (1947) or the
Hochberg (1976) t procedures. All LSDs using the Hochberg t effectively
controlled the rate of Type I error. However, the rates were conservative
when the omnibus test used was F* or F″. Because of this conservativeness
we would expect these two LSDs to be relatively less powerful. The Welch
F″-Welch t″ LSD also provided Type I control, but was not conservative
and consequently should be relatively more powerful. However, the
F*-Welch t″ LSD proved to be slightly liberal and can be discounted if
the user wants to maintain the number of Type I errors at or below the
significance level.

Though not presented here, Type I error rates per family were collected
to compare the Welch LSD (F″-t″) and simultaneous multiple comparison
approaches.² Interestingly, the per-family rates were not very disparate.
Consequently, though our preference is for the simultaneous multiple
comparison approach in testing pairwise comparisons, the data indicate
that some two-stage least significant difference procedures can indeed
provide Type I error control.

² The per-family Type I error rate is equal to the number of Type I
errors made on the 6,000 comparisons (after a significant omnibus test)
divided by the total number of families, or, here, by experiments run in
the simulation (Miller, 1966, p. 5). The rates for the multiple
comparison procedures (Tukey's, Note 1, procedure using the Welch, 1947,
CV as suggested by Games & Howell, 1976) were obtained in another
simulation study.
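
The error rates used in this article differ only in what is counted and what the count is divided by. The sketch below makes the three definitions explicit; the function name and the counts in the example are hypothetical and are not results from the simulation.

```python
def error_rates(errors_per_family, n_experiments, n_pairs=6):
    """Summarize Type I errors from a two-stage simulation.

    errors_per_family: false rejections in each family that reached the
        second stage (one entry per significant omnibus test).
    n_experiments: all experiments run, significant or not.
    """
    n_sig = len(errors_per_family)
    total = sum(errors_per_family)
    return {
        # conditional on a significant omnibus test
        "average_pci": total / (n_pairs * n_sig),
        # families with at least one false rejection, over all experiments
        "fwi": sum(e > 0 for e in errors_per_family) / n_experiments,
        # total false rejections, over all experiments
        "per_family": total / n_experiments,
    }


# Hypothetical counts: 4 significant omnibus tests out of 80 experiments.
print(error_rates([1, 0, 3, 2], n_experiments=80))
```

Because a family can contain more than one false rejection, the per-family rate is never smaller than the familywise rate computed over the same experiments.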


Reference Note 


1. Tukey, J. W. The problem of multiple comparisons.
Unpublished manuscript, Princeton University,
1953.


Human Spatial Abilities: Psychometric Studies and
Environmental, Genetic, Hormonal, and Neurological Influences


Mark G. McGee
Texas A & M University


The spatial abilities literature is reviewed. Psychometric consideration encom- 
passes both factor analytic studies that conclusively demonstrate the existence 
of at least two Spatial factors—Visualization and Orientation—and predictive 
validity studies that argue for these factors’ social relevance. Sex differences in 
various aspects of perceptual-cognitive functioning (e.g., mathematics, field in- 
dependence) are interpreted as a secondary consequence of differences with 
respect to spatial visualization and spatial orientation abilities. Sources of var- 
iation in performance on spatial tests including environmental, genetic, hor- 
monal, and neurological are considered, with special emphasis on age and sex 
differences. Evidence that variation in spatial test scores is to some
degree heritable remains positive; however, the X-linked recessive gene
hypothesis that has served as a tentative explanation for sex differences
in spatial abilities and for the mode of genetic transmission is not
supported strongly in recent studies. Neurological studies showing
variations in the lateral organization of the human brain provide
experimental evidence for a structural source of the variation in spatial
abilities, and this evidence is reviewed as it relates to human
handedness and cerebral bilateralization for spatial and linguistic
functions.


The purpose of this article is threefold: (a) to summarize psychometric
studies of human spatial abilities, (b) to examine the consistencies and
disagreements in relation to the hypothesis that sex differences in
various aspects of perceptual-cognitive functioning (e.g., mathematics,
field dependence-field independence) are a secondary consequence of
differences with respect to spatial visualization and spatial orientation
abilities, and (c) to review the literature with reference to
environmental, genetic, hormonal, and neurological influences that
interact in producing individual variation in spatial test scores.


Psychometric Studies 


Early Factor Analytic Studies 


Historically, the identification of the Spatial factor has roots in the
study of mechanical aptitude (Cox, 1928; Paterson, Elliott, Anderson,
Toops, & Heidbreder, 1930; Smith, 1964; Stenquist, 1922) and practical
ability (W. P. Alexander, 1935; Kohs, 1923; McFarlane, 1925). In one of
the earliest studies of "practical ability," McFarlane found evidence for
a group factor over and above general intelligence (g). She described
individuals in possession of the practical ability underlying this factor
as being adept at judging concrete spatial relations.

Since 1925, numerous factor analytical studies have yielded a Spatial
factor mathematically distinct from verbal ability. Kelley (1928)
identified a Spatial factor and described it as the mental manipulation
of shapes. Brown and Stephenson (1933) found that two tests in particular
had substantial loadings on their Spatial factor: fitting shapes, which
is a paper-form-board test, and a test of dot perception. Koussy (1935)
identified a group factor (K) among 28 tests administered, and he
concluded from introspective reports by participants in the study that
the mental processes active in the solution of problems involving the K
factor were characterized by the "ability to obtain and the facility to
utilize, spatial imagery" (p. 86). A similar conclusion was reached
independently by Smith (1938). Murphy (1936) factor analyzed scores from
numerous verbal, nonverbal, and mechanical tests and concluded that
mechanical ability included two factors: Speed of Eye-Hand Coordination
and Mental Manipulation of Spatial Relations. Clarke (1936) reported the
Spatial factor loadings of spatial and verbal tests to be inversely
related among girls ranging in age from 12 to 15 years, a relationship
that has been replicated with males as well as females (Andrew, 1937a,
1937b; Broverman & Klaiber, 1969; Emmett, 1949; Estes, 1942; Heston,
1943; Morris, 1939; Slater, 1940; Smith, 1938; Swineford, 1948;
Thurstone, 1944, 1947; Wittenborn, 1945). This relationship is less
likely to obtain for spatial-performance tests highly correlated with
general intelligence (e.g., the Block Design subtest of the Wechsler IQ
scales, or the Kohs Block Design Test that correlates .80 with Binet IQ
Scores, Kohs, 1923).


Recent Factor Analytic Studies


Several recent studies have discussed the presence of a Spatial factor
among batteries of tests administered (DeFries et al., 1974; DeFries et
al., 1976; Goldberg & Meredith, 1975; Gough & Olton, 1972; Hakstian &
Cattell, 1974; Yen, 1975; Zonderman, Vandenberg, Spuhler, & Fain, 1977),
and others have shown a Spatial factor that reliably appears in cognitive
test scores of different racial, ethnic, and socioeconomic (SES) groups.
Michael (1949), for instance, found a generally similar factor structure
among ability measures on black and white Air Force cadets. Flaugher and
Rock (1972) found similar factor structures in a battery of 9 cognitive
measures administered to black, white, Mexican American, and Oriental
American high school males. Among Americans of European and Japanese
ancestry, DeFries et al. (1974) found four factors in the 15 cognitive
tests administered: Spatial Visualization, Verbal, Memory, and Perceptual
Speed. Humphreys and Taber (1973) found six factors in 21 ability
measures from the Project TALENT test battery, and the factor structure
was similar for 9th-grade boys in the top and bottom quartiles on SES.
Backman (1971, 1972) compared 12th-grade Jewish and non-Jewish whites and
Oriental Americans in performance on six mental ability factors (Spatial,
Verbal, English Language, Math, Perceptual Speed, and Memory) and found
ethnicity and SES to have much less influence on the ability profile than
did sex. Similar subtest intercorrelation patterns have been reported by
Nichols (1971) and Scarr-Salapatek (1971) for U.S. whites and blacks.
These studies all agree with the suggestion "that underlying dimensions
of ability vary little if at all across U.S. racial-ethnic groups"
(Loehlin, Lindzey, & Spuhler, 1975, p. 179).

An equally important emphasis of the factor analytic research has been
that of disentangling various subabilities that characterize the Spatial
factor. The available evidence conclusively demonstrates the existence of
at least two Spatial factors: Visualization and Orientation. Table 1
presents a summary of Spatial Visualization and Spatial Orientation
factor symbols and descriptions. Although factor names and symbols differ
in the four studies cited in Table 1, factor descriptions are strikingly
similar.


Table 1
Summary of Spatial Visualization and Spatial Orientation Factor Symbols
and Descriptions

Guilford and Lacey (1947)
  Spatial visualization factor (Vz): An ability to imagine the rotation
    of depicted objects, the folding or unfolding of flat patterns, the
    relative changes of position of objects in space, the motion of
    machinery. This visualization factor is strongest in tests that
    present a stimulus pictorially and in which some manipulation or
    transformation to another visual arrangement is involved.
  Spatial orientation factor (SR): An ability to determine relationships
    between different spatially arranged stimuli and responses and the
    comprehension of the arrangement of elements within a visual stimulus
    pattern.

Thurstone (Note 1)
  Spatial visualization factor (S₂): An ability to visualize a
    configuration in which there is movement or displacement among the
    internal parts of the configuration.
  Spatial orientation factor (S₁): An ability to recognize the identity
    of an object when it is seen from different angles or an ability to
    visualize a rigid configuration when it is moved into different
    positions.
  Spatial orientation factor (S₃): An ability to think about those
    spatial relations in which the body orientation of the observer is an
    essential part of the problem.

French (1951)
  Spatial visualization factor (Vi): An ability to comprehend imaginary
    movements in three-dimensional space or the ability to manipulate
    objects in the imagination.
  Spatial orientation factor (S): An ability to perceive spatial patterns
    accurately and to compare them with each other.
  Spatial orientation factor (SO): An ability to remain unconfused by the
    varying orientations in which a spatial pattern may be presented.
    Dimensionality is less important to the factor than the rotational
    position of presentations.

Ekstrom, French, and Harman (Note 3)
  Spatial visualization factor (VZ): An ability to manipulate or
    transform the image of spatial patterns into other arrangements;
    requires either the mental restructuring of a figure into components
    for manipulation or the mental rotation of a spatial configuration in
    short term memory, and it requires performance of serial operations,
    perhaps involving an analytic strategy.
  Spatial orientation factor (S): An ability to perceive spatial patterns
    or to maintain orientation with respect to objects in space; requires
    that a figure be perceived as a whole.

Note. Adapted from Michael, Guilford, Fruchter, and Zimmerman (1957,
Table 1, p. 188).


The first clear evidence for the existence of spatial abilities resulted
from an impressive series of factor analytic studies initiated by L. G.
Humphreys of the Army Air Force (AAF) (Guilford & Lacey, 1947; Guilford &
Zimmerman, 1947). The results, based on repeated analyses from AAF tests
administered to thousands of military personnel, indicated two Spatial
factors: Spatial Visualization (Vz) and Spatial Relations (SR).
Visualization was described as the ability to imagine the rotation of
depicted objects, the folding or unfolding of flat patterns, the relative
changes of position of objects in space, or the motion of machinery.
Spatial Relations was described as comprehension of the arrangement of
elements within a visual stimulus pattern.

Thurstone (1938) had isolated a Space factor that he designated as a
facility in spatial and visual imagery, but it was not until 1950, when
he found several primary abilities in visual thinking and related three
of these to visual orientation in space, that the differentiation between
them became better understood. The first of these three factors was
designated S₁ (Thurstone, Note 1). He asserted that it represented the
ability to recognize the identity of an object when it was seen from
different angles. This factor was also characterized by the ability to
visualize a rigid configuration when it was moved into different
positions. The second factor identified by Thurstone (Note 1), S₂, was
said to represent the ability to imagine movement or internal
displacement among the parts of a total configuration. Thurstone's
distinction between abilities to imagine transformation of wholes (S₁)
versus parts (S₂) is unique and needs verification. The third factor (S₃)
was identified by Thurstone as the ability to think about those spatial
relations in which the body orientation of the observer is an essential
part of the problem.

In 1951, French identified a Visualization 
factor (Vi), described as the ability to men- 
tally manipulate three-dimensional objects, 
and an Orientation factor (SO), described 
as an ability to remain unconfused by the 
varying orientations in which a spatial pat- 
tern may be presented. 

French, Ekstrom, and Price (Note 2) described Vz as an ability to
manipulate or transform the image of spatial patterns into other visual
arrangements and spatial orientation as an ability to perceive spatial
patterns or maintain orientation with respect to objects in space. More
recently, Ekstrom, French, and Harman (Note 3), in their manual for Kit
of Factor-Referenced Cognitive Tests, have suggested that visualization
ability requires that a figure be mentally restructured into components
for manipulation, whereas the whole figure is manipulated in spatial
orientation. Both spatial orientation and visualization require short
term visual memory. Orientation requires only mental rotation of the
configuration; however, visualization, according to this more recent
definition, requires both rotation and the performance of serial
operations. The requirement that a figure is perceived as a whole in
spatial orientation but must be mentally restructured into components for
manipulation in visualization (Ekstrom et al., Note 3) is reminiscent of
Thurstone's (Note 1) distinction between abilities to imagine
transformation of wholes (S₁) and parts (S₂).

Corroborative evidence for the existence of at least two Spatial factors
has been provided. Guilford, Fruchter, and Zimmerman (1952), for example,
found 12 factors in a test battery of 46 tests (including a Spatial
Relations and a Visualization factor). The Visualization factor was
represented as an ability to mentally manipulate elements of a pattern,
and the Orientation factor was represented as an ability to determine
spatial orientation with respect to one's body.


Summary


Factor analytic studies of the Spatial factor began with the study of
practical and mechanical ability during the mid-1920s. Some investigators
have found evidence for distinct factors of practical and mechanical
ability, in addition to a Spatial factor; others have refuted this
distinction (Dempster, 1948; Leff, 1949; Price, 1940; Slater, I. Watts,
1953; Williams, 1948). Although the debate over the existence versus
nonexistence of a Spatial factor characterized much of the literature
prior to 1930, a plethora of factor studies since that date has provided
strong and consistent support for the existence of two distinct spatial
abilities: visualization and orientation.


As for visualization, the S₂ factor proposed by Thurstone (Note 1) is
similar to the Vi factor proposed by French (1951), the Vz factor
reported by the AAF researchers (Guilford & Lacey, 1947), and the VZ
factor described by Ekstrom et al. (Note 3). All involve the ability to
mentally manipulate, rotate, twist, or invert a pictorially presented
stimulus object. The underlying ability seems to involve a process of
recognition, retention, and recall of a configuration in which there is
movement among the internal parts of the configuration (S₂) or the
recognition, retention, and recall of an object manipulated in
three-dimensional space (Vi) or which involves the folding or unfolding
of flat patterns (Vz). The Spatial Visualization Test of the French et
al. (Note 2) Kit of Reference Tests of Cognitive Factors, for example,
requires the examinee to mentally fold and unfold a piece of paper and
choose the alternative that represents the paper after it has been
unfolded. The Guilford-Zimmerman (1953) Visualization Test consists of a
picture of an alarm clock and a sphere with directional arrows, and the
examinee is required to visualize the rotation of the clock as it is
moved into different positions according to the directions of the arrows.

As for spatial orientation, the S₁ and S₃ factors proposed by Thurstone
(Note 1), the SO factor proposed by French (1951), the SR factor proposed
by Guilford and Lacey (1947), and the SO and S factors described by
Ekstrom et al. (Note 3) are similar. All involve the comprehension of the
arrangement of elements within a visual stimulus pattern and the aptitude
to remain unconfused by the changing orientation in which a spatial
configuration may be presented. The Spatial Orientation Test of the
French et al. (Note 2) Kit of Reference Tests of Cognitive Factors
requires the examinee to compare cubical blocks and indicate whether they
are the same or different according to symbols written on their faces.
The Guilford-Zimmerman (1953) Spatial Orientation Test requires the
examinee to imagine riding in a boat whose prow is always visible in the
foreground of the pictures that comprise each item and to choose among
the alternative new directions of the boat.

Regarding factor analytic studies of spatial
abilities in general, some qualifications need 
to be made. First, after 70 years of psycho- 
metric research, there is still vast disagree- 
ment about just how best to classify stan- 
dard tests of spatial abilities. Further factor 
analytic studies are indicated. Second, the 
influence of test-item difficulty on factor 
structure needs to be further explored. 
Myers’ (Note 4) suggestion that visualiza- 
tion test items are usually more difficult than 
orientation items has been supported (Zim- 
merman, 1954a, 1954b), although not repli- 
cated. Third, spatial tests consisting of both 
two- and three-dimensional items are used 
with equal frequency, but little is known 
about how dimensionality contributes to the 
factor structure of spatial tests. Fourth, stud- 
ies showing positive correlations between 
tests of spatial visualization and orientation 
(Borich & Bauman, 1972; Goldberg & Mere- 
dith, 1975; Karlins, Schuerkoff, & Kaplan, 
1969; Roff, 1952; Yen, 1975) illustrate the 
need for further factor analytic research to 
clarify the specificity of the Visualization 
and Orientation factors. 


Predictive Validity Studies 


Information from predictive validity stud- 
ies reinforces our judgment concerning the 
practical use of visualization and orientation 
abilities and proves their social significance. 
Historically, predictive validity has been im- 
portant in the selection of candidates to fill 
openings in industry, colleges, and the armed 
forces. Predictive validity is at issue when 
the purpose of an instrument is to estimate 
some important form of behavior (Nunnally, 
1978). The degree of relationship between 
an instrument and its criterion indicates the 
instrument's predictive validity. The use of 
tests to predict success in school is a widely 
accepted practice, and the educational in- 
stitution beyond the elementary school level 
that does not use a standardized testing pro- 
cedure in its evaluation of incoming students 
is now the exception. 
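
The "degree of relationship" between an instrument and its criterion is ordinarily expressed as a correlation coefficient, and the validity coefficients reported in the studies reviewed below are of this kind. A minimal sketch of the computation follows; the scores are invented for illustration and do not come from any study cited here, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores, invented for illustration only.
spatial_test = [12, 15, 9, 20, 17, 11, 14, 18]   # predictor (test score)
shop_grade = [70, 74, 65, 88, 80, 72, 69, 85]    # criterion (later grade)

r = pearson_r(spatial_test, shop_grade)
print(round(r, 2), round(r ** 2, 2))  # validity coefficient and shared variance
```

Squaring the coefficient gives the proportion of criterion variance accounted for by the test, which is one way to compare validities such as .66 and .07.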

To what practical use can we direct our 
knowledge of human spatial abilities? What 
are the implications of this knowledge for 
use in everyday life? And to what extent is
this knowledge useful in predicting socially
significant and relevant behavior? We have 
two lines of evidence directly relevant to 
these issues: first, the literature regarding 
the value of spatial tests for selecting work- 
ers for industrial jobs and predicting job 
performance and, second, the literature re- 
garding the use of spatial tests for the pre- 
diction of success in vocational-technical 
training programs. 

Selecting workers for industrial jobs and 
predicting job performance. The value of 
spatial tests for use in making personnel 
selection decisions has been well documented 
elsewhere. Ghiselli (1966, 1973), for ex- 
ample, has summarized much of the litera- 
ture relevant to the predictive validity ques- 
tion of occupational aptitude tests. The U.S. 
Employment Service (1957) has listed those 
occupations requiring a high level of spatial 
ability. Four job categories—engineering, sci- 
ence, drafting, and designing—account for 
nearly 85% of all jobs listed. 

Predicting success in vocational-technical 
training programs. The earliest evidence for 
the prediction of success in school work with 
a test specifically designed to measure spa- 
tial ability was provided by O’Connor (cited 
in Smith, 1964). Using the O’Connor Wiggly 
Block Test, he found predictive validities of 
.62 and .42 for shop grades in two groups of
vocational school boys. 

Paterson et al. (1930), in their massive 
investigation of mechanical ability, found 
spatial tests to be especially useful in pre- 
dicting success in various junior high school 
and technical school courses. The spatial test
battery (consisting of several tests including 
the Minnesota Paper Form Board Test, the 
Link Spatial Relations, and the Packing 
Blocks Tests) yielded a multiple correlation 
of .60 with success in shop courses and cor- 
related .07 with intelligence. 

Holliday (1940) administered a battery of 
spatial tests to several groups of trade school 
apprentices, engineer apprentices, and shop 
students. The verbal test administered corre-
lated .07 with proficiency in technical draw-
ing, whereas the corresponding correlation of
the spatial battery was .66. In a subsequent
investigation, Holliday (1943) administered
a series of spatial, mechanical, and verbal 
tests to several groups of toolmakers and
engineering and trade apprentices. Again it 
was found that the spatial tests yielded 
higher correlations with mechanical drawing 
than did the verbal tests administered. Slater
(1941) conducted a validity study in which 
he used seven spatial tests in addition to a 
test of verbal ability to predict criterion esti- 
mates of engineering drawing ability and 
general apprenticeship ability. Drawing abil- 
ity correlated .41 with composite scores on 
the spatial tests but only .26 with verbal
ability. Shuttleworth (1942) found that tests 
of spatial ability showed higher correlations 
with grades obtained in junior technical 
school than tests of verbal and mechanical 
ability. Although the correlation between 
grades and the verbal test administered was 
only .20, the corresponding correlations for
the Space Perception, Memory for Design, 
and Form Relations Tests were .46, .45, and 
.44, respectively. Hunter (1945) used the 
Minnesota Paper Form Board and the Otis 
Intelligence Test (Form B) among other 
tests in predicting course success for sopho- 
more and junior machine shop students. 
Correlations against this criterion were .45
for the spatial test and .28 for the intelli-
gence test. Smith (1948) administered 11
tests to first- and second-year pupils in a
Scottish secondary school. The spatial bat-
tery included tests of area discrimination,
completion, fitting shapes, form equations,
classification, form-figure analogies, form
recognition, pattern perception, and drawing.
The Otis Intelligence Test (Form B) was
also used. The spatial test battery was pre-
dictive of success in engineering drawing
(r = .66) and art (r = .39). Correlations be-
tween grades in these two subjects and the
Otis Intelligence Test were —.07 and .19,
respectively.

Two validity studies reported by Smith
(1964), corroborating the results of those
presented above, were conducted by Knight
(1949) and by Oxlade (1951). Knight re-
ported results from a follow-up study of
Middlesex Junior Technical School entrance
examinees to determine the validity of spa-
tial tests in predicting successful completion
of the program. The battery of tests ad-
ministered included a form relations test,
a memory-for-designs test, and a space per-
ception test. Headmaster's ratings were also
obtained. The criteria included performance 
in the program at the end of the first and 
second years. Correlations between predictor 
and criterion variables demonstrated the 
superiority of spatial tests over headmaster's 
ratings for selecting candidates. In the first 
year, the predictive value of headmaster's 
ratings and selection tests was similar, but, 
as the course became more advanced and 
technical in its second year, the predictive 
value of the selection tests became greater, 
whereas that of the headmaster's ratings de- 
creased. Martin (1951) found that a spatial 
relations test predicted shop achievement and 
classroom grades among 45 auto mechanics 
in a California technical school. And in a 
related study, Hunt (1973) demonstrated 
the usefulness of a spatial test in predicting 
course achievement in a computational sci-
ence curriculum. 

Smith (1964) reported a follow-up study 
of pupils selected for technical education in 
which the major objective was to assess the 
predictive validity of various selection tests, 
including tests of spatial and verbal ability, 
against criteria of success in technical courses 
in a secondary technical school. Data were 
collected on students in a number of techni- 
cal courses at 3 years after and at 5 years 
after the original selection examination. Both 
spatial tests that were used showed substan- 
tial correlations with criterion examination 
scores in all technical courses. The most
highly significant regression coefficients for the 
two spatial tests tended to be in the areas of 
metalwork, woodwork, handicraft, and draw- 
ing (geometrical, building, and engineering). 
In contrast, the verbal reasoning test that 
was used showed low validity for the pre-
diction of success in these same areas. Simi-
lar results had been provided from two pre-
viously conducted validity studies. Holzinger
and Swineford (1946) examined the predic- 
tive validity of a battery of spatial tests 
with grades in various school courses and
found that test scores correlated .002 with 
English, —.003 with biology, and —.06 with 
foreign language but .23 with plane geom-
etry, .46 with shops and crafts, and .69 with
drawing. The spatial tests included in the
battery were of visual imagery, cubes,
punched holes, figures, form relations, pat- 
tern perception, and drawings. 

In related studies the predictive validity 
of spatial orientation and spatial visualiza- 
tion tests has been demonstrated. Hills 
(1957), for example, reported a validation 
study on the relationship between several 
measures of aptitude and success in college 
mathematics. The Spatial Visualization and 
Spatial Orientation Tests (Guilford & Lacey, 
1947) were among the best predictors of 
mathematical grades, showing higher validity 
coefficients than the verbal tests used. Kar- 
lins et al. (1969) investigated cognitive 
factors relating to architectural creativity
among graduating architecture students and
found a significant correlation of .49 be- 
tween scores on Thurstone’s Cubes Test (a 
measure of spatial orientation, Thurstone, 
Note 1) and quality of independent work 
completed. And the space subtest of the 
Differential Aptitude Test battery (Bennett, 
Seashore, & Wesman, 1974) is predictive of 
drafting (r = .42), shop mechanics (r =
.47), and watch repair (r = .69), none of
which is well predicted by verbal tests. 


Relationship Between Spatial Abilities and 
Various Perceptual-Cognitive Tasks 


Differential psychologists have been in- 
terested in spatial visualization and orienta- 
tion because male superiority on tasks re- 
quiring these abilities is among the most 
persistent of individual differences in all the 
abilities literature (Anastasi, 1958; Buffery 
& Gray, 1972; Garai & Scheinfeld, 1968; 
Harris, 1978; Maccoby & Jacklin, 1974; Mc- 
Gee, 1977; O'Connor, 1943; Sherman, 1971; 
Smith, 1964; Tyler, 1965). 

The widely documented sex difference on 
tests of spatial visualization and spatial ori- 
entation as well as on numerous tasks re- 
quiring these abilities does not reliably ap- 
pear until puberty (Drew, 1944; Emmett, 
1949; Fruchter, 1954; Gardner, Jackson, & 
Messick, 1960; Harris, 1978; Maccoby, 
1966; Slater, 1941; Witkin et al., 1954).
However, in studies in which differences have 
been reported in younger samples, boys typi- 
cally showed superiority in performance (for 
reviews, see Harris, 1978; Maccoby & Jack-
lin, 1974; Smith, 1964), which is particu- 
larly puzzling in light of the general matura- 
tional advantage in physical and cognitive 
development enjoyed by girls (Garai & 
Scheinfeld, 1968; Maccoby & Jacklin, 1974; 
Money & Ehrhardt, 1972; Waber, 1976, 
1977). 

In this section we examine the consist- 
encies and disagreements in relation to the 
hypothesis that sex differences in various 
aspects of perceptual-cognitive functioning 
are a secondary consequence of differences 
with respect to spatial visualization and spa- 
tial orientation abilities. 


Visualization Ability 


Recall that spatial visualization involves 
the ability to mentally rotate, manipulate, 
and twist two- and three-dimensional stimu- 
lus objects. It is tempting to consider visual- 
ization as fundamental to good performance 
in various areas of mental functioning. Two 
areas are considered below: imagery and 
mathematical ability. 

Imagery. Galton's (1880, 1883) pioneer- 
ing work on individual differences in imagery 
led the way for researchers who have since 
been intermittently interested in the issue. 
The study of mental transformations of spa- 
tial images is returning to popularity in psy- 
chology and represents an attempt to ex- 
amine the “major representational alterna- 
tive” to language (Neimark & Santa, 1975) 
to more fully understand the mental pro- 
cesses involved in the solution of tasks that 
are difficult to solve by verbally mediated 
processes (for reviews, see Neimark & Santa,
1975; Pylyshyn, 1973; Lane, Note 5).

The empirical study of imagery has em- 
phasized the measurement of imagery vivid- 
ness and imagery control (Betts, 1909; Cos- 
tello, 1957; Galton, 1880, 1883; Gordon,
1949; Marks, 1972; Richardson, 1969, 1972;
Sheehan, 1967a, 1967b). It has been sug-
gested (Richardson, 1972) that spatial ma- 
nipulation involves imagery control and that 
imagery vividness and control may be re-
lated to individual differences in perform-
ance on tests of spatial ability. Those who
lack control over their imagery have been
described by Galton (1883) as having "dif-
ficulty in shifting their mental view of an
object and examining it at pleasure in dif-
ferent positions" (p. 75). As noted by El
Koussy (1935), solving problems on spatial
tests requires mental imagery and the ability
to obtain and the facility to use visual spa-
tial imagery. The role played by such
imagery in tests of spatial visualizing ability
is unclear, however.

Although no studies have examined the 
relationship between imagery vividness,
imagery control, and performance on tests of
spatial ability, the work by Shepard and his
associates on mental transformation of visual
images has gained considerable attention.
They have shown that the reaction time
required for performing an instructed trans-
formation is a linearly increasing function of
the number of rotations or foldings required
for determining whether two differently ori-
ented objects have the same or different
shapes (Shepard & Metzler, 1971) and for
determining whether arrows on two squares
will meet when a flat six-sided figure is
folded into a cube (Shepard & Feng, 1972).
This research has advanced our understand-
ing of the nature of the underlying mental
process involved in the solution of spatial
visualization tasks such as those that appear
on paper-folding and surface development
tests of spatial ability. Further, it supports
the principle of “second-order isomorphism”
(Shepard & Chipman, 1970), meaning that
the events occurring during the “imagining”
of an external process (e.g., paper folding)
are similar to the events occurring during
the “perceiving” of an external process. The
orderly relationship between (a) time re-
quired to recognize that two perspective
drawings portray objects of the same three-
dimensional shape and (b) the angular dif-
ferences in the portrayed orientations of the
two objects (Shepard & Metzler, 1971) im-
plies that while one is in the course of imag-
ining the external process, one passes through
an orderly set of internal states of special
relation to the successive states of the ex-
ternal process (Shepard & Feng, 1972).
Metzler and Shepard (1974) and Cooper
and Shepard (1975) have described experi-
ments with college age males and females
that confirm their earlier reports. No sys-
tematic attempt to examine sex differences 
has been made, although the slope of the 
reaction time functions tended to be higher 
for females than males (Metzler & Shepard, 
1974), indicating that males require less 
time than females to solve Shepard's mental 
transformation tasks. 
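
The linear relation between reaction time and amount of mental transformation described above can be illustrated with a short least-squares sketch. The angles and reaction times below are invented purely for illustration and are not data from Shepard and Metzler (1971); the function name fit_line is likewise hypothetical. The fitted slope is the extra time required per degree of angular difference, so a steeper slope corresponds to a slower transformation.

```python
# Illustrative sketch only: all values are hypothetical, not published data.

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Angular difference between the two portrayed objects (degrees), hypothetical.
angles = [0, 40, 80, 120, 160]
# Mean reaction time to judge "same shape" (seconds), hypothetical.
rt_seconds = [1.0, 1.8, 2.6, 3.4, 4.2]

slope, intercept = fit_line(angles, rt_seconds)

# The reciprocal of the slope is the implied rate of mental rotation
# in degrees per second.
rotation_rate = 1 / slope

print(f"slope = {slope:.3f} s/deg, intercept = {intercept:.2f} s")
print(f"implied rotation rate = {rotation_rate:.0f} deg/s")
```

Under this reading, the report that slopes tended to be higher for females than for males amounts to a difference in the time needed per unit of transformation rather than in the intercept.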

Mathematical ability. One definition of 
mathematical ability has been proposed by 
Hamley (1935), a mathematician and a 
psychologist. According to this definition, 
mathematical ability is a compound of gen- 
eral intelligence, visual imagery, and ability 
to perceive number and space configurations 
and to retain such configurations as mental 
patterns. The extent to which spatial ability
enters into mathematical ability is suggested 
by several validity studies. Hills (1957) in-
vestigated the relationship between various
aptitude tests and criterion performance in
college mathematics. The test battery in- 
cluded two spatial tests—one of visualization 
and one of orientation—from the Guilford- 
Zimmerman Aptitude Survey (Guilford & 
Zimmerman, 1953). Subjects were 148 stu- 
dents in three institutions. The two spatial 
tests had relatively high correlations with 
course performance (for visualization, r =
.23; for orientation, r = .22) compared to
the verbal and reasoning tests administered
(r = .06 each), which suggests a higher rela- 
tive importance of spatial ability than verbal 
ability in college mathematics. And the space 
subtest of the Differential Aptitude Test bat- 
tery (Bennett et al., 1974) is predictive of
success in school geometry (r = .57) and
quantitative thinking (r = .69). More re-
cently, Eisenberg and McGinty (1977) have 
shown that spatial visualization test scores 
are higher among students in calculus courses 
than in other college courses. 

Corroborative evidence has been provided 
by Smith (1948) and Werdelin (1961). Male 
superiority in understanding geometric prin-
ciples and concepts has been reported (Saad 
& Storer, 1960), and Smith (1964) has sug- 
gested that the sex difference “may be an- 
other manifestation of the sex difference in 
spatial ability, reflecting a greater capacity
on the part of boys to perceive, recognize
and assimilate patterns within the conceptual
structure of mathematics" (p. 123). 


Orientation Ability 


Recall that spatial orientation involves the 
comprehension of the arrangement of ele- 
ments within a visual stimulus pattern, the 
aptitude to remain unconfused by the chang- 
ing orientations in which a spatial configura- 
tion may be presented, and the ability to 
determine spatial orientation with respect 
to one's body. This definition raises the ques- 
tion of whether individual differences in vari- 
ous aspects of perceptual-cognitive function- 
ing are a secondary consequence of differ- 
ences with respect to spatial orientation 
ability. Empirical research in four areas is 
particularly relevant to this question: field 
dependence – field independence, sense-of- 
direction, Piagetian, and maze tasks. 

Field dependence – field independence. One 
of the more familiar tasks on which sex dif- 
ferences have consistently been found is that 
of field dependence – field independence (Wit- 
kin, 1950; Witkin, Dyk, Faterson, Good- 
enough, & Karp, 1962; Witkin et al., 1954). 

Two tests—the rod-and-frame test and the 
Embedded Figures Test—seem to have a 
strong spatial component. The rod-and-frame 
test, for instance, requires the examinee to 
adjust a rod to the vertical position in the 
absence of cues other than the luminescent 
square frame that surrounds the rod. The
frame position as well as the position of the 
examinee may be tilted in various orienta- 
tions. Adult females tend to be more de- 
pendent on the field in determining the ver- 
tical position of the rod than males (Bogo, 
Winget, & Gleser, 1970; Corah, 1965; Gross, 
1959; Kato, 1965; Morf, Kavanaugh, &
McConville, 1971; Okonji, 1969; Saarni, 
1973; Schwartz & Karp, 1967; J. Silverman, 
Bucksbaum, & Stierlin, 1973; Vaught, 1965; 
Witkin et al., 1962; Witkin, Goodenough, & 
Karp, 1967), and the sex difference is ap- 
parent in adolescents as well (Graves & 
Koziol, 1971; Keogh & Ryan, 1971; J. Sil- 
verman et al., 1973; Canavan, Note 6). 

The Embedded Figures Test requires the 
examinee to view and store in memory a 
simple geometric form and then to recall the 
form by identifying it in a more complex
geometric figure. The obvious spatial ele- 
ment in this task may account for the sex 
difference that is so widely documented for 
adolescents as well as for adults (e.g., Bieri, 
Bradburn, & Galinsky, 1958; Bigelow, 1971; 
Corah, 1965; Goldstein & Chance, 1965; 
Goodenough & Eagle, 1963; Graves & Koz- 
iol, 1971; Keogh & Ryan, 1971; Nash, 1973; 
Okonji, 1969; Witkin, 1950; Witkin et al., 
1954). Sex differences on these tasks are less 
reliable among children under the age of 11 
or 12 years (Maccoby & Jacklin, 1974; Wit- 
kin et al., 1954). 

Sherman (1967) has provided the major 
theoretical articulation of the relationship 
between sex differences in spatial abilities 
and sex differences in field dependence, argu- 
ing that the sex difference in field dependence 
is an artifact of the sex difference in space 
perception. A sizable body of literature sup- 
ports her hypothesis. Correlational studies 
have consistently demonstrated a strong re- 
lationship between tests of spatial orientation 
and measures of field dependence. Gardner 
et al. (1960) found correlations of Em- 
bedded Figures and rod-and-frame test
scores with the Guilford-Zimmerman Spatial
Orientation Test of .53 and .35, respectively. 
Thurstone (1944) reported correlations of 
.43 and .41 between two forms of the Gott-
schaldt Figures Test, similar to Witkin's Em- 
bedded Figures Test, and the Space Test of 
the Primary Mental Abilities test battery. 
This finding has been replicated by Podell 
and Phillips (1959).

Factor analytic studies indicate that tests 
of spatial abilities and field dependence — 
field independence emerge together in a fac- 
tor similar in description to the Spatial Ori- 
entation factor discussed previously (Gard- 
ner et al., 1960; Hyde, Geiringer, & Yen, 
1975; Podell & Phillips, 1959; Thurstone,
1944) and that sex differences in field de-
pendence are eliminated after removing dif-
ferences in spatial abilities (Hyde et al.,
1975).

The presence of a Spatial component in 
tests of field dependence — field independence 
seems to be a prerequisite for the appearance
of sex differences. Measures of field depen- 
dence other than the rod-and-frame and
Embedded Figure tests that do not have
a spatial component (e.g., the rotator-match 
brightness constancy task and the body 
steadiness task) have not shown sex differ-
ences (Witkin et al., 1954). In light of the
spatial nature of both the rod-and-frame and
Embedded Figures tasks, the sex difference is
understandable and should be narrowly in-
terpreted, not generalized into an all-en-
compassing statement about cognitive style
(Harris, 1978). 

Sense of direction. Spatial orientation 
ability is probably important in tasks re- 
quiring sense of direction. Berry's (1966) 
cross-cultural study comparing Eastern Ca- 
nadian Eskimos from Baffin Island with 
members of the Temne tribe in Africa sup-
ports this hypothesis. The directional sense 
among Eskimo males and females fostered 
by extensive travel in hunting is reflected in 
higher performance on several tasks requir- 
ing spatial ability, including the Morrisby 
Shapes Test and a test of field dependence –
field independence.

Spatial ability may enter into another task 
of directional sense—map reading. Money,
Alexander, and Walker (1965) administered
their Road Map Test of Direction Sense to
over 1,000 children ranging in age from 7 to
18 years. The task consists of a schematic
outline map of several city blocks with a
standard route through the streets. The ex-
aminee is required to mentally follow the 
route, indicating verbally the direction of 
various turns (left or right with reference 
to point of origin). Males on the average 
performed significantly better than females,
and the differences were greatest between 
boys and girls of older ages. 

Piagetian tasks. Tuddenham (1970) has 
developed several quantitative Piagetian 
tasks that require some spatial facility and
that show sex differences favoring males.
The tasks are Perspectives, Water Level,
Tracks, Geometric Forms, and House Plans.
On the Perspectives task, examinees are re-
quired to select from among several photo-
graphs the one that shows how a small p[...]
would look from various vantage points.
The Water Level task involves problems that
deal with the principle that water remains
gravitationally horizontal regardless of the
tilt of the water's container. The Geometric
Forms task involves the identification of flat
patterns that can be folded to produce simple 
three-dimensional forms. Tracks, which in- 
volves the least amount of spatial skill, re- 
quires the examinee to correctly place a 
small car painted red on one side and blue on 
the other at various places on a spiral track. 
The House Plans task requires the construc- 
tion of block buildings. Males' mean scores 
on all tasks except Tracks were higher than 
females’ mean scores (Tuddenham, 1970). 
The Water Level task, developed initially 
by Piaget and Inhelder (1956), was used by 
Thomas, Jamison, and Hummel (1973), who
found that 31% of adult females but 84% 
of males had mastered the principle that 
water remains gravitationally horizontal re- 
gardless of the tilt of the water’s container. 
Early studies by Piaget and Inhelder (1956) 
determined that mastery of this principle
occurred by about 12 years of age and that 
girls lag behind boys at various age levels 
(Thomas et al., 1973; Liben, Note 7;
Thomas, Note 8). And Harris (1978) has 
suggested that the female lag in this prin- 
ciple's attainment is probably due to the spa- 
tial element of the task. Harris' hypothesis 
has been tested and supported by Geiringer 
and Hyde (1976). They found correlations
between average errors on the Water Level 
task and performance on a test of spatial 
orientation of —.83 for 12th-grade males 
and —.97 for 12th-grade females. Corre- 
sponding correlations for 5th-grade males 
and females were somewhat lower (—.65 for 
males and —.42 for females) and, although 
a significant sex difference was found in 
performance on both tests among 12th grad- 
ers, none was found among 5th graders. 
Analysis of covariance revealed that sex dif- 
ferences among 12th graders on the Water 
Level task disappeared once differences in 
spatial orientation ability were removed.

Maze tasks. Maze tasks were used as
early as 1918 by Porteus in his attempt to 
design an alternative to measures of general 
intelligence and verbal ability. Designed for 
use by human subjects, the Porteus Maze
Test (Porteus, 1918) has become an im-
portant and widely used testing device within
the discipline of psychology (Riddle & Rob- 
erts, 1977). A sex difference showing male
superior performance on the Porteus Maze 
Test has been a persistent finding in hun- 
dreds of studies since 1918 (Porteus, 1965). 
As in studies using standard tests of spatial 
ability, the sex difference does not reliably 
emerge until after age 11 or 12 (Batalla, 
1943; Langhorne, 1948; McNemar, 1942; 
Porteus, 1965) and has failed to emerge in 
children younger than 6 years of age (Matt- 
son, 1933; McGinnis, 1929).

Reference is seldom made in discussions of
human spatial abilities to the considerable 
body of evidence that has shown a consistent 
superiority in maze-learning tasks for male 
rats (e.g., Barnes et al., 1966; Barrett &
Ray, 1970; Cowley & Griesel, 1963; Daw- 
son, 1972; Hubbert, 1915; McNemar & 
Stone, 1932; Sadownikova-Koltzova, 1926; 
Tomlin & Stone, 1933; Tryon, 1931). 


Summary 


We have illustrated the complexity of the 
problem suggested earlier—that of determin- 
ing the depth and breadth of the field within 
which a Spatial factor may be found. Spatial 
visualization seems to be required in various 
perceptual-cognitive tasks involving the men- 
tal transformation of visual images, and it 
has been shown to be important for success 
in college mathematics, especially geometry 
and algebra. 

Spatial orientation enters into such tasks 
as field dependence - field independence, map 
reading, and sense of direction. Various Pia- 
getian tasks and maze tasks requiring an ap- 
titude for remaining unconfused by changing 
orientations of a spatial configuration must 
certainly involve a strong spatial orientation 
element. There is an obvious need for fur-
ther research to clarify the issues presented 
and to further specify the scope within which 
the Spatial Visualization and Spatial Orien- 
tation factors can be found.


Sources of Variance in Spatial Test Scores 


An overwhelming impression conveyed by 
surveying the spatial abilities literature of 
the 1960s and 1970s in contrast to the pre- 
ceding 5 decades is the redirection of interest 
from factor analytic studies that have con- 
clusively distinguished two spatial abilities,
visualization and orientation, to both corre- 
lational and experimental studies aimed at 
determining sources of variance in spatial 
test scores. 

One conclusion concerning the factors that 
account for individual differences in spatial 
test scores is certain—the empirical evidence 
is compatible with a relatively broad range 
of intellectual positions. This evidence is re- 
viewed in four categories: environmental,
genetic, hormonal, and neurological, with 
emphasis on age and sex differences. 


Environmental Influences 


The importance of experiential factors in 
the development of spatial skills has been 
suggested by Berry (1966) in a previously 
discussed study. Berry compared Eastern 
Canadian Eskimos from Baffin Island with
members of the Temne tribe of Africa on 
a number of perceptual-cognitive abilities.
Eskimos were less field dependent than the 
Temne and obtained higher mean scores on 
tests of spatial abilities. Although males in 
the Temne tribe performed significantly bet- 
ter than Temne females on these measures, 
there were no significant differences between 
male and female Eskimos. Unlike Temne 
females, Eskimo females tend to share 
equally with males in experiences of hunting. 
To survive, Eskimo hunters must travel ex- 
tensively on both land and sea and are re- 
quired to orient themselves in a relatively 
featureless array of visual stimuli. Wander- 
ing from the home in the activity of hunting 
presumably fosters a "directional sense" and 
facilitates spatial skill (Berry, 1966). Al- 
though the results seem to suggest the im-
portance of experiential factors in the de- 
velopment of spatial skills, cross-cultural 
differences are not necessarily environmental, 
particularly for long isolated groups such as 
the Temne of Africa and Canadian Eskimos. 

Differential training and spatial perception.
If experiential factors were important in
fostering high spatial abilities, we might ex- 
pect that training received in this area would 
result in improved performance on tests of 
spatial visualization and orientation. There 
is some evidence that improvement in per-
ceptual judgments occurs as a function of
controlled practice and training (Gibson, 
1953; Goldstein & Chance, 1965; Kato, 
1965; Salkind, 1976; Santos & Murphy, 
1960; Van Voorhis, 1941; Witkin, Note 9), 
although ordinary school curriculum offerings 
are not always effective in developing spatial 
perception to the asymptote of an individ- 
ual's ability (Brinkmann, 1966; Brown,
1954; Mendicino, 1958; Ranucci, 1952).
Blade and Watson (1955) reported sig-
nificant increases by students on a test of
spatial visualization during an engineering 
course. Positive effects have also been dem- 
onstrated by Brinkmann (1966). A small 
number of eighth-grade boys (n = 14) and
girls (n = 13) were instructed in selected
concepts of geometry and mental pattern
folding during mathematics classes over a
3-week period. Posttraining test scores on
the Space Relations Subtest of the Differen-
tial Aptitude Test (Bennett et al., 1974)
were significantly higher than pretraining 
test scores for the experimental groups but 
not for a control group. Dailey and Ney- 
man (Note 10) attempted to train vocational 
high school students on items similar to 
those found on two- and three-dimensional 
tests of orientation and visualization. Post-
training test scores obtained at the end of
the academic year were significantly higher
than pretraining test scores. These gains
were shown to be greater for the trained
group of students than for a control group.

Conflicting evidence in the literature is
provided by Faubian, Cleveland, and Hassel
(1942), who reported no differences on the
Surface Development Test between a group
of Air Corps recruits who had received train-
ing in drafting and blueprint reading and a
matched control group. Churchill, Curtis,
Coombs, and Hassell (1942) found slight but
insignificant gains on the same test after a
9-week training course in engineering draw-
ing. Myers (Note 11) found that
cadets who had received training in mechani-
cal drawing scored no higher on spatial tests
than cadets who had received no such train-
ing. And Ranucci (1952) and Brown (1954)
have independently shown that courses in
high school geometry did not result in in-
creased performance on spatial tests.

An increase in mean performance levels
after training on spatial tasks, even if it
were a consistent finding, does not explain 
the sex difference. If we assume that the 
female deficit in human spatial abilities re- 
sults from differential learning and that 
males are closer to the asymptote of their 
ability than females, then females should 
respond more favorably than males to train- 
ing. This hypothesis is not strongly sup- 
ported by available evidence. Unfortunately,
sex differences in response to training were 
not systematically examined in the studies 
reviewed previously. However, a few studies 
are available. Preliminary results from Drau- 
den's University of Minnesota dissertation 
research show no such effect (Drauden, Note
12). Teegarden (1942) examined the effect
of increased time limits on test scores and
found that increasing the time allowed to
complete a form-board test did not signifi-
cantly affect female performance. McGee
(1978a) found no evidence for a differential
response to training and practice on the 
Mental Rotations Test by females in com- 
parison with males. Smith (1948) also found 
that females did not differentially respond 
to training. Female spatial test performance 
was not raised to the level of male per- 
formance as the result of training received
in technical school. Thomas et al. (1973) 
reported sex differences for mastery of the 
principle that the surface of still water re- 
mains horizontal regardless of the tilt of 
the water’s container. Whereas 84% of the
males (n = 62) were aware of the principle,
only 31% of the females (n = 91) were
similarly aware. Training was successful in
teaching the concept of "horizontality" to
only 12 of the 63 females who were initially
unaware. And McGee (Note 13) found that
first-grade boys benefited more than girls
from training on the Piagetian Water Level
task.

Empirical research does not strongly sup- 
port the hypothesis that females respond to
training differently than males. However,
as pointed out by Sherman (1967), because
of the unknowns involved in assuming what
is relevant activity in increasing spatial abili-
ties, it is difficult to know whether the sexes
do in fact receive differential practice. Many
Very few girls are found in the high school classes
of mechanical drawing, analytical geometry, and 
shop. Spare-time activities of tinkering with the 
car, sports, model building, driving a car, direction 
finding, and map reading are sex-typed and might 
also be sources of differential practice. (Sherman, 
1967, p. 295) 


It seems highly likely that these and similar 
activities are involved in fostering spatial 
skills. To uncover this effect experimentally 
will require research designs more sensitive 
than those that have so far been used. 


Genetic Influences 


Accumulated evidence shows that spatial
abilities are as heritable as, or more heritable than,
verbal ability (Blewett, 1954; Block, 1968;
Bock, 1973; DeFries et al., 1974, 1976;
McGee, 1978c; Osborne & Gregor, 1968;
Park et al., 1978; Vandenberg, 1962, 1967,
1968, 1969, 1971; Vandenberg, Stafford, & 
Brown, 1968; Williams, 1975; Bramble, 
Bock, & Vandenberg, Note 14; Thurstone, 
Thurstone, & Strandskov, Note 15) and 
much less correlated with traditional mea- 
sures of environmental quality such as level 
of education and SES (Bock & Vandenberg, 
1968; Marjoribanks, 1972; McGee, 1977; 
Vandenberg, 1971). A number of studies have 
suggested that spatial abilities may be en- 
hanced by an X-linked, recessive gene (Bock 
& Kolakowski, 1973; Goodenough et al., 
1977; Guttman, 1974; Hartlage, 1970; Staf- 
ford, 1961; Yen, 1975) and this hypothesis 
has served as a tentative explanation for the 
mode of genetic transmission and the sex 
difference in spatial test performance. 

Spatial abilities and X-linked inheritance.
Traits affected by the transmission of a
single gene on the X chromosome are said
to be X-linked and are determined to be
either dominant or recessive based on the
relative frequency of affected males and
females in the population. If the X-linked
trait were recessive, more males than females
would be affected, whereas if the X-linked
trait were dominant, more affected females
than males would be expected. This is true
because in a population at equilibrium, one
third of the X-linked genes are carried by
males and two thirds are carried by females,
since females inherit two X chromosomes
(one from each parent) and males inherit
only one. A recessive, X-linked trait will be
expressed in hemizygous recessive males and
homozygous recessive females but not in
hemizygous dominant males nor in heterozygous
or homozygous dominant females. Females
with a double recessive genotype would
be expected to occur in the population with a
frequency q², the square of the frequency q
of males carrying the single recessive allele.

Where the frequency q of the recessive,
spatial-enhancing allele equals either 0 or 1.00,
the absolute sex difference (q − q²) will be
0 (Jensen, 1975). As the value q departs
from 0 or 1.00, the absolute sex difference 
will increase. A gene frequency of .5 of the 
spatial-enhancing allele maximizes the sex 
difference, which sets the ratio of enhanced 
females to males at 1:2. Thus O’Connor’s 
(1943) observation that only one fourth of 
all females score above the male median on 
tests of spatial ability, a finding replicated by 
numerous investigators (e.g., Bock & Kola- 
kowski, 1973; Bouchard & McGee, 1977; 
Loehlin, Sharan, & Jacoby, 1978; Yen, 
1975), is in accordance with the X-linked, 
recessive model. 
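
The arithmetic behind this argument can be checked with a short sketch. It assumes only what the paragraph states: a proportion q of males and q² of females express the recessive trait. The function name and the grid of q values are illustrative choices, not part of the original analysis.

```python
def sex_difference(q):
    """Absolute difference between the proportion of males (q) and of
    females (q squared) expressing a recessive X-linked trait."""
    return q - q * q

# Evaluate the difference on a grid of allele frequencies from 0 to 1.
frequencies = [i / 100 for i in range(101)]
best_q = max(frequencies, key=sex_difference)

# Ratio of enhanced females to enhanced males at that frequency.
female_to_male = (best_q * best_q) / best_q

print(best_q, sex_difference(best_q), female_to_male)
```

On this grid the difference is largest at q = .5, where one half of males and one fourth of females are expected to carry the enhancing genotype, which is the 1:2 ratio cited in the text.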

In addition to fulfilling the basic require- 
ment of explaining the greater proportion of 
spatializing males than females, the genetic 
X-linkage model predicts a characteristic 
pattern of family correlations different from 
that expected for an autosomal, polygenic 
trait. The model predicts a higher father- 
daughter than father-son correlation and a 
higher mother-son than mother-daughter 
correlation. Opposite sex pairs of siblings will 
tend to show less similarity than pairs of 
sisters, whereas the correlation between pairs 
of brothers should be an intermediate value. 
Sisters will be most similar because for 
brothers, the one X chromosome may be 
either of the mother's two, whereas for sis- 
ters the paternal X chromosome is identical 
(Hogben, 1932; Mather & Jinks, 1963;
McKusick, 1964). Thus, under random mating
and a recessive gene frequency of .5
(the frequency that best explains the mean
sex difference and the shape of the male and
female distribution), the expected order of
family correlations is as follows: r sister-sister
> r mother-son = r father-daughter >
r brother-brother > r mother-daughter >
r brother-sister > r father-son. The theoretically
expected correlation, based on X-linkage,
of .00 between fathers and sons is not
easily predicted from environmental hypotheses
in which modeling effects and shared
experiences would ordinarily be expected to
lead to higher same-sex than opposite-sex
parent-child correlations. Consequently, the
original studies supporting the X-linkage hypothesis
(Bock & Kolakowski, 1973; Hartlage,
1970; Stafford, 1961) generated considerable
interest, despite the fact that they
were based on rather small samples and
failed to provide correlations among siblings.
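
The expected ordering can be reproduced with a small calculation under a simplifying assumption that goes beyond the text: the trait is treated as dichotomous (enhanced or not) and fully determined by one recessive X-linked allele of frequency q, with random mating. The function and variable names are hypothetical, and the values are illustrative rather than those reported in the cited studies.

```python
from math import sqrt

def x_linked_correlations(q):
    """Expected phenotypic correlations for a dichotomous trait (1 = enhanced)
    caused by a fully penetrant recessive X-linked allele of frequency q,
    assuming random mating. Requires 0 < q < 1."""
    p_m = q        # a male is enhanced if his single X carries the allele
    p_f = q * q    # a female is enhanced only if both X chromosomes carry it
    mean = {"m": p_m, "f": p_f}
    sd = {"m": sqrt(p_m * (1 - p_m)), "f": sqrt(p_f * (1 - p_f))}

    # Probability that the maternal alleles passed to two siblings are both
    # recessive: the same maternal X (prob 1/2) or different ones (prob 1/2).
    two_maternal = 0.5 * q + 0.5 * q * q

    # name: (sex of first relative, sex of second, P(both enhanced))
    pairs = {
        "sister-sister":   ("f", "f", q * two_maternal),
        "mother-son":      ("f", "m", q * q),
        "father-daughter": ("m", "f", q * q),
        "brother-brother": ("m", "m", two_maternal),
        "mother-daughter": ("f", "f", q ** 3),
        "brother-sister":  ("m", "f", q * two_maternal),
        "father-son":      ("m", "m", q * q),  # no shared X: independent
    }
    return {
        name: (joint - mean[a] * mean[b]) / (sd[a] * sd[b])
        for name, (a, b, joint) in pairs.items()
    }

correlations = x_linked_correlations(0.5)
ranking = sorted(correlations, key=correlations.get, reverse=True)
for name in ranking:
    print(f"{name:16s} {correlations[name]:.3f}")
```

With q = .5 this simplified model gives the same ordering as the one stated above, with the father-son correlation equal to zero because a father does not transmit his X chromosome to his son.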

Several recent tests of the hypothesis have
been conducted. This research is summarized
in Table 2, which shows age-corrected family
correlations for tests of spatial visualization
and spatial orientation abilities, along with
results on two miscellaneous tests that involve
a spatial component but that are not
ordinarily recognized as measures of visualization
or orientation. Several of the studies
used numerous tests, and some of the tests
were used in more than one study. Thus, in
addition to comparing similarities (and dissimilarities)
among tests, it is possible to
examine the extent to which the same tests
yield similar correlation patterns across
samples.

Only two studies in the literature (Bouchard
& McGee, 1977; Loehlin et al., 1978)
report the complete array of family correlations
consisting of both parent-child and
sibling correlations. In each of these studies
the father-son correlation equals or exceeds
the mother-son correlation; therefore the
results do not conform to the expected pattern.
In Bouchard and McGee's (1977)
study the difference between the brother-brother
and the sister-sister correlation is
highly significant (p < .005) in the direction
opposite that predicted by the X-linkage
model. The expected pattern of parent-child
correlations (with opposite-sex parent-child
correlations highest, the mother-daughter
correlation intermediate, and the father-son
correlation lowest), although obtained in the
original studies supporting the X-linkage hypothesis
(Bock & Kolakowski, 1973; Hartlage,
1970; Stafford, 1961), has not been
obtained for any spatial tests used in four
recent, larger family studies (Bouchard &
McGee, 1977; DeFries et al., 1976; Loehlin
et al., 1978; Park et al., 1978). These results
provide weak evidence, if any, for the spatial-enhancing
effect of the X-linked recessive
gene postulated by Stafford (1961). The
idea, however, that different assessments of
spatial abilities (e.g., two-dimensional vs.
three-dimensional tasks, rotation vs. transformation
of spatial objects, analytic vs.
gestalt processing, visualization vs. orientation)
have different genetic structures should
appeal to those investigators who are reluctant
to abandon the search for the spatial
gene. An important contribution to this literature
would be a family study providing
the full array of intrafamilial correlations
for spatial visualization and spatial orientation
tests, including the three tests that have
shown the pattern of correlations predicted
by the X-linked model: the Identical Blocks
Test (Stafford, 1961), the Differential Aptitude
Space Test (Hartlage, 1970), and the
adapted Guilford-Zimmerman Spatial Visualization
Test (Bock & Kolakowski, 1973).
It is not at all clear, from the few attempts
to address these issues (Guttman, 1974;
Loehlin et al., 1978; Yen, 1975), just what
kinds of spatial test performance the X-linked
gene should be expected to influence
most. "Clearly, the final word is not yet in"
(Loehlin et al., 1978, p. 40).


Hormonal Influences


To what extent are hormonal differences
in males and females responsible for the observed
sex difference in spatial test performance?
A small body of studies has addressed
this question. Petersen (1976), for example,
demonstrated a curvilinear relationship between
physical androgenicity and certain aspects
of cognitive functioning, namely, verbal
fluency and spatial ability. Measures in both
boys and girls at ages 13, 16, and 18 were
made of sex hormone influence inferred from
degree of secondary sex characteristic development,
which is known to be under
gonadal hormone influence (Tanner, 1969).

Scores were available from two measures 
of spatial ability (including the Primary 
Mental Abilities Space Test, Thurstone & 
Thurstone, 1965) and from two measures of 
verbal fluency. Her results showed that less
androgenized (less masculine) males (as- 
sessed on the basis of various physical char- 
acteristics including hip size, shoulder width, 
and muscle strength) scored higher on the 
spatial tests than boys with a more androge- 
nized (more masculine) body type. The positive
correlations were highest for the 18-year-olds,
lowest for the 13-year-olds, and intermediate
for the 16-year-olds. These findings
support those presented by Broverman, Klai- 
ber, Kobayashi, and Vogel (1968), who 
found that more androgenized males scored 
higher on verbal fluency than on spatial 
ability tests. For females, however, Brover- 
man et al. found different results: Females 
with high androgen levels (again rather 
crudely determined on the basis of physical 
characteristics such as narrow hips, wide 
shoulders, solid muscles, and small breasts) 
had higher spatial scores than females with 
low androgen levels.

Other evidence has accumulated that 
highly masculinized males have a tendency 
toward lower spatial scores. For example, 
Klaiber, Broverman, and Kobayashi (1967) 
tested male college students using two simple 
repetition tasks and two spatial tasks. Scores 
on these spatial tasks were correlated with 
measures of masculinity (large chest and 
biceps and pubic hair distribution). Results 
showed a positive correlation with perform- 
ance on the simple tasks of repetition but a 
negative correlation with performance on the 
spatial tasks. In another study, Ferguson 
and Maccoby (cited in Maccoby, 1966) 
found that boys with high spatial scores 
were rated by their peers as less masculine 
than boys with low scores. 

How are these data to be interpreted? 
High body androgenization is associated with 
low spatial scores among males (Ferguson & 
Maccoby, cited in Maccoby, 1966; Klaiber 
et al., 1967; Petersen, 1976) and with high
spatial scores among females (Broverman
et al., 1968; Petersen, 1976). It might be
that spatial abilities are facilitated not by
any absolute level of androgen but rather by
an optimal estrogen-androgen balance. Un-
fortunately there exists a paucity of evidence 
to support this contention. One test comes 
from a study of Kwashiorkor feminized males 
(Dawson, 1967a, 1967b). West African chil- 
dren suffering from Kwashiorkor (a disease 
resulting from prolonged subsistence on diets 
deficient in protein) and the accompanying 
protein-deficiency-induced endocrinological 
dysfunction of the liver, which prevents the 
normal inactivation of the production of 
estrogen in the male (Stuart-Mason, 1963), 
exhibited lower spatial and numerical ability 
and greater field dependence, as compared to 
normal controls, in addition to demonstrating 
greater verbal ability. And the curvilinear 
nature of the relationship between body 
androgenicity and spatial ability found by 
Petersen (1976) indicates that at least a 
minimum androgen level is required for nor- 
mal spatial ability. It follows from these 
findings that the superior spatializer of either 
sex is less sexually differentiated than are 
nonspatializers. That is, the estrogen-andro-
gen balance would be optimal and conse- 
quently spatial abilities would be highest for 
males low in androgen and for females high 
in androgen. We might even suggest that for 
females, the more androgen one has the 
better, thus explaining why individuals with
Turner's syndrome (phenotypic females, the
majority of whom have a single X chromosome,
XO, rather than the normal female
XX pair, and no gonadal hormones) demonstrate
poorer spatial abilities (Alexander,
Ehrhardt, & Money, 1966; Alexander &
Money, 1966; Alexander, Walker, & Money,
1964; Garron, 1970, 1977; Money, 1963;
Money & Granoff, 1965; Serra, Pizzamiglio,
Boari, & Spera, 1978; Silbert, Wolff, & Lilienthal,
1977), poorer direction sense (Alexander
et al., 1964), and greater field dependence
(Serra et al., 1978) than both
males who also have single X chromosomes
and genetically normal females.

The questions raised in this section point
to the obvious need for research aimed at
clarifying the relationship between somatic
androgenization and the evidence reviewed
earlier for an X-linked recessive gene influence
on spatial skill enhancement. The finding
of better spatial skill in late-maturing,
less androgenized boys than in early-maturing,
more androgenized boys (Broverman
et al., 1968) suggests the operation of an
X-linked gene controlling the timing of release
of androgen rather than the expression
of spatial skill directly (Bock & Kolakowski,
1973).

The suggestion made is that there exists a
sex-linked influence on within-sex variation
in somatic androgenicity. However, this is
a question that awaits empirical investigation.
Future research might be aimed at determining
the extent of this influence on
between-sexes variation in spatial abilities.
The methods of measuring body androgenization
used by Petersen (1976) and by Broverman
et al. (1968), based on the analysis of
physical characteristics including muscle development,
body shape, genital or breast size,
and pubic hair distribution obtained by rating
photographs, are crude and imprecise indices
of sexual differentiation controlled by
estrogen-androgen balance. Until more direct
methods of hormonal assay are employed on
larger samples, the precise nature of the
relationship between spatial abilities and
hormonal balance will remain an open question.


Neurological Influences 


Neurological studies showing variations in
the lateral organization of the human brain
provide experimental evidence for a structural
source of the variation in human spatial
abilities. Recent work on hemispheric specialization
suggests (a) that the right cerebral
hemisphere is specialized for spatial
processing and (b) that males have greater right
hemisphere specialization than females. These
conclusions are supported by several different
types of evidence to be reviewed.

Hemispheric specialization. Language function
was the first higher mental process
found to be asymmetrically represented in
the human brain, and it remains the best
documented case of hemispheric specialization
(Nebes, 1974). Recent studies have established
anatomical bases for the separation
of both verbal and nonverbal information
processing in human subjects. Kimura
(1961) was probably the first investigator 
to employ Broadbent’s (1954) technique of 
dichotic listening for the examination of 
hemispheric specialization when she demon- 
strated that when pairs of contrasting digits 
were presented simultaneously to the right 
and left ears, those presented to the right 
ear were more accurately reported. Right ear 
advantage for the processing of easy-to-ver- 
balize stimuli (e.g., numbers, words, and 
letters) has since been confirmed (Milner, 
Taylor, & Sperry, 1968; Sparks & Gesch- 
wind, 1968; Studdert-Kennedy & Shank- 
weiler, 1970).

Conversely, a left ear (right hemisphere) 
advantage for the processing of difficult-to- 
verbalize stimuli (e.g., melodies, sonar sig- 
nals, and abstract patterns of sound) has 
also been demonstrated (Curry, 1967; Ki- 
mura, 1964, 1966; Shankweiler, 1966; 
Spreen, Benton, & Fincham, 1965; Vignolo, 
1969; Chaney & Webster, Note 16). 

Other data demonstrate convincingly that 
each cerebral hemisphere primarily subserves 
its contralateral limb and binocular visual 
hemifield (Buffery & Gray, 1972) and that 
in about 96% of the normal adult popula- 
tion, cerebral dominance for verbal func- 
tions (i.e., tasks requiring semantic memory,
manipulation, and production) is subserved 
by the left hemisphere, whereas the right 
hemisphere predominates in subserving non- 
verbal functions (i.e., tasks requiring per- 
ception and manipulation of visual images) 
(Bogen & Gazzaniga, 1965; Buffery, 1968; 
Buffery & Gray, 1972; Levy, 1976a; Searle- 
man, 1977; Witelson, 1976). 

Several investigations of patients with unilateral
brain lesions have demonstrated spatial
abilities to be more affected by right
than by left cerebral injury (Kimura, 1967;
McFie, Piercy, & Zangwill, 1950; Milner,
1962). Costa and Vaughan (Note 17) found
that right-lesion patients (n = 18) scored
significantly lower (p < .05) than left-lesion
patients (n = 18) on the Block Design subtest
of the Wechsler Adult Intelligence Scale,
with both normal and extended time limits.
Similarly, it has been demonstrated that patients
who have suffered the loss of their
left temporal lobe show impaired memory
for verbal materials but nonsignificant per- 
formance decrements on tasks such as mem- 
ory for faces (Milner et al., 1968) and maze 
learning (Corkin, 1965; Milner, 1965). In 
a more recent study, Kershner and King 
(1974) examined laterality of cognitive func- 
tions among hemiplegic children and found 
similar results. Twenty-one children were 
administered the Wechsler Intelligence Scale 
for Children (WISC) and the Reitan-Indi- 
ana Neurological Test. The left hemiplegic
(right-brain-damaged) children (n = 7) were
poorer than normals on visuoperceptual performance
tasks (p < .05) but showed no
significant impairment, relative to normals,
on any of the WISC verbal tests. Right
hemiplegic (left-brain-damaged) children
(n = 7) were poorer than normals in verbal
intelligence (p < .05). 

Although sample sizes are small among 
clinical populations, clinical studies of the 
effects of unilateral brain damage, reviewed 
previously, and commissurotomy (Sperry, 
1968, 1973; Sperry, Gazzaniga, & Bogen, 
1969) on verbal and spatial tasks corrobo- 
rate tachistoscopic perceptual data and pro- 
vide direct evidence for the conclusion that 
the right cerebral hemisphere is specialized 
for spatial processing. 

Hemispheric specialization and sex differ- 
ences in spatial abilities. To what extent do 
sex differences in hemispheric specialization 
underlie male superiority on tasks requiring
spatial abilities? A review of clinical and 
experimental data indicates that the right 
cerebral hemisphere is specialized for spatial 
processing and that the cerebral hemispheres 
of males and females tend to show differences 
in specialization for verbal and spatial func- 
tions. The conclusion that males have
greater right hemisphere specialization than 
females is supported by data from tachisto- 
scopic perceptual studies (Ehrlichman, 1972; 
Kimura, 1969, 1973; McGlone & Davidson, 
1973), clinical studies (Lansdell, 1962, 
1968a, 1968b, 1973; McGlone & Kertesz, 
1973), and studies of anatomical differences 
between the sexes (Geschwind, 1974; Lans- 
dell & Davie, 1972; Wada, 1974; Witelson & 
Pallie, 1973). Much of this literature has 
been reviewed elsewhere (cf. Buffery & Gray, 
1972; Harris, 1978; Harshman & Remington,
1976).

The major opposition to the conclusion 
that males have greater right hemisphere 
specialization and thus greater spatial ability 
than females has been proposed by Buffery 
and Gray (1972). Their theory that females 
are more lateralized than males for both 
language and spatial skills is supported 
mainly from developmental data on children. 
However, as Buffery and Gray themselves 
point out, "sex differences in children are 
difficult to interpret when there is an ad- 
vantage in favor of girls, since this may 
always be due to their general maturational 
advantage over boys" (p. 131). Moreover, 
recent developmental studies of hemispheric 
specialization are consistent with the con- 
clusion that boys have greater right hemi- 
sphere specialization than girls and that
girls are more bilateral in their cerebral
representation of verbal and spatial functions.

Knox and Kimura (1970), for example, 
studied dichotic listening to nonverbal stim- 
uli (environmental and animal noises such 
as dish washing, phone dialing, clock ticking, 
dog barking). Subjects were 80 right-handed 
children between 5 and 8 years of age. Males 
showed a greater left ear (right hemisphere) 
superiority than females across all ages. 

Witelson (1976) presented children rang- 
ing in age from 3 to 13 years with a tactual 
version of the dichotic recognition technique. 
All children were originally assessed as being 
right-handed. Examinees were instructed to 
touch unfamiliar four- to eight-sided shapes 
and then to identify the forms by pointing 
to a visual display of a group of shapes. 
Boys (n = 165) at age 5 and beyond showed 
a significant left-hand (right hemisphere)
advantage; there was no hand difference for
the 3- to 4-year-olds. Girls (n = 165)
showed significant left-hand superiority but
not until after age 13. Witelson concluded
that the right hemisphere may be specialized
for spatial processing (the detection of
shapes) earlier in boys than in girls. Other
evidence of earlier right hemisphere specialization
in boys than girls for processing
of nonauditory, tactual configurations has
been provided from the investigation of
Braille reading in both normal (Rudel,
Denckla, & Spalten, 1973) and blind subjects
(Hermelin & O'Connor, 1971a, 1971b).
Since Braille characters are symbols of alphabet
letters, a left hemisphere (right-hand) advantage
for reading Braille might be expected.
It has been observed, however (Hermelin
& O'Connor, 1971b), that right-handed
blind individuals have a clear advantage for
reading Braille with the fingers of their left
hand. Hermelin and O'Connor (1971a) have
suggested that for the blind, Braille symbols
are processed as spatial configurations, not
as linguistic symbols, and are as a result
processed by the right cerebral hemisphere.
A direct test of this hypothesis has been
provided by Rudel et al., who examined normal
children. Right-handed males and females
(n = 80) between 7 and 14 years of
age learned 12 Braille letters, 6 with each
hand. Results indicated that 7- and 8-year-old
boys performed equally well with both
hands in the Braille paired-associate learning
task but that the girls of the same age
showed right-hand superiority. Left-hand
(right hemisphere) superiority emerged for
both boys and girls at later ages (13 and 14
years), but the difference between right-
and left-hand scores was statistically significant
only for the boys, indicating an earlier
and perhaps superior pattern of right hemisphere
development in boys than girls.

Another opposing hypothesis to that of
greater right hemisphere specialization in
males than females has been proposed by
Harris (1978). According to Harris, "the
male eventually equals and then surpasses
the female in degree of left hemisphere
lateralization, so that in adulthood, language
in females is bilaterally represented, thus
impeding her spatial ability" (p. 460). Support
for Harris' first postulate, that there is
greater bilateralization of language functions
in females than males, is provided by
studies of normal (Bryden, 1966; Remington,
Krashen, & Harshman, Note 18) as
well as clinical populations (Lansdell,
1962; McGlone & Kertesz, 1973). Harris
dismisses developmental studies of hemispheric
specialization, however, which consistently
show earlier and greater left hemisphere
specialization of language function in
girls than boys (Buffery, 1970, 1971a, 1971b,
1971c; Kimura, 1963; Pizzamiglio & Cecchini,
1971; Bryden, Allard, & Scarpino,
Note 19).

Harris (1978) provides support for his 
second postulate—that bilateral cerebral rep- 
resentation impedes spatial skills—mainly on 
the basis of studies of left-handers. The as- 
sumption is that left-handers tend to be less 
well lateralized (more bilateral) than right- 
handers in cerebral representation of verbal 
and spatial functions (Bryden, 1966; Goodglass
& Quadfasel, 1954; Remington et al.,
Note 18). The implication is that left-hand- 
ers, like females, should score lower on tests 
of spatial ability than right-handers, since 
they are less well lateralized. Studies sup- 
porting the relationship between left-handed- 
ness and deficits on spatial tasks (James, 
Mefferd, & Wieland, 1967; Levy, 1969, 
1976b; McGlone & Davidson, 1973; Miller,
1971; Nebes, 1971; Nebes & Briggs, 1974;
Silverman, Adevai, & McGough, 1966)
are based on small samples, and differences
associated with sex are not always examined. 

Numerous other studies (Annett & Turner, 
1974; Fagan-Dubin, 1974; Kutas, McCar- 
thy, & Donchin, 1975; McGee, 1976, 1978b; 
Newcombe & Ratcliff, 1973; Sherman, in 
press) report conflicting evidence regarding 
the prediction of poorer overall performance 
on spatial tasks by left- than right-handers.
As noted by Hardyck and Petrinovich
(1977), the data indicating that left-handedness
is associated with deficits of various
kinds are far from compelling.

A related problem associated with Harris'
(1978) hypothesis is the assumption that
left-handers are more bilateral than right-handers
in their cerebral representation of
verbal and spatial functions. Bilaterality of
cerebral function seems to be present in the
left-handed only when there is a family
history of left-handedness (Hardyck & Petrinovich,
1977) and is not a characteristic
of the nonfamilial left-handed individuals,
who as a group tend to be organized for
cerebral specialization exactly as are the
right-handed.

In summary, the clinical and experimental 
neurological literature suggests conclusively 
that the right cerebral hemisphere is specialized
for spatial processing and that males
have greater right hemisphere specialization 
than females. Further research is needed to 
determine the causal relationship, if any, be- 
tween sex differences in hemisphere special- 
ization and sex differences in spatial abilities. 


Conclusion 


Six conclusions are warranted. First, a 
plethora of factor analytic studies since the 
1930s have provided strong and consistent 
support for the existence of at least two dis- 
tinct spatial abilities—visualization and ori- 
entation. Spatial visualization is the ability
to mentally rotate, manipulate, and twist 
two- and three-dimensional stimulus objects. 
Spatial orientation ability includes the com- 
prehension of the arrangement of elements 
within a visual stimulus pattern, the aptitude 
to remain unconfused by the changing ori- 
entations in which a spatial configuration 
may be presented, and an ability to deter- 
mine spatial orientation with respect to one's 
body. Second, visualization and orientation 
abilities are more highly correlated with 
success in a number of technical, vocational, 
and occupational domains than is verbal 
ability, making them important variables 
in applied psychology. Third, sex differences 
in various aspects of perceptual-cognitive 
functioning (e.g., mathematics and field in- 
dependence) are a secondary consequence 
of differences with respect to spatial visual- 
ization and spatial orientation abilities. 
Fourth, sex differences on tests of spatial 
visualization and orientation as well as on 
numerous tasks requiring these abilities do 
not reliably appear until puberty. Fifth, spa- 
tial abilities are known to be influenced al- 
most as much by genetic factors as is verbal 
ability in all populations studied; however, the 
X-linked recessive gene hypothesis that has 
served as a tentative explanation for sex dif- 
ferences in spatial abilities and for the mode 
of genetic transmission is not supported 
strongly in recent studies. Sixth, the devel- 
opment of sex differences in spatial skills is 
likely related to sex differences in the de- 
velopment of hemisphere specialization. Re- 
cent work in hemisphere specialization dem- 
onstrates conclusively that the right cerebral 
hemisphere is specialized for spatial pro-
cessing and that males have greater right 
hemisphere specialization than females. 


Alternatives to Simonton's Analyses of the
Interrupted and Multiple-Group Time-Series Designs


Statistical procedures for analyzing the interrupted time-series and the multiple-
group time-series designs are outlined. The procedures are applicable when several
subjects are observed on several pretreatment and posttreatment occasions,
and the number of subjects is greater than the number of occasions.


Simonton (1977) discussed the use of several
linear models for the analysis of data arising
in the interrupted time-series design and the
multiple-group time-series design. The interrupted
time-series design consists of measuring
the same subjects on several pretreatment
occasions, introducing a treatment, and measuring
the subjects on several posttreatment
occasions. In the multiple-group time-series
design, two groups of subjects are used. In
this case, the experimental group is treated in
the same manner as the single group in the
interrupted time-series design. The control
group is measured on the same occasions as
the experimental group but does not receive
the treatment. Typically, assignment of subjects
to the two groups is nonrandom, although
the treatment may be assigned randomly
to the groups.

The linear models discussed by Simonton
(1977) are similar to the univariate polynomial
trend analysis model that can be used to
estimate regression curves with a quantitative
independent variable. However, there is a
major difference between the data for which
univariate trend analysis is appropriate and
the data obtained in the time-series designs.
The difference is that to use the univariate
trend analysis, different subjects are used at
each level of the independent variable. Therefore,
it may be reasonable to assume that the
residuals from the polynomial trend function
are uncorrelated, an assumption that is
required for the use of ordinary least squares 
estimates. The reasonableness of this assump- 
tion derives from the fact that each residual 
characterizes a different subject. Since in the 
time-series designs the same subjects are 
measured at each occasion, the residuals in 
any model for time-series data characterize 
the same individuals and therefore are likely 
to be correlated. As a result, ordinary least 
squares estimates of the parameters may not 
be efficient and hence should not be used. 
Simonton recognized this problem and sug- 
gested the use of modified generalized least 
squares estimators. This is a reasonable 
suggestion. 

The analyses discussed by Simonton (1977) 
consist of estimating the parameters of the 
linear model and then testing hypotheses about 
the parameters to examine the hypothesis of a 
treatment effect. In general, we do not have 
any objections to the models proposed by 
Simonton for the interrupted time-series 
design. Our concern is with the analyses 
advocated by Simonton, and one purpose of 
this article is to propose alternative statistical 
analyses. With regard to the multiple-group 
time-series design, our position is that the 
aims of Simonton’s analyses can be realized 
using profile analysis. Therefore, a second 
purpose of this article is to describe a profile 
analysis of the data arising from a multiple- 
group time-series design. 

In all of the following, we assume that 
the number of subjects (N) exceeds the 
number of occasions ($p$). Simonton (1977)
suggested that $N$ should be larger than $p$,
preferably twice as large. Hence, our criterion 
for N is not more restrictive than Simonton's. 


Critique of Simonton's Analyses 


It can be shown that each of Simonton's 
(1977) models for the interrupted time-series 
design can be expressed as a special case of


$$\bar{X} = A\beta + e, \tag{1}$$


where $\bar{X}$ is the $(p \times 1)$ sample mean vector,
$A$ is a $(p \times r)$ known design matrix, $\beta$ is an
$(r \times 1)$ vector of unknown parameters, and $e$ is
a $(p \times 1)$ random vector of residuals. The vector
$\bar{X}$ is calculated from the $N$ realizations, $X_i$,
of the random vector $X$. The variance-covariance
matrix of $X$ is $\Sigma$. If the model in
Equation 1 is correct, then the population
mean vector $\mu = A\beta$ and the variance-covariance
matrix of $e$, and hence $\bar{X}$, is
$V = \Sigma/N$.

The vector $\beta$ should not be estimated using
the ordinary least squares estimator
$(A'A)^{-1}A'\bar{X}$, since it may not be efficient if the
variance-covariance matrix of $e$ is not $\sigma^2 I$
(i.e., if the elements of $e$ are correlated or have
unequal variances). An appropriate estimator
for $\beta$ is the modified generalized least squares
estimator $\hat{\beta} = (A'\hat{V}^{-1}A)^{-1}A'\hat{V}^{-1}\bar{X}$, where $\hat{V}$
is an estimator of $V$. The problem in using $\hat{\beta}$
is developing an estimator for $V$.

As noted above, Simonton (1977) recognized
that the elements of $e$ are likely to be correlated
and attempted to use $\hat{\beta}$ by calculating an
estimate of $V$ under the assumption that the
elements of $e$ conform to a first-order stationary
autoregressive model. This implies that


$$e_t = \rho e_{t-1} + \delta_t, \qquad t = 2, \ldots, p,$$


with the mean of $e_t = 0$ and the variance of
$e_t$ constant for $t = 1, \ldots, p$. This assumption is
unlikely to be correct for two reasons. First,
Lord (1963) has shown that the variance of a
variable is likely to increase over time. Second,
Jöreskog (1970) has pointed out that an
autoregressive model is unlikely to fit data
perturbed by measurement error.
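
To make concrete what the first-order stationary autoregressive assumption imposes, the implied covariance structure can be written out. This is a standard consequence of the model, not an equation taken from Simonton (1977), and the symbol $\sigma^2$ for the common variance of the $e_t$ is introduced here only for illustration.

```latex
\operatorname{Cov}(e_t, e_{t'}) = \sigma^2 \rho^{\,|t - t'|},
\qquad t, t' = 1, \ldots, p .
```

Under this assumption the whole variance-covariance matrix is determined by two quantities, $\sigma^2$ and $\rho$, whereas an unrestricted variance-covariance matrix has $p(p+1)/2$ distinct elements. Both objections in the text bear on this restriction: variances that grow over time violate the constant diagonal, and measurement error disturbs the geometric decay of the off-diagonal elements.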

More important than the likely failure of
the data to meet the autoregressive assumption
is that it is not necessary to make the assumption
to estimate $\beta$, provided that $N > p$.
Use of $\hat{\beta}$ requires an estimator for $V = \Sigma/N$.
This can be accomplished by using the sample
variance-covariance matrix


$$S = \frac{1}{N - 1} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})'$$


as an estimator of $\Sigma$ and using $S/N$ as an
estimator of $V$. This approach requires no
assumptions about the stochastic process
generating $e$. The estimator $\hat{\beta}$ then becomes
$\hat{\beta} = (A'S^{-1}A)^{-1}A'S^{-1}\bar{X}$.

A second problem with the analyses
suggested by Simonton (1977) is that he did not
provide a method for evaluating the fit of the
model to the data. Therefore, it is possible to
estimate the vector $\beta$ and use this estimate
to make an inference about a treatment
effect even though the model fits the data
badly. A third problem is that Simonton
proposed the use of approximate tests of
significance to test hypotheses about $\beta$.
However, exact tests of significance are available
when $\hat{\beta}$ is used to estimate $\beta$, and use of these
tests is preferable to use of the approximate
tests advocated by Simonton.

With regard to the multiple-group time- 
series design, our major criticism was discussed 
previously. The procedures reported by Simon- 
ton, and their attendant assumptions, are 
unnecessary since the aims of his procedures 
can be accomplished using profile analysis. 


Interrupted Time-Series Design 


As noted above the models discussed by 
Simonton (1977) are special cases of the linear 
model given in Equation 1. The models 
discussed by Simonton are called changed-level 
(with permanent, transient, or dampened 
change level) and changed-slope models. The 
first purpose of this section is to illustrate how 
each of these models is expressed in terms of 
Equation 1. Suppose that the subjects are
measured on six occasions, three pretreatment
and three posttreatment. The permanent
changed-level model is expressed as


$$\begin{bmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \bar{X}_3 \\ \bar{X}_4 \\ \bar{X}_5 \\ \bar{X}_6 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \end{bmatrix}.$$


The model states that the means for the
pretreatment occasions are at the same level.
After the treatment, the means move to a
new level. The hypothesis that $\beta_1 = 0$ is tested
to determine if a treatment effect has occurred.
If the hypothesis that $\beta_1 = 0$ cannot be
rejected, then the means for the entire
time-series design are at the same level and a
treatment effect cannot be inferred.
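The transient changed-level model can be
expressed as


$$\begin{bmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \bar{X}_3 \\ \bar{X}_4 \\ \bar{X}_5 \\ \bar{X}_6 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \end{bmatrix}.$$


Here the model states that the mean for the
occasion immediately following the treatment
moves to a level different from the common
level of all the other means. For the same
reason as with the previous model, the
hypothesis that $\beta_1 = 0$ is of interest.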
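
The design matrices for the permanent and transient changed-level models follow directly from the verbal descriptions, so they are easy to generate for any number of occasions. The following is a minimal sketch in Python with NumPy; the function names and the arguments `m` (pretreatment occasions) and `s` (posttreatment occasions) are illustrative choices, not notation from Simonton (1977).

```python
import numpy as np


def permanent_level_design(m, s):
    """Design matrix for a level shift that persists over all s posttreatment occasions."""
    shift = np.concatenate([np.zeros(m), np.ones(s)])
    return np.column_stack([np.ones(m + s), shift])


def transient_level_design(m, s):
    """Design matrix for a level shift confined to the first posttreatment occasion."""
    shift = np.zeros(m + s)
    shift[m] = 1.0  # index m is the occasion immediately following the treatment
    return np.column_stack([np.ones(m + s), shift])


# Six occasions, three before and three after the treatment.
print(permanent_level_design(3, 3))
print(transient_level_design(3, 3))
```

In both matrices the first column carries the common level and the second column carries the changed-level parameter, so the hypothesis of interest always concerns the last element of the parameter vector.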
The dampened changed-level model is
expressed in the same form,


$$\bar{X} = A\beta + e, \qquad \beta = (\beta_0, \beta_1)',$$


where the first column of $A$ consists of 1s and
the second column is 0 on the three pretreatment
occasions and takes successively smaller positive
values on the three posttreatment occasions.
The model states that the means for the
pretreatment occasions are all at the same
level. After the treatment the mean increases,
but over time the means tend to drop back
to the initial level. Again, the hypothesis that
$\beta_1 = 0$ is tested.

As Simonton (1977) noted, the previous
models all assume that the population means
would be equal if it were not for the intervening
treatment. An alternative assumption is that
there is a linear or higher order trend in the
population means. Such an assumption can
easily be incorporated in any of the
changed-level models above. To illustrate, suppose
that there is a hypothesized quadratic trend
and a dampened level change. The model can
be expressed as a special case of Equation 1 with
$\beta = (\beta_0, \beta_1, \beta_2, \beta_3)'$, in which the columns
of $A$ carry the constant, linear, and quadratic
trend terms and the dampened level change.
In this model, the parameter of interest is $\beta_3$.
If the hypothesis that $\beta_3 = 0$ cannot be
rejected, then a quadratic curve adequately
fits the data, and hence, a treatment effect 
cannot be inferred. The logic here is that the 
same trend is evident both in the pretreatment 
and posttreatment means, so it is difficult to 
claim that the treatment had an effect. This 
notion was discussed by Campbell (1963). 

The changed-slope model discussed by 
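Simonton (1977) can be expressed as


$$\begin{bmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \bar{X}_3 \\ \bar{X}_4 \\ \bar{X}_5 \\ \bar{X}_6 \end{bmatrix}
= \begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \end{bmatrix}.$$


The model states that there is a linear trend
in the pretreatment and posttreatment data.
With this model there are two null hypotheses
to be tested: the hypothesis of a common
intercept, $H_0\colon \beta_1 = \beta_3$, and the hypothesis
of a common slope, $H_0\colon \beta_2 = \beta_4$. If neither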
hypothesis is rejected, then the linear trend 
for both the pretreatment and posttreatment 
data is the same, and hence, a treatment 
effect cannot be inferred. If either is rejected, 
the hypothesis of a treatment effect is sup- 
ported. With more pretreatment and post- 
treatment occasions, this model can be 
generalized to include higher order trends. The 
analysis of data that used a linear model was 
discussed by Algina and Swaminathan (1977). 
The analysis of data that used a curvilinear 
model was discussed in detail by Swaminathan 
and Algina (1977). At this point the discussion 
turns to the statistical procedures for testing 
the fit of a particular model to the data and 
for testing hypotheses about the parameters 
of the model. These procedures are based on 
an article by Rao (1959) and have been 
discussed by Swaminathan and Algina.

Let $\mathcal{E}(\cdot)$ denote the expectation operator.
Since $\mathcal{E}(X) = \mu$, it follows that


$$\mathcal{E}(\bar{X}) = \mu = A\beta + \mathcal{E}(e). \tag{2}$$


Let $Q$ $[(p - r) \times p]$ be a nonzero matrix
constructed so that $QA = 0$. The matrix $Q$ can
be chosen as a basis for the matrix
$[I - A(A'A)^{-1}A']$. Premultiplying Equation 2 by
$Q$, we obtain


$$Q\mu = QA\beta + Q\mathcal{E}(e) = Q\mathcal{E}(e).$$


As noted above, if the hypothesized model is
correct, then $\mu = A\beta$, and hence $\mathcal{E}(e) = 0$.
Therefore, if the model is correct, $Q\mu = 0$.
On the other hand, if the hypothesized model
given by Equation 1 is not correct, then
$Q\mu \neq 0$. Therefore, to test the adequacy of
the model, the hypothesis


$$H_0\colon Q\mu = 0 \tag{3}$$


is tested. Failure to reject the hypothesis given 
by Equation 3 supports the contention that 
Equation 1 is the correct model. Conversely, 
rejection of the hypothesis indicates that 
Equation 1 does not adequately fit the data. 

It is well known that the hypothesis given
by Equation 3 may be tested using Hotelling's
$T^2$ statistic,


$$T^2 = N \bar{X}' Q' (Q S Q')^{-1} Q \bar{X}, \tag{4}$$


where $\bar{X}$ is the sample mean vector, and $S$
is the sample variance-covariance matrix.
The test statistic may be transformed to the
variate


$$F = \frac{N - p + r}{(p - r)(N - 1)}\, T^2,$$


which is compared to the $(1 - \alpha)$th fractile
of the $F$ distribution with $p - r$ and $N - p + r$
degrees of freedom. The matrix $Q$ is not
unique, but Rao (1959) showed that the test
statistic is invariant to the choice of $Q$.
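
The test of fit can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions rather than a reference implementation: the function name `model_fit_test` is hypothetical, the data matrix is assumed to have one row per subject and one column per occasion, $A$ is assumed to have full column rank with $r < p$, and $Q$ is obtained from a singular value decomposition, which is only one of the many valid choices.

```python
import numpy as np


def model_fit_test(X, A):
    """Hotelling T^2 test of H0: Q mu = 0 for the model x_bar = A beta + e.

    Returns (T2, F, (df1, df2)); F is referred to the F distribution
    with p - r and N - p + r degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    N, p = X.shape
    r = A.shape[1]
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)  # sample covariance matrix, divisor N - 1
    # The last p - r left singular vectors of A are orthogonal to its columns,
    # so stacking them as rows gives one valid choice of Q with Q A = 0.
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    Q = U[:, r:].T
    Qx = Q @ x_bar
    T2 = N * (Qx @ np.linalg.solve(Q @ S @ Q.T, Qx))
    F = (N - p + r) / ((p - r) * (N - 1)) * T2
    return T2, F, (p - r, N - p + r)


# Hypothetical data: 20 subjects, 6 occasions, permanent changed-level design.
rng = np.random.default_rng(0)
A = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
X = rng.normal(size=(20, 6)) + A @ np.array([10.0, 2.0])
T2, F, df = model_fit_test(X, A)
print(T2, F, df)
```

The returned $F$ value would then be compared with the tabled critical value for the returned degrees of freedom; a large value indicates that the hypothesized design matrix does not reproduce the pattern of occasion means.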

If the test for the fit of the model to the 
data indicates that the model is inadequate, 
then a new model must be proposed. If the 
model adequately fits the data, then the vector
$\beta$ is estimated and the relevant hypothesis is
tested. The modified generalized least squares
estimate of $\beta$ is $\hat{\beta} = (A'S^{-1}A)^{-1}A'S^{-1}\bar{X}$.
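
The modified generalized least squares estimate requires only the sample mean vector, the sample variance-covariance matrix, and the design matrix. A minimal sketch in Python with NumPy is given below; the function name `gls_estimate` and the simulated data are assumptions made for illustration, and the data matrix is taken to have one row per subject and one column per occasion with $N > p$.

```python
import numpy as np


def gls_estimate(X, A):
    """Modified generalized least squares estimate of beta in x_bar = A beta + e.

    X: (N, p) array of scores, rows = subjects, columns = occasions (N > p).
    A: (p, r) known design matrix of full column rank.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    x_bar = X.mean(axis=0)            # sample mean vector
    S = np.cov(X, rowvar=False)       # sample covariance matrix, divisor N - 1
    S_inv_A = np.linalg.solve(S, A)   # S^{-1} A without forming S^{-1} explicitly
    # beta_hat = (A' S^{-1} A)^{-1} A' S^{-1} x_bar
    return np.linalg.solve(A.T @ S_inv_A, S_inv_A.T @ x_bar)


# Hypothetical data: 20 subjects, 6 occasions, permanent changed-level design.
rng = np.random.default_rng(0)
A = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
X = rng.normal(size=(20, 6)) + A @ np.array([10.0, 2.0])
beta_hat = gls_estimate(X, A)
print(beta_hat)  # estimates of (beta_0, beta_1)
```

Because $S/N$ is used as the estimator of $V$, the factor $N$ cancels in the estimator, which is why the code works with $S$ directly.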

All of the hypotheses concerning $\beta$ can be
expressed as special cases of the hypothesis
$G\beta = 0$, where $G$ is a $(k \times r)$ matrix of rank
$k \le r$. Rao (1959) has shown that this
hypothesis can be tested using the test statistic


$$F = \frac{(N - k - p + r)\, N\, \hat{\beta}' G' [G (A'S^{-1}A)^{-1} G']^{-1} G \hat{\beta}}{k[(N - 1) + T^2]}, \tag{5}$$


where $T^2$ is the quantity defined in Equation 4.
The quantity $F$ follows the $F$ distribution with
$k$ and $N - k - p + r$ degrees of freedom.
For the changed-level models, $G$ is a $(1 \times r)$
vector, say $g'$, with 0 as the first $r - 1$ elements
and 1 as the remaining element. In this case,
the test statistic reduces to


$$F = \frac{N (N - p + r - 1)\, (g'\hat{\beta})^2}{g'(A'S^{-1}A)^{-1} g\, [(N - 1) + T^2]}. \tag{6}$$


The inner product $g'\hat{\beta}$ simply picks out the
estimate of the changed-level parameter from
the vector $\hat{\beta}$. The quantity $F$ given by Equation
6 follows the $F$ distribution with 1 and $(N - 1)
- (p - r)$ degrees of freedom. If the test
statistic exceeds the critical value of $F$, then the
hypothesis of a treatment effect is supported.
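
The test of a linear hypothesis about the parameters can be sketched in the same way. The following Python function is an illustrative sketch, not code from Rao (1959) or from the authors; the name `linear_hypothesis_F` is hypothetical, the statistic is computed in the form $F = (N - k - p + r)\,N\,\hat{\beta}'G'[G(A'S^{-1}A)^{-1}G']^{-1}G\hat{\beta} \,/\, \{k[(N - 1) + T^2]\}$, and $A$ is assumed to have full column rank with $r < p$.

```python
import numpy as np


def linear_hypothesis_F(X, A, G):
    """F statistic for H0: G beta = 0, with beta estimated by modified GLS.

    Returns (F, (df1, df2)) with df1 = k and df2 = N - k - p + r.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    G = np.atleast_2d(np.asarray(G, dtype=float))
    N, p = X.shape
    r = A.shape[1]
    k = G.shape[0]
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    S_inv_A = np.linalg.solve(S, A)
    M = A.T @ S_inv_A                        # A' S^{-1} A
    beta_hat = np.linalg.solve(M, S_inv_A.T @ x_bar)
    # T^2 from the model-fit test, with Q chosen so that Q A = 0.
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    Q = U[:, r:].T
    Qx = Q @ x_bar
    T2 = N * (Qx @ np.linalg.solve(Q @ S @ Q.T, Qx))
    Gb = G @ beta_hat
    middle = G @ np.linalg.solve(M, G.T)     # G (A' S^{-1} A)^{-1} G'
    quad = Gb @ np.linalg.solve(middle, Gb)
    F = (N - k - p + r) * N * quad / (k * ((N - 1) + T2))
    return F, (k, N - k - p + r)


# Hypothetical data with a permanent level change of 2 units.
rng = np.random.default_rng(0)
A = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
X = rng.normal(size=(20, 6)) + A @ np.array([10.0, 2.0])
g = np.array([0.0, 1.0])  # picks out the changed-level parameter
F, df = linear_hypothesis_F(X, A, g)
print(F, df)
```

Passing a single row vector for `G` gives the changed-level case, in which the first degrees of freedom equal 1.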

It should be noted that the test statistic 
given above has an exact distribution in 
contrast to the test statistic given by Simonton 
(1977), which has only an approximate 
distribution. Hence, with the procedure
outlined here, it is possible to control the alpha
value precisely.

As noted previously, the changed-slope 
model discussed by Simonton (1977) is a 
special case of the class of models discussed by 
Swaminathan and Algina (1977). Since the 
statistical details are available in that article, 
only a verbal description of the analysis will 
be given. Again the logic of the analysis is 
based on Campbell's (1963) observation that 
if a single polynomial adequately fits the 
entire time-series design, then a treatment 
effect cannot be inferred. The steps in the
Swaminathan-Algina analysis are

1. Test the adequacy of a model which
specifies that the pretreatment and
posttreatment portions of the series are adequately
fit by separate polynomial regression curves of
degree $r - 1$. Refer to this model as the
complete model.

2. If the complete model is not adequate,
test whether an $(r - 1)$th-degree polynomial
adequately fits one portion of the time-series
design but not the other. If the answer is
affirmative, then different regression curves fit
the two portions of the series, and the
hypothesis of a treatment effect cannot be rejected.

3. If the answer in Step 2 is negative, raise
the degree of the polynomial, and repeat
Steps 1 and 2. This cycle is followed until the
complete model is accepted, or is rejected and
the answer in Step 2 is affirmative.

4. In the event the complete model is
accepted, estimate the regression coefficients,
and test the equality of the coefficients of the
pretreatment and posttreatment portions. If
the two sets of coefficients are equal, then the
same curve fits both portions, and the
hypothesis of a treatment effect is not tenable. If
the hypothesis of equality is rejected, then
the hypothesis of a treatment effect is
supported.


Multiple Time-Series Design


The multiple-group time-series design
consists of observing an experimental and a
control group on several pretreatment
occasions, introducing a treatment to the
experimental group, and observing the two groups
on several posttreatment occasions. In essence,
Simonton's (1977) proposed analysis consists
of fitting four separate first-degree
polynomials: one to the pretreatment means for
the experimental group, the second to the
posttreatment means for the experimental
group, the third to the pretreatment means
for the control group, and the fourth to the
posttreatment means for the control group.
Inferences about treatment effects are based
on statistical comparison of the slopes and the
intercepts of the various equations. Again,
an objection to this kind of analysis is that
first-degree polynomials may not adequately
fit the four sets of means. However, there is
no way to detect such lack of fit from
Simonton's analysis. A standard multivariate
technique, profile analysis (Morrison, 1967, pp.
141-148), can be used to accomplish the aims 
of Simonton's analysis of the multiple-group 
time-series design without making the assump- 
tion that polynomials of any degree fit the 
data. 
Simonton (1977) suggests that in an 
optimal situation subjects would be randomly 
assigned to experimental and control groups.
He recognizes that this may be impossible
and suggests that equality of the intercepts 
and slopes of the pretreatment equations for 
the experimental and control groups is a 
necessary condition for the groups to be 
considered equivalent. If the equality condition 
fails, he suggests dropping the control group 
and analyzing the experimental group data as 
an interrupted time-series experiment. In our 
view, a more appropriate analysis would begin 
by testing the hypothesis that there is no 
Groups X Occasions interaction. If an anal- 
ysis indicates that there is no interaction, 
then a treatment effect is not supported. 
With no interaction, the mean time series 
for the two groups are parallel. The rela- 
tionship between the mean time series remains 
the same over the course of the study, and 
therefore, it is difficult to argue that a treat- 
ment effect has occurred. 

Let $\mu_e$ and $\mu_c$ denote $(p \times 1)$ population mean
vectors for the experimental and control
groups, respectively. The hypothesis of no
Groups X Occasions interaction can be
expressed as


$$H_0\colon C(\mu_e - \mu_c) = 0, \tag{7}$$


where $0$ is a $[(p - 1) \times 1]$ null vector and $C$ is a
$[(p - 1) \times p]$ matrix,


$$C = \begin{bmatrix}
1 & -1 & 0 & \cdots & 0 & 0 \\
0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -1
\end{bmatrix}.$$


It is well known that under the assumption
that the experimental and control group
observations are each multinormally
distributed with equal variance-covariance matrices,
the hypothesis given by Equation 7 can be
tested using Hotelling's $T^2$ statistic,


$$T^2 = \frac{N_e N_c}{N_e + N_c}\, (\bar{X}_e - \bar{X}_c)' C' (C S C')^{-1} C (\bar{X}_e - \bar{X}_c),$$


where $N_e$ and $N_c$ are the numbers of cases in
the experimental and control groups, $\bar{X}_e$ and
$\bar{X}_c$ are $(p \times 1)$ sample mean vectors, and
$S$ $(p \times p)$ is the pooled sample
variance-covariance matrix. The test statistic can be
transformed to the variate


$$F = \frac{N_e + N_c - p}{(N_e + N_c - 2)(p - 1)}\, T^2,$$


which follows the $F$ distribution with $p - 1$
and $N_e + N_c - p$ degrees of freedom.
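
The parallelism test of profile analysis can be sketched as follows. This Python function is an illustration under stated assumptions: the name `parallelism_test` is hypothetical, each group's data matrix has one row per subject and one column per occasion, the contrast matrix is taken to be the successive-difference matrix (any full-rank set of $p - 1$ contrasts would give the same $T^2$), and the standard conversion $F = (N_e + N_c - p)\,T^2 / [(N_e + N_c - 2)(p - 1)]$ is used.

```python
import numpy as np


def parallelism_test(Xe, Xc):
    """Test of no Groups x Occasions interaction for two groups.

    Xe, Xc: (Ne, p) and (Nc, p) arrays for the experimental and control groups.
    Returns (T2, F, (df1, df2)) with df1 = p - 1 and df2 = Ne + Nc - p.
    """
    Xe = np.asarray(Xe, dtype=float)
    Xc = np.asarray(Xc, dtype=float)
    Ne, p = Xe.shape
    Nc = Xc.shape[0]
    d = Xe.mean(axis=0) - Xc.mean(axis=0)
    # Pooled sample variance-covariance matrix.
    S = ((Ne - 1) * np.cov(Xe, rowvar=False)
         + (Nc - 1) * np.cov(Xc, rowvar=False)) / (Ne + Nc - 2)
    # Successive-difference contrasts: row i is (0, ..., 1, -1, ..., 0).
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
    Cd = C @ d
    T2 = Ne * Nc / (Ne + Nc) * (Cd @ np.linalg.solve(C @ S @ C.T, Cd))
    F = (Ne + Nc - p) / ((Ne + Nc - 2) * (p - 1)) * T2
    return T2, F, (p - 1, Ne + Nc - p)


# Hypothetical data: the experimental group rises by 2 units after occasion 3.
rng = np.random.default_rng(0)
Xc = rng.normal(size=(15, 6))
Xe = rng.normal(size=(15, 6)) + np.array([0, 0, 0, 2.0, 2.0, 2.0])
T2, F, df = parallelism_test(Xe, Xc)
print(T2, F, df)
```

Applying the same function to only the pretreatment columns or only the posttreatment columns gives the follow-up $T^2$ statistics for the two portions of the series; the degrees of freedom it returns for those subsets are the conventional ones for a separate test and are not necessarily the reference values the authors use for the follow-up comparisons.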

As indicated earlier, if the hypothesis given 
by Equation 7 is not rejected, the hypothesis 
of a treatment effect is untenable. If the 
hypothesis is rejected, no conclusion can 
be made because an interaction can occur 
whether or not there is a treatment effect. For 
example, suppose that an interaction exists 
for the pretreatment portion of the experi- 
mental and control group time series. Then 
the hypothesis given by Equation 7 should be 
rejected, but this hardly indicates that there 
is a treatment effect. Rather, it indicates that 
because of nonrandom assignment to groups, 
the mean time series would not have been 
parallel even if the intervention had not 
occurred. Now suppose that a Groups X Occasions
interaction does not exist for the
pretreatment means but does exist for the
posttreatment means. Here the hypothesis given
by Equation 7 should be rejected, and there 
is evidence supporting the hypothesis of a 
treatment effect, that is, that although the 
pretreatment mean time series were parallel, 
after the treatment the mean time series 
became nonparallel, which suggests that the 
treatment had some effect. 

The above discussion suggests that if for 
the entire time series, the hypothesis of no 
Groups X Occasions interaction is rejected, 
then the next phase of analysis should deter- 
mine at which point the deviation or devia- 
tions from parallelism occur: on the pretreat- 
ment occasions, on the posttreatment occasions, 
or on both types of occasions. If an interaction 
occurs only for the posttreatment occasions, 
then the hypothesis of a treatment effect is
supported. If an interaction occurs for the
pretreatment occasions, then the control group 
must be dropped, and the experimental group 
data should be analyzed as an interrupted 
time series experiment. 
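
The decision logic described above can be summarized as a small function. This is a sketch of the verbal rules only; the function name and the returned strings are illustrative, and the case in which the overall test is rejected but neither follow-up test is rejected is flagged separately because the text does not address it.

```python
def multiple_group_decision(reject_overall, reject_pre, reject_post):
    """Map the three Groups x Occasions interaction tests to a conclusion.

    Each argument is True if the corresponding null hypothesis of no
    interaction (whole series, pretreatment, posttreatment) was rejected.
    """
    if not reject_overall:
        return "treatment effect not supported"
    if reject_pre:
        return "drop control group; analyze as interrupted time series"
    if reject_post:
        return "treatment effect supported"
    # Overall test rejected but neither follow-up test rejected:
    # the text gives no rule for this outcome, so it is flagged, not decided.
    return "not addressed by the procedure"


print(multiple_group_decision(True, False, True))
```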

Let $\mu_{1e}$, $\mu_{1c}$, $\mu_{2e}$, and $\mu_{2c}$ represent the
pretreatment and posttreatment population mean
vectors for the experimental and control
groups, respectively. Let there be $m$
pretreatment occasions and $s$ posttreatment occasions.
The hypothesis of no Groups X Occasions
interaction for the pretreatment means may
be expressed as


$$H_0\colon C_1(\mu_{1e} - \mu_{1c}) = 0, \tag{8}$$


where $C_1$ has the same general form as $C$ but
has dimensions $[(m - 1) \times m]$. The relevant
hypothesis for the posttreatment means may
be expressed as


$$H_0\colon C_2(\mu_{2e} - \mu_{2c}) = 0, \tag{9}$$


where again $C_2$ has the same general form as
$C$ but has dimensions $[(s - 1) \times s]$. The test
statistics for these two hypotheses are


$$T_1^2 = \frac{N_e N_c}{N_e + N_c}\, (\bar{X}_{1e} - \bar{X}_{1c})' C_1' (C_1 S_1 C_1')^{-1} C_1 (\bar{X}_{1e} - \bar{X}_{1c})$$

and

$$T_2^2 = \frac{N_e N_c}{N_e + N_c}\, (\bar{X}_{2e} - \bar{X}_{2c})' C_2' (C_2 S_2 C_2')^{-1} C_2 (\bar{X}_{2e} - \bar{X}_{2c}),$$


where $S_1$ and $S_2$ are pooled-sample
variance-covariance matrices for the pretreatment and
posttreatment observations, respectively.


The test statistics can be transformed to
the variates $F_1$ and $F_2$.
The critical value to which $F_1$ and $F_2$ are
compared is the same as the critical value to
which $F$ is compared in testing the hypothesis
given by Equation 7. Using this critical value
keeps the overall $\alpha$ rate at the nominal level.
If the hypothesis given by Equation 8 is not
rejected and the hypothesis given by Equation
9 is rejected, then the hypothesis of a treatment
effect is supported. If only the hypothesis
given by Equation 8 or both hypotheses are
rejected, then the data for the experimental
group are analyzed as an interrupted time-series
experiment.


Discussion


The major purpose of this article was to
suggest several improvements to statistical
procedures reported by Simonton (1977) for
the analysis of time-series designs. The
improvements consist of the employment of
tests of the fit of proposed models to the data
and the exact small sample distribution
theory for testing hypotheses. In addition, it
should be pointed out that the assumptions
underlying the procedure advocated in this
article are different from those made by
Simonton. Simonton assumed that the errors,
$e_1, e_2, \ldots, e_p$, follow a first-order autoregressive
scheme, that is, $e_t = \rho e_{t-1} + \delta_t$, where
the residual $\delta_t$ is assumed to be independent
of other $\delta$s and the $e$s. Since $N$ individuals are
measured on $p$ occasions ($N > p$), Simonton
recommends estimating $\rho$ for each individual
(based on $p$ observations) and pooling these
estimates across the $N$ individuals. However,
when $N > p$, unless there is a strong reason
to believe that the particular error structure
exists, it may be more meaningful to relax
this assumption, assume that the vector of
errors $e$ has a multivariate normal distribution
with mean vector $0$ and dispersion matrix $\Sigma$,
and proceed as suggested in this article.
Another advantage of this approach is that the
exact distributions of the statistics for testing
hypotheses of interest are known (as compared
with the Simonton approach in which only
the approximate distribution of the test
statistic is known).

A necessary condition for the application of 
the procedures developed in this article is 
that the number of subjects exceed the number 
of occasions. Simonton tacitly assumed this 
condition also. This does not seem to be a 
serious drawback, since with a large number of 
subjects, fewer time points are required for 
satisfactory estimation of parameters. When
$N \le p$, a structure has to be imposed on $\Sigma$.
The procedure suggested by Simonton is more 
appropriate in this case. However, even in 
this case, the test of the fit of the model to 
the data should be explored. This is not a 
trivial problem but may be solved by adapting 
methods such as those suggested by Krishnaiah 
and Murthy (1966) and Rao (1967). 

If one of the changed-level models fits the 
data, and a treatment effect has occurred, then 
the Swaminathan-Algina (1977) analysis 
should detect this treatment effect because the 
changed-level models state that a polynomial 
fits the pretreatment data, whereas a polynomial
does not fit the posttreatment data.
Hence, Step 2 of the Swaminathan-Algina
analysis should indicate a treatment effect.
Are the changed-level models then superfluous?
In our opinion, they are not. The
Swaminathan-Algina analysis asks whether the
pretreatment and posttreatment regression curves on
time are the same. An affirmative answer 
strongly indicates that a treatment effect 
has not occurred. A negative answer indicates 
that after the treatment occurred, the regression 
curve shifted. There may be numerous threats 
to internal validity that provide reasonable 
explanations of the shift, and the evidence 
for a treatment effect may be quite weak. 
The changed-level models postulate the form 
of the treatment effect. If the changed-level 
model fits the data and the changed-level 
parameter is nonzero, then the class of threats 
to internal validity that are plausible explana- 
tions of the form of the regression curve will 
probably be narrower. Hence, employing the 
changed-level models as suggested by Simon- 
ton with the analytic procedure advocated in 
this article would provide an efficient method 
for the analysis of the interrupted time-series 
designs. 

In the Introduction it was noted that if 
the residuals are correlated, then the ordinary 
least squares estimator may not be efficient. 
Rao (1967) has discussed the situations in 
which the ordinary least squares estimator is 
more efficient than the modified generalized 
least squares estimator. However, strictly 
speaking, application of his results requires 
knowledge of the structure of $\Sigma$. Furthermore,
even when ordinary least squares estimators
are more efficient, it can be shown that
multivariate procedures should be used for
testing hypotheses about the adequacy of the
model and about $\beta$ (Grizzle & Allen, 1969).


Reply to Algina and Swaminathan


Dean Keith Simonton 
University of California, Davis 


Algina and Swaminathan have proposed more sophisticated analyses for the 
cross-sectional time-series experiment. Especially valuable is their suggested 
procedure for testing the empirical adequacy of the hypothesized intervention 
model. Nonetheless, the greater complexity of their approach may not always 
be justified in many research applications. In particular, their exact-test method
will normally yield statistical inferences similar to those of my approximate-test
procedure.


Algina and Swaminathan's (1979) article
constitutes a significant contribution to the
recent literature on cross-sectional time-series
quasi-experiments. Certainly they have
offered some reasonable and more sophisticated
alternatives to the analyses I had proposed in
an earlier article (Simonton, 1977).
Particularly notable, in my view, are the procedures
they have developed for testing the adequacy
of a model's fit to the data, an issue I
neglected. Although it may not always be
necessary to run such tests, it is easy to
imagine many real situations in which such
verification would be required. On the other
hand, the general analytical procedure they
have outlined for parameter estimation and
significance tests is definitely more
complicated than that I had proposed. Therefore,
it is reasonable to ask what specific
advantages accrue from such augmented
complexity. Apparently, the chief improvements are
two in number. In the first place, the
significance tests that they have developed are
exact, whereas mine are only approximate.
Whether one prefers complex exact tests or
simple approximate tests may be somewhat
a matter of personal choice, at least given
our ignorance regarding the degree of
approximation in the approximate tests. Also,
the exact tests are only exact when the
assumptions are exactly met, and hence the
distinction is partially obscured.


Requests for reprints should be sent to Dean
Keith Simonton, Department of Psychology,
University of California, Davis, California 95616.


However, Algina and Swaminathan raise a 
second and related critical point about their 
proposed alternative: My solution to the 
serial dependencies in the disturbances was 
to postulate a first-order autoregressive 
scheme, whereas they estimate a
variance-covariance matrix without any a priori
ture. As they point out, their procedure re- 
quires that the number of subjects exceed 
the number of observations, but this condi- 
tion is easily fulfilled. Furthermore, they 
mention situations in which a first-order 
autoregressive model may not adequately de- 
scribe the disturbance process (e.g., when 
there is measurement error). Nonetheless, I 
think it is reasonable to ask what the conse- 
quences of following my procedure are any- 
way, no matter what defines the true
variance-covariance matrix. Here I believe the
differences between the two alternative pro- 
cedures will usually be small. For example, 
if the disturbances actually are generated by 
a second-order autoregressive scheme, the 
significance tests will be only slightly affected 
(Miklich, Note 1), whereas the chief loss 
will fall in the area of estimation efficiency 
(Engle, 1974). Yet at this point in the his- 
tory of the behavioral sciences, estimation 
efficiency (i.e., the variance of our parameter 
estimates) is probably a low priority con- 
cern. Moreover, whenever we are dealing 
with quasi-experiments entailing only one 
intervention, the significance tests for cross- 
sectional time series are extremely robust, 
even when the autoregression is moderate 
(Miklich, Note 1). Indeed, it is my belief
that most cross-sectional time-series
experiments could probably do without generalized
least squares estimation and could simply 
rely on a more conservative alpha level (e.g., 
minimum of .01). 

In all, I am fairly confident that in the 
majority of data analysis situations, it may 
not make a substantial difference whether 
one employs my simple approximate-test ap- 
proach with an a priori disturbance model or 
the Algina-Swaminathan complex exact-test 
approach with an a posteriori variance-co- 
variance matrix. Nevertheless, I am also of 
the opinion that their procedures represent 
a wider range of valuable tests that probably 
render their approach far more useful in the 
long run as a general strategy for analyzing 
such data.


Reference Note 
1. Miklich, D. R. Robustness of analysis of variance
treatments: Comparisons to within subjects
autoregressive data. Unpublished manuscript, 1978.
(Available from National Asthma Center, 1999
Julian Street, Denver, Colo. 80204).


The MMPI As a Primary Differentiator and Predictor
of Behavior in Prison: A Methodological Critique
and Review of the Recent Literature


Milton L. Gearing II 
University of South Carolina 


Seventy-one investigations of Minnesota Multiphasic Personality Inventory 
(MMPI) usage in prison work were systematically evaluated. Additional studies 
were examined to provide a methodological basis for the comparisons of the 
research, which were made within sections on sampling procedures, sources of 
variance and their effects on test results, protocol validity, and methods of 
profile interpretation. Several methodological shortcomings and various differ- 
ences in procedure across studies limit the generalizability of the findings. 
However, research in the hostile-assaultive section has produced preliminary 
MMPI indicators for a type of violently aggressive behavior pattern that is 
otherwise difficult to detect. Other areas in which the MMPI shows promise 
include homosexuality, recidivism, and the classification of psychopathologic 
behavior. More research is needed in the areas of institutional adjustment and 
suicide. Recommendations for future investigations prescribe adequately con- 
trolled sampling procedures, modifications in the interpretation of protocol 
validity, investigation of certain methodological questions in their own right, 
consideration of more than one aspect of profile data, the use of base-rate 
probabilities in predictive studies, and the pursuit of longitudinal studies with
thorough follow-up procedures.


The Minnesota Multiphasic Personality 
Inventory (MMPI) is probably the most 
widely used personality test in American
criminal justice settings today, and MMPI
administration is a part of standard admissions
practice for all federal institutions as
well as for several state and local institutions
(Elion & Megargee, 1975). It is routinely
used as a general aid in diagnosis and treatment
program planning (Haven, Note 1),
yet several researchers have investigated 
more specific applications of the MMPI that 
deal with several varieties of circumscribed 
inmate behavioral problems. These investi- 
gators are striving to increase the usefulness 
of the MMPI in prison settings, either for 
gaining dynamic insights into a specific be- 
havior problem group or for probabilistic 
prediction of future behavior problems. The 
full extent of the MMPI’s present usefulness 
and its potential in these two major applica- 
tions constitutes the focus of the present 
review. An attempt is made to determine if 
the MMPI has the potential to become a 
major and valuable aid in the making of cor- 
rectional decisions that are appropriate and 
beneficial for the individual inmate's welfare 
as well as facilitative of the smooth and 
effective operation of prison programs in 
general. 

Table 1 
Standard Validity and Clinical Scales for the 
Minnesota Multiphasic Personality Inventory 

Symbol    Scale 
?         Question 
L         Lie 
F         Validity 
K         Test-Taking Attitude 
Hs(1)     Hypochondriasis 
D(2)      Depression 
Hy(3)     Hysteria 
Pd(4)     Psychopathic Deviate 
Mf(5)     Masculinity-Femininity 
Pa(6)     Paranoia 
Pt(7)     Psychasthenia 
Sc(8)     Schizophrenia 
Ma(9)     Hypomania 
Si(0)     Social Introversion 

The MMPI consists of 550 different statements 
covering a wide range of subject matter. 
The client responds to each statement 
by answering true or false or leaving the 
statement blank. The standard MMPI pro- 
file consists of 4 validity scales (?, L, F, 
and K) and 10 clinical scales (Hs, D, Hy, 
Pd, Mf, Pa, Pt, Sc, Ma, and Si). The ? 
(Question) scale merely records the number 
of items left blank. The L (Lie) scale con- 
sists of 15 statements describing minor yet 
common human failings that attempt to 
identify intentional efforts by the subject 
to make himself or herself “look good.” 
The F (Validity) scale contains 64 items 
rarely answered in the scorable direction by 
normals, and it attempts to detect the pres- 
ence of confusion due to psychosis or illiter- 
acy, a look-bad attempt meant as a “cry 
for help” or a random response pattern. The 
K (Test-Taking Attitude) scale consists of 
30 items and basically serves as a measure 
of defensiveness in the subject's test-taking 
attitude. The Hs (Hypochondriasis) scale 
has 33 items reflecting somatic complaints 
commonly found in hypochondriasis. The D 
(Depression) scale contains 60 items that 
describe various symptoms of depression, 
such as feelings of hopelessness and worthlessness, 
preoccupation with death, and so 
forth. The Hy (Hysteria) scale is made up 
of 60 items that tend to identify conversion 
hysterics in particular. The Pd (Psycho- 
pathic Deviate) scale is comprised of 50 
items designed to detect the amoral and 
asocial types commonly described as psychopathic 
personality disorders. The Mf (Masculinity-Femininity) 
scale has 60 items and 
was initially designed to identify those effeminate 
males suffering from a sexual inversion 
disorder but actually seems to reflect 
aesthetic and vocational interests. The Pa 
(Paranoia) scale contains 40 items intended 
to detect the clinical pattern of paranoia, 
which may also be part of another disorder 
such as schizophrenia. The Pt (Psychas- 
thenia) scale is made up of 48 items that 
attempt to detect the obsessive-compulsive 
syndrome and that are also suggestive of a 
high degree of anxiety. The Sc (Schizo- 
phrenia) scale consists of 78 items that were 
originally intended to differentiate the psy- 
chotic pattern of schizophrenia. The Ma 
(Hypomania) scale contains 46 items that 
reflect the overactivity, emotional excite- 
ment, and flight of ideas common to the 
affective disorder of hypomania. Finally, the 
Si (Social Introversion) scale is made up 
of 70 items suggesting unease in social situa- 
tions, hypersensitivity, and insecurity. The 
10 clinical scales are also commonly referred 
to by number. (See Table 1 for the symbols, 
numbers, and names of the standard MMPI 
scales.) 

Recently Megargee and his associates 
(Megargee, 1977a; Megargee, 1977b; Me- 
gargee & Bohn, 1977; Megargee & Dorhout, 
1977; Meyer & Megargee, 1977) filled an 
entire issue of Criminal Justice and Behavior 
with the results of 6 years of coordinated 
research. This research was aimed at the 
derivation of a comprehensive MMPI classi- 
fication system for criminal behavior. Me- 
gargee (1977a) makes a compelling case for 
the economy, efficiency, reliability, validity, 
and operational utility of such a system by 
comparing it to existing systems on seven 
dimensions of usefulness that he feels are 
requirements for productive classification. He 
then formulates 8 research questions that he 
asserts must all be answered in the affirma- 
tive if the MMPI is to be an adequate basis 
for a typological system, and he explains 
how their series of research studies was de- 
signed to address these questions. This series 
of studies has yielded 10 reliably occurring 
MMPI profile configurations in a population 

MMPI AND BEHAVIOR IN PRISON 


of youthful male federal offenders. Each pro- 
file type is presented with rules for inclusion 
into its respective classification, modal de- 
scriptions of significant characteristics drawn 
from several assessment sources and case 
history data, and hypotheses about optimal 
modes of management and treatment. This 
computer-assisted system successfully classi- 
fied 96% of their sample of 1,214 offenders. 
Although these results are represented as a 
“progress report,” these researchers feel that 


thus far, the taxonomy has surpassed all of our 
initial expectations. The system has outgrown the 
capability of one laboratory to investigate it ade- 
quately. It is hoped that this series of progress 
reports will stimulate other researchers to investi- 
gate these groups and help us determine their utility. 
(Megargee, 1977a, p. 113) 


Six of the eight questions mentioned above 
were addressed and have been answered in 
the affirmative, and this massive research 
effort continues. Besides seeking replication 
with other prison populations and in other 
areas of the country, Megargee and his asso- 
ciates are interested in productively inter- 
facing their fledgling empirical system with 
both theoretical orientations and other em- 
pirical lines of research in corrections (Me- 
gargee, 1977b). Their initial results appear 
promising. 

This review focuses on research that has 
differentiated criterion and comparison in- 
mate groups on some independent basis and 
has then sought MMPI indicators that re- 
liably differentiate these groups. It is hoped 
that this will be a useful complement to the 
work of Megargee and his associates, who 
are separating inmate groups on the basis of 
their MMPI profiles and then seeking corre- 
sponding dynamic and predictive behavioral 
indicators. Haven (Note 1) has comprehen- 
sively reviewed earlier literature in this area, 
but a substantial body of research has 
emerged since his investigation (see Dahl- 
strom, Welsh, & Dahlstrom, 1975, chap. 3). 
An up-to-date assessment of the MMPI's 
potential in prison work is called for, both 
to stimulate relevant cross-validations and 
extensions of the latest findings and to maximize 
the productive interfacing of these findings 
with the important work of Megargee 
and his associates. Studies published in English-language 
journals from 1967 to the 
time of Megargee et al.’s (Megargee, 1977a; 
Megargee, 1977b; Megargee & Bohn, 1977; 
Megargee & Dorhout, 1977; Meyer & Me- 
gargee, 1977) publications are emphasized, 
but studies published before 1967 that meet 
certain specific criteria (see the last paragraph 
of the Protocol Validity subsection) are in- 
cluded for the sake of representativeness and 
continuity. Only studies using some version 
of the MMPI that at least provides for full 
scoring of the 13 standard scales are in- 
cluded; Mini-Mult (Kincannon's, 1968, 71- 
item short form of the MMPI that esti- 
mates scores on most of the standard scales) 
prison research is not addressed. The Review 
section itself is limited to research that deals 
with civilian criminal populations incarcer- 
ated in conventional correctional facilities 
or psychiatric facilities specifically for the 
criminally insane. All criterion and compari- 
son groups are drawn from such populations 
unless otherwise noted. Research dealing 
with incarcerated armed services popula- 
tions, probation populations, and psycho- 
pathic/sociopathic populations in conven- 
tional mental hospitals is not included. 

The first section of this review consists of 
a methodological evaluation of the research 
that includes the following subsections: Sam- 
pling Procedures, Sources of Variance and 
Their Effects on Test Results, Protocol Va- 
lidity, and Methods of Profile Interpretation. 
Limitations of the research are discussed, 
and recommendations for standard design 
modifications and further needed investiga- 
tions are presented. The studies are then re- 
viewed under two major categories: The 
MMPI As a Primary Differentiator of Devi- 
ant Behavior and The MMPI As a Predictor 
of Deviant Behavior. A third section, MMPI 
Differences Across Race and Sex, examines 
the effects these two variables have on 
MMPI profiles obtained from prisoners and 
examines the implications for interpretation 
and generalization. 


Methodological Evaluation 


The studies reviewed in this article illustrate 
several methodological shortcomings. 

Table 2 
Design Aspects and Subject Data of the Studies Reviewed 

Study    V^a    S^b    T^c    Subject data^d 

Adams (1976) 28. M. а 2 5 11 14 

Adams & West (1976) 2 R a 14 

Beall & Panton (1956) iS} ra 14 

Black (1967) 2 d 3 5 9 11 14 

Blackburn (1968) le a 14 

Caldwell (1959) 2 s 23 11 14 

Carroll & Fuller (1971) IR S 2 14 

Cavior et al. (1967) Hare a ON 4 14 
Christensen & LeUnes (1974) 1 s 14 

Costello et al. (1973) ДОМ Р 2 3 5 11 12 14 

Craddick (1962) le s 2.3 5 6 11 14 

Cubitt & Gendreau (1972) 1 E 3 14 

Davis (1971) 1 a 14 

Davis & Sines (1971) 1 a 14 

Deiker (1974) ЗМУ. в 2 3 8 1 13 14 

Driscoll (1952) 1 a 14 

Dunham (1954) 1 a 14 

Edwards (1963) ОМ a 2. 3 5 11 14 

Elion & Megargee (1975) SUNT a 14 

Erikson & Roberts (1966) 3 31152 144 5 Д 

14 

Fisher (1969) a DAMM 5 14 і 
Flanagan & Lewis (1974) 2 ? 14 | 
Frank (1971) Zr! a 14 

Gendreau & Gendreau (1970) 1 a 14 

Gough et al. (1965) ira 14 

Gregory (1974) бог a 14 

Gynther (1962) 1 a 2 5 14 

Haven (Note 10) Ега 14 

Joesting et al. (1975) 1 a 

Johnson & Cooke (1973) 1 a 14 

Lefkowitz (1966) ILES 2 5 14 15 
McCreary (1975) 1 a 23 14 

McCreary & Padilla (1977) 2 Ж а 3 M. aaa 
Mack (1969) 3 0/2 5 7 14 

Megargee & Cook (1975) D Naa} m 14 

Megargee et al. (1967) 2 Е КЫ: 11 14 

Megargee & Mendelsohn (1962) 2 

Oliver & Mosher (1968) : у 1 s т 7 " 

Panton (1958) А Ч V 14 

Panton (1959a) 2 a X 

Panton (1959b) 2 a и 
Panton (1960b) 2 Ma 2 5 ^ 

Panton (1962a) 1 Ma 23 5 

Panton (1962b) 1 та 2 X v i 

Panton (1962c) 2 a и 

Panton (1972) amu а 2 5 14 

Panton (1973) c RE: 14 

Panton (1974) 

Panton (1976) о И iia 11 

Panton (1977) > ADI NS T 

Panton (1978) ciues s 14 

Panton (Note 6) 1 1 d 

Panton (Note 7) 2 2 14 
Panton & Behre (1973) quens је " | 


Table 2 (continued) 
Study    V^a    S^b    T^c    Subject data^d 
Panton & Brisson (1971) 2 Ms 2 5 11 14 
Persons & Marks (1971) 1 a 14 
Pierce (1971a) T увора 2 14 
Pierce (1971b) TELE 0 14 
s 
Pierce (1972b) 1 a 2 3 5 14 
Pierce (1972c) 1 Ma 23 5 d 14 
Pierce (1973) 1 TEE SARIS 14 
s 
Pierce (Note 4) 1 М 23 5 14 
Randolph et al. (1961) 2 s 14 
Rosenblatt & Pritchard (1978) 1 M 3 5 11 14 
Shinohara & Jenkins (1967) 3e s 14 
Snortum et al. (1970) 1 a 14 
Stanton (1956) 2° a 11 14 16 
Stump & Gilbert (1972) 1 a 14 
Sutker & Moan (1973) 1 а 14 
Tsubouchi & Jenkins (1969) 3 s 14 
Twomey & Hendry (1969) РЕКЕ 14 
Wattron (1963) d dap 14 
Wilcock (1964) 01M? 7 14 


^a V = validity criteria: 1 = no criteria employed; 2 = employed standard cutoffs (L scale T score 
maximum of 70, F scale T score maximum of 80, K scale T score maximum of 70) or discarded invalid Minnesota 
Multiphasic Personality Inventories (MMPIs); 3 = detected and discarded random profiles only. 

^b S = sampling procedures: R = complete randomization; M = complete matching; r = partial 
randomization; m = partial matching; M = mixed approach, more than one pair of criterion and comparison 
groups used. 

^c T = timing of MMPI administration: a = routine admissions administration; s = given specifically for 
the study conducted; d = administered just prior to discharge; ? = not specified; = = mixed approach, 
more than one pair of criterion and comparison groups used. 

^d Those characteristics either controlled or matched: 1 = addiction status; 2 = age; 3 = educational 
achievement; 4 = ethnicity; 5 = IQ; 6 = length of current sentence; 7 = length of time in prison; 8 = 
marital status; 9 = number of prior convictions; 10 = number of disciplinary restrictions; 11 = race; 
12 = rate of recidivism; 13 = religion; 14 = sex; 15 = social class; 16 = socioeconomic status. 

^e Conducted preliminary IQ/reading-level screening. 


In addition to the problems discussed later, 
the studies vary on such dimensions as timing 
of administration of the MMPI (i.e., 
on admission vs. at the time of the research 
study), security grade of the institution from 
which the test groups were drawn (this was 
frequently not specified), and region of the 
country in which the study was conducted. 
Table 2 presents most of the important characteristics 
of the studies reviewed, and the 
lack of consistency with respect to the control 
of subject characteristics across the 
studies is readily apparent. Overall, the variations 
in procedure substantially limit the 
generalizability of present results. 


Sampling Procedures 


Slightly more than half of the studies used 
some form of matching (i.e., matching on 
a few dimensions such as age, length of 
imprisonment, IQ, or educational level) or 
random sampling, whereas the rest used 
every available member of a group with 
completed MMPIs. Those studies employing 
partial randomization frequently used randomly 
selected comparison groups while 
using all available prisoners who qualified 
for the criterion group. Studies employing 
partial matching commonly had more than 
two groups within an experimental compari- 
son or more than one comparison and only 
matched some of their groups on a few di- 
mensions, such as those mentioned above. 
(See Table 2 for all of the dimensions that 
were employed.) 

Although complete randomization in prison 
research is sometimes impractical, partial 
approaches such as those above do not compensate 
for systematic bias. If full randomization 
cannot be realized, then matching 
a comparison group to the available criterion 
group (which is defined by a history of 
a given target behavior such as escapism, 
homosexuality, etc.) is preferable to random- 
izing the comparison group, since the latter 
procedure does not constitute an intelligible 
comparison. Partial matching approaches 
have often been necessitated by the lack of 
data on some matched variables for certain 
groups, yet this still serves to weaken the 
representativeness of results. In short, the 
findings of many studies are severely limited 
in their generalizability and must be inter- 
preted with caution. 
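
The matching recommendation above can be illustrated with a short sketch. It is only one possible approach, not a procedure taken from any of the studies reviewed: the `Inmate` record, the choice of age and IQ as matching variables, and the tolerance values are all hypothetical. Each criterion-group member is paired, without replacement, with the closest acceptable member of the comparison pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inmate:
    # Hypothetical record; real studies matched on varying characteristics.
    ident: str
    age: int
    iq: int


def match_comparison_group(criterion, pool, max_age_gap=3, max_iq_gap=10):
    """Greedily pair each criterion inmate with the closest unused pool member."""
    available = list(pool)
    pairs = []
    for target in criterion:
        # Keep only candidates inside the (assumed) tolerances.
        candidates = [
            c for c in available
            if abs(c.age - target.age) <= max_age_gap
            and abs(c.iq - target.iq) <= max_iq_gap
        ]
        if not candidates:
            continue  # no acceptable match; this criterion case stays unmatched
        best = min(
            candidates,
            key=lambda c: (abs(c.age - target.age), abs(c.iq - target.iq)),
        )
        available.remove(best)  # match without replacement
        pairs.append((target, best))
    return pairs
```

Criterion cases left unmatched by such a procedure would need to be reported, since dropping them is itself a threat to representativeness.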


Sources of Variance and Their 
Effects on Test Results 


Table 2 demonstrates that the only source 
of variance that was consistently controlled 
across almost all studies was sex. Sixty-nine 
studies limited their observations to only 
one sex, and the overwhelming majority in- 
vestigated male prisoners. The only other 
sources of variance that were controlled by 
a substantial number of studies were age, 
educational achievement, IQ, and race. 

Some research has been done on the effects 
of these latter variables on MMPI profiles 
in general. Costello, Tiffany, and Grier 
(1972) examined several methodological 
questions with respect to racial (i.e., black-white) 
comparisons in their investigation of 
carefully matched inpatient and outpatient 
psychiatric clients. They found that blacks 
had more elevated profiles in general, with 
significantly higher scores on the F, Hs, 
Mf, Sc, and Ma scales and significantly 
lower scores on the K scale. Gynther (1972) 


reviewed the literature on black-white MMPI 
differences in normal populations and reached 
several pertinent conclusions. There seem to 
be consistent differences between blacks' and 
whites' MMPIs, the most frequent being 
significantly higher elevations for blacks on 
the Sc and Ma scales, which seem to be 
affected by such variables as education, resi- 
dence, and cultural separation. However, 
Gynther claims that there is no evidence to 
indicate that these trends signify that blacks 
are less well adjusted than whites. Gynther 
advocates the construction of a new MMPI 
form with T scores based on black norms 
and with the derivation of black behavioral 
correlates based on profile types. He suggests 
that a temporary solution would be the 
generation of qualifying rules for the inter- 
pretation of blacks’ MMPIs, and he urges 
that those interpreting MMPIs should be 
alerted to the potential misuse of the test 
in treating black profiles in the same manner 
as white profiles. 

Subsequent research has attempted to 
clarify the issues raised by Gynther’s (1972) 
review. Gynther, Altman, and Warbin (1973) 
investigated the consistent finding of higher 
F scale scores for blacks by comparing the 
MMPIs of 1,125 white psychiatric inpatients 
to those of 134 black psychiatric inpatients 
(63% of the overall sample was male). They 
also used records that contained sets of 
168 demographic descriptors and 111 mental 
status items for each patient. Nonrandom 
profiles having an F scale raw score ≥ 26 
were separated from the other profiles for 
each racial group, and the demographic and 
mental status items were analyzed for their 
ability to differentiate the groups. Replications 
were carried out for both black and 
white groups. For whites, 16 items overall 
provided significant differentiations across 
the original and replication samples, and 
these items essentially characterized F scale 
> 26 profiles as representing “confused psychotics.” 
This was consistent with earlier 
research findings. However, the original black 
sample with F scale > 26 profiles shared only 
5 discriminating items with the original white 
sample, and only 3 of these items were 
in the same direction. Furthermore, not one 
variable provided a significant differentiation 
for both the original and replication samples 
of blacks. Gynther et al. (1973) concluded 
that blacks with an F scale score ≥ 26 are 
not seen any differently than blacks with 
lower F scale scores when compared on a 
mental status exam, and they maintain that 
an F scale score > 26 for a black psychiatric 
patient has different behavioral correlates 
than a similar score for a white psychiatric 
patient. They also suggest that this difference 
may be due to "race-sensitive" items 
on the MMPI; blacks seemed to favor items 
reflecting certain attitudinal dispositions such 
as alienation, religiousness, liking for school, 
and romanticism. 

As part of a larger project, Gynther, 
Lachar, and Dahlstrom (1978) used a large 
sample of normal, conservative middle-class 
black adults from Alabama, Michigan, and 
North Carolina (321 males, 561 females) to 
generate a new F scale for blacks. They 
replicated the criterion used when the original 
F scale was generated on a Minnesota 
sample, which accepted items answered in a 
given direction by 10% or less of the sample. 
They identified a total of 33 items in this 
manner, 22 of which are on the standard 
64-item F scale. They found that higher 
scores in their black sample for both the 
original and the new F scale were significantly 
related to a younger age, less education, 
and a less-skilled job classification of 
the head of the household. Gynther et al. 
(1978) concluded that the significant correlates 
of high F scale scores for whites do not 
appear to be valid for blacks, and they suggest 
that their new F scale may be a better 
measure of deviant responding for blacks 
than the standard F scale. 
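
The item-selection criterion described above (keep an item when one response direction is given by no more than a tenth of the normative sample) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' actual procedure: the function name, the data layout (item number mapped to a list of true/false answers), and the default threshold argument are hypothetical.

```python
def select_infrequent_items(responses, threshold=0.10):
    """Return (item, rare_direction) pairs for rarely endorsed responses.

    `responses` maps an item number to a list of True/False answers from
    the normative sample (an assumed layout). An item is selected when
    either "true" or "false" is given by at most `threshold` of the sample.
    """
    selected = []
    for item, answers in sorted(responses.items()):
        if not answers:
            continue  # skip items nobody answered
        true_rate = sum(answers) / len(answers)
        if true_rate <= threshold:
            selected.append((item, True))    # "true" is the rare direction
        elif 1.0 - true_rate <= threshold:
            selected.append((item, False))   # "false" is the rare direction
    return selected
```

A respondent's score on a scale built this way would be the number of selected items answered in the rare direction.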

Further studies have investigated other 
MMPI scales as well in their examination of 
racial differences on the test. Penk and Robinowitz 
(1974) investigated the profiles of 
black and white male veterans who were drug 
abusers. They compared black opiate users 
to white opiate users as well as comparing 
black opiate users to white opiate nonusers. 
They found that black opiate users scored 
significantly lower than white opiate users 
on the F and Hy scales and scored lower 
than white non-opiate users on the F, Pt, 
Sc, and Si scales. They note that these find- 
ings conflict with the expectation that blacks 
typically obtain higher MMPI scale scores. 

Davis and Jones (1974) compared black 
and white veterans who were psychiatric patients 
by systematically varying race, education, 
and differential diagnoses obtained independently 
from the MMPIs. An initial 
survey of all available black subjects (390) 
and of a randomly selected white comparison 
group found that blacks were more likely to 
be diagnosed schizophrenic, whereas whites 
were more likely to be diagnosed alcoholic 
and/or depressed. They concluded that there 
seem to be race-related differences in diagno- 
sis, but they were unable to determine 
whether this was due to actual race-related 
variations in psychopathology or to blacks 
being misdiagnosed more often. Next, Davis 
and Jones randomly selected 20 subjects for 
each of eight groups; each racial group was 
blocked into groups of high-education (mini- 
mum of 12 years of education) schizophren- 
ics, low-education schizophrenics, high-edu- 
cation nonschizophrenics, and low-education 
nonschizophrenics. Subjects older than 50 or 
having an F-K index greater than 14 were 
not included. They found that there were no 
significant main effects associated with differences 
in race. Schizophrenics scored significantly 
higher on the Pa and Sc scales 
than nonschizophrenics, and the poorly edu- 
cated scored significantly higher on the Sc 
scale than the well educated. Some interac- 
tion effects were evident: The more poorly 
educated blacks and better educated whites 
scored significantly higher on the Pa scale 
than the other subjects, and the more poorly 
educated blacks scored significantly higher 
on the Sc scale than the other subjects (with 
the better educated blacks scoring the lowest 
on this scale). Davis and Jones concluded 
that race is a type of independent subject 
variable that confounds attempts to ade- 
quately interpret MMPI profiles of randomly 
selected black versus white groups, but this 
is not the case when diagnosis and education 
are controlled. They maintained that with 
poorly educated groups or with randomly 
selected groups in which the majority of 
blacks are poorly educated, blacks would be 
expected to perform in a more pathological 
direction than most random samples of 
whites who are similar in characteristics, 
such as type of psychopathology. 

Cowan, Watkins, and Davis (1975) ex- 
amined the same subjects used by Davis and 
Jones (1974) and blindly sorted the MMPI 
profiles into schizophrenic and nonschizo- 
phrenic groups. A protocol was classified as 
schizophrenic if the Sc scale T score exceeded 
70 and if the Sc scale was more elevated than 
the Pt scale. In three of the four nonschizophrenic 
groups, the profiles were correctly 
classified significantly more often than by 
chance in a way that conformed to previ- 
ously determined expectations for percentage 
of correct classification. However, almost one 
half (9 out of 20) of the poorly educated 
black nonschizophrenics were misclassified 
as schizophrenic. Cowan et al. concluded that 
although cultural background factors exert 
a significant influence, much of the reported 
variation in blacks’ MMPIs may be a func- 
tion of education. They maintained that the 
MMPI appears to retain adequate discrim- 
inative power for blacks with at least 12 
years of education. 
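
The sorting rule reported for this study is simple enough to state as a predicate. The sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes that the scale compared against Sc is Pt, both expressed as T scores.

```python
def classify_profile(sc_t, pt_t):
    """Apply the two-part rule: Sc T score above 70 and Sc higher than Pt."""
    if sc_t > 70 and sc_t > pt_t:
        return "schizophrenic"
    return "nonschizophrenic"
```

Under such a rule, the 9 of 20 poorly educated black nonschizophrenics who were misclassified correspond to a 45% error rate in that group.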

The conclusions of Davis and Jones 
(1974) and Cowan et al. (1975) seem to 
constitute the key issue here. Davis and 
Jones suggest that their results indirectly 
support Gynther’s (1972) hypothesis, which 
maintained that blacks produce more ap- 
parently pathological MMPIs because of 
their experiences of alienation from the estab- 
lished white society. Gynther maintains that 
blacks learn to be suspicious of others be- 
cause it is necessary for their survival in 
this society. However, Davis and Jones point 
out that well-educated blacks do not appear 
to demonstrate the consistent racial differ- 
ences on MMPI scales that randomly se- 
lected groups of blacks do, and they con- 
clude that a more advanced education seems 
to have either a masking or obliterating 
effect on cultural factors that can produce 
misleading MMPI configurations. They hy- 
pothesized that a selection process may be 
operating here in which the more sensitive 
and suspicious blacks drop out of school as 
soon as they are legally allowed to. On the 
other hand, blacks that elect to continue 
their education are exposed to extended culturation 
effects through their prolonged contact 
with white values and expectations, and 
since they are less sensitive and suspicious 
than those blacks who drop out, they are 
more apt to assimilate the cultural values of 
whites. Cowan et al. essentially concur with 
this perspective. Therefore, if higher eleva- 
tions for blacks on MMPI scales are chiefly 
due to factors such as differences in IQ or 
education instead of race, then Gynther’s 
advocacy of separate norms for blacks seems 
inappropriate. Apparent bias could be due 
to a combination of factors in which race 
may play a minor role. At this point the 
assumption of racial bias of the MMPI ap- 
pears to be premature at best; the entire 
issue demands further research. (See the 
discussion of racial differences in MMPI 
profiles of prisoners in particular, which is 
included near the end of the Review section.) 

Panton (1960a) has investigated the effects 
of intelligence on profiles taken from a 
prison population. Prisoners with an IQ below 
110 consistently demonstrated neuroticism 
and anxiety (elevations on the Hs, D, 
Pt, and Sc scales), whereas prisoners above 
the 110 cutoff consistently demonstrated 
character disorders (elevations on the Hy, 
Pd, and Ma scales). Panton (1959a) also 
examined the effects of age on prison profiles 
and found that white inmates over 30 years 
old scored significantly higher than under-30 
white inmates on the Hs, D, Hy, and Pd 
scales and significantly lower on the Sc, Ma, 
and Si scales. (Groups of black inmates did 
not show these differences.) A second study 
by Panton (1976-1977) on the effects of age 
found that when compared to baseline MMPI 
figures for the prison population, inmates 
who were at least 60 years old displayed “a 
neurotic overlay with less psychopathy” (significantly 
higher on the Hs, D, Hy, and Si 
scales and significantly lower on the Pd 
scale). In their investigation of the F scale 
in white psychiatric patients, Gynther and 
Shimkunas (1965a; see the section on Protocol 
Validity) found an interaction between 
age and intelligence that affected F scale 
scores. 

These findings are far from conclusive, but 
they do suggest that all of these factors are 
primary sources of variance that should be 
controlled in MMPI research. In fact, they 
demand further investigation in their own 
right (especially the issue of racial and cul- 
tural bias), but until the nature of their 
influence is clarified, they should be dealt 
with either through matching or blocking 
procedures. Even researchers using full ran- 
domized assignment would be well-advised 
to examine their samples post hoc for sig- 
nificant differences on age, IQ, and race. 
Other variables in Table 2 that are more 
specific to prison populations have not been 
investigated for their effects on MMPI pro- 
tocols. Therefore, randomization seems to be 
the preferred method to control for confound- 
ing effects, since these latter potential sources 
of variance may exert a presently unknown 
influence on test results. 


Protocol Validity 


Hathaway and McKinley (1967) state 
that "subjects sixteen years of age or older 
with at least six years of successful school- 
ing can be expected to complete the MMPI 
without difficulty" (p. 9). Yet only eight 
of the studies reviewed (see Table 2) men- 
tioned any attempts to screen their subjects 
for IQ or reading level, even though there 
is a significant incidence of illiterate and 
foreign-language-speaking inmates in prison 
in general. Such preliminary screening is an 
important consideration in obtaining valid 
profiles and should be a standard part of 
MMPI research in prisons. Dahlstrom, 
Welsh, and Dahlstrom (1972) recommend 
the use of brief intellective or reading 
achievement measures such as the Wide 
Range Achievement Test (Jastak, Bijou, & 
Jastak, 1965), the Ohio Literacy Test (Fos- 
ter & Goddard, 1924), or the Kent EGY 
(Kent, 1946) to identify subjects who may 
have difficulty completing the MMPI (p. 
21). However, they report one study (Glenn, 
1949) that investigated the use of oral presentation 
of the MMPI to some retarded 
juvenile delinquents who lacked the above 
minimal educational requirements. This study 
concluded that success could be achieved for 
subjects with an IQ of at least 65 and with 
at least 3 years of schooling. 
Dahlstrom et al. (1972) also report that 
Panton has adopted an oral delivery ap- 
proach for the group testing of prisoners in 
which the inmate with the most education 
and best reading competency reads the items 
to the rest of the group. The extent to which 
this procedure is legitimate with respect to 
those prisoners below the IQ/reading-level 
criteria has not been investigated. Therefore, 
at this time the use of the MMPI with 
prisoners failing to meet these criteria cannot 
be sanctioned as a sound methodological 
practice. Further research should investigate 
the advisability of oral administration with 
prisoners having less than a sixth-grade 
reading level or an IQ below 80 (the com- 
mon Wechsler Adult Intelligence Scale 
[WAIS] cutoff; see Dahlstrom et al., 1972, 
p. 21). One alternative for researchers whose 
sample representativeness is threatened by 
this requirement would be to administer oral 
versions of the MMPI to the subjects in 
question and analyze their overall outcome 
data twice, once by including these subjects 
and once by excluding them from the analy- 
sis. Differences in results could then be 
examined, and unless the number of such 
oral administrations is substantial, few if 
any significant differences in outcome be- 
tween the two analyses should be found. 

One study links the validity of a given 
MMPI profile with the timing of its ad- 
ministration in the prison setting. Pierce 
(1972a) gave 60 inmates the MMPI on 
their admission to prison and then retested 
the same group 6 weeks later. He found 
that group mean scores dropped significantly 
on the Pt, Sc, and Si scales, whereas the 
group mean score rose significantly on the K 
scale. He concluded that the initial profiles 
were confounded by the presence of stress 
associated with entering prison, and this 
tended to obscure the actual profile configu- 
ration that was indicative of the inmate's 
true personality. Although comparisons of 
changes within an individual inmate's two 
profiles would have been preferable to 
Pierce's method of comparing differences be- 
tween the group means for each scale, Pierce 
has raised an important question here. Fur-
ther research should determine whether the
delaying of MMPI administration to allow
for some institutional adjustment and a
corresponding decrease of situational stress
would result in more accurate and more use-
ful MMPI profiles.

The F scale is probably the single most 
widely used indicator of MMPI profile in- 
validity. A common contemporary practice 
is to discard profiles with T scores exceeding 
70 for the L scale, 80 for the F scale, or 70 
for the K scale, and a high F scale score is 
usually the criterion that causes the dis- 
carding of protocols. As shown by Table 2, 
29 of the previously mentioned studies deal 
with this protocol validity question in some 
manner, whereas the rest seem to ignore the 
issue entirely. However, research suggests 
that the rigid application of the F scale T 
score exceeding 80 to determine invalidity 
may be ill-advised. Comrey’s (1957-1958) 
factor analysis of the F scale led him to con- 
clude that a high F scale score may be a valid 
indicator of pathology, not a signal of profile 
invalidity. Morrice (1957) felt that high F 
scale scores were an index of personality 
disorder in his group of recidivist criminals, 
and he suggested focusing on overall profile 
configurations as meaningful while under- 
interpreting absolute single scale elevations. 
He reports the following “impressionistic” 
investigative results: 


The test was repeated in three fully investigated 
prisoners to determine whether their profiles were 
reproducible. In fact the test, repeated after an
interval of six to seven months, reproduced the
original profiles very closely and in each case with
a repeat of an abnormally high F score. There was
no reason for these three prisoners to malinger and
it would be clever deception indeed to reproduce
nearly identical responses . . . . The impression
gained is that a high F, together with several ab-
normal scores on the personality scales, is mean-
ingful in terms of personality disorder of anti-
social type. (Morrice, 1957, p. 634)


This latter study was the first suggestion 
that people specifically classified as charac- 
ter disorders may validly produce high F 
scale scores when they respond to the test
in a candid and truthful manner. In their 
consideration of the meaning of the F scale 
Dahlstrom et al. (1972) concluded that in 
certain instances, “elevated F scale scores are 
part and parcel of the behavioral disorder 
generating the clinical-scale configuration,
documenting its range and severity but not
reflecting adversely upon the dependability
of the MMPI protocol itself” (p. 161; see
this source for a more in-depth consideration
of the F scale). They even suggest that valid
profiles may, in “rare instances,” exhibit
raw-score F scale values of 40 or more (T
score well over 110) for these very reasons.

Later studies supporting the validity of
nonrandom profiles with F scale T scores ex-
ceeding 80 have searched for possible clinical
interpretations of the high F scale score.
Gynther (1961) found evidence which sug-
gested that F scale scores could differentiate
between diagnostic classifications, since his
group of “aggressive” criminals tended to
get T scores exceeding 80 significantly more
often than “passive” criminals. Investigating
F scale scores with respect to 12 crime classi-
fication groups, Gynther (1962) also found
a significant relationship between high F
scale scores and sexual crimes and concluded
that larger F scale values indicate “emotion-
ally ‘sicker’” criminals. Blumberg (1967)
found supporting evidence for Gynther's
(1962) conclusion. However, Gauron, Ste-
venson, and Englehart (1962) failed to dif-
ferentiate behavioral disorders from the rest
of their hospital patient sample on the basis
of high F scale scores and concluded that an
F scale T score exceeding 80 “cannot be
routinely employed as a diagnostic sign of
behavior disorder with psychiatric patients”
(p. 488). Gynther and Shimkunas (1965b)
attempted to confirm these findings by using
both hospital patient and prisoner groups.
Their psychotic patients accounted for al-
most 70% of their F scale T scores that
exceeded 80 for the patient group, whereas
their behavior disorder prisoners accounted
for 66% of their F scale T scores that ex-
ceeded 80 for the prisoner group. The re-
versal of this relationship between high F
scale scores and diagnosis was significant at
the .001 level of confidence. They concluded
that the differences between the two samples
reflect personality features of those who
break the law, and they concur with Leary's
(cited in Gynther & Shimkunas, 1965b) as-
sertion that the F scale measures hostility
and aggression. Rice (1968) also found a
significant relationship between F scale T
scores that exceeded 80 and overtly hostile,
aggressive behavior.

A few studies have attempted to derive 
diagnostic indicators from the F scale. Mc- 
Kegney (1965) found that 21 specific F 
scale items were answered in the scorable 
direction by male juvenile delinquents sig-
nificantly more frequently than by normals.
He concluded that these items accurately 
described special problem areas of the sub- 
jects in his sample even though they caused 
F scale T scores above 80, and he suggested 
that specific F scale item endorsements could 
provide clinical insight into individual cases. 
Dahlstrom et al. (1972) echo this suggestion, 
pointing out that 21 F scale items appear 
on Grayson's Critical Items List (Grayson, 
Note 2) and stating that “in the utilization 
of the F scale items as a set of rare answers, 
the clinician should not lose sight of the 
possibility that one or all of the answers to 
these items from some test subjects may be 
quite literally true, clinically relevant, and 
worthy of special investigation” (p. 115).
Gynther and Petzel (1967) failed to confirm 
their hypothesis that psychotics and in- 
dividuals with behavioral disorders with F 
scale T scores exceeding 80 endorse different 
patterns of F scale items; only one item
(No. 215, “I have used alcohol excessively”)
reliably differentiated the two groups. How- 
ever, using this one item in conjunction with 
the total number of items endorsed on a 
12-item F scale subscale of manifest psy- 
chotic content proved to effectively differen- 
tiate the two groups. They suggest that a 
“nonconformity” dimension causes high F 
scale scores and is a general dimension un- 
derlying both psychosis and behavior dis- 
orders. 

The effects of age and intelligence on F 
scale scores in psychiatric patients were in- 
vestigated by Gynther and Shimkunas 
(1965a). Several of the above studies have 
found that youth correlated with higher F 
scale scores (Blumberg, 1967; Gauron et al., 
1962; Gynther, 1961; Gynther, 1962), but 
no consistent relationship was found between 
high F scale scores and intelligence (Gyn- 
ther, 1961; Gynther, 1962) or educational 
level (Gauron et al., 1962; McKegney,
1965). Gynther and Shimkunas (1965a)
found an interaction between age and in-
telligence that they termed the “critical ele-
ment” affecting changes in F scale scores:
The scores decreased with increasing age for 
low- and high-IQ subjects but remained 
relatively constant for average-IQ subjects. 
However, educational level did not affect F
scale scores. 

A few studies have examined other indices 
of invalid MMPI profiles with prisoners. 
Lawton and Kleban (1965) retested their 
prisoner group with the MMPI and told 
them to simulate someone who had never 
been in trouble with the law. Seven clinical 
scales showed significant drops compared to 
the original testing, but relative configura- 
tional elevations did not change significantly. 
Lawton and Kleban concluded that prisoners 
are unable to successfully manipulate the Pd 
scale alone to conceal their sociopathy. Ben- 
nett (1970) used a similar design and con- 
cluded that inmates either cannot or do not 
“fake good” to any significant degree. How-
ever, Gendreau, Irvine, and Knight (1973)
criticized the above instructional sets as 
unrealistic and, after obtaining protocols in
the standard fashion, instructed their prison- 
ers to successively fake bad and fake good 
on retests as if they were trying to manipu- 
late the prison system for desired treatment 
or privileges. They found that faking in both 
directions radically distorted initial profiles
and successfully concealed the basic charac- 
terological problems of the prisoners. Both 
the F scale (using a raw score cutoff of 34, 
which has a T score well over 100) and the
F-K index (subtracting the K scale score 
from the F scale raw score and using a 
cutoff of 24) successfully separated 100% of 
the fake-bad profiles from the original stan- 
dard administration profiles. All seven of 
the indicators that they examined effectively 
identified fake-good profiles, the two best in- 
dicators being the Positive Malingering 
(Mp) scale (92% overall hit rate) of Cofer, 
Chance, and Judson (cited in Dahlstrom 
et al., 1972) and the F-K index (85% hit 
rate). Ability to fake effectively was not 
found to be related to IQ. Gendreau et al. 
advocated further research investigating the 
readjustment of the standard F scale and 
F-K index cutoff scores to properly discrimi-
nate honest versus faked inmate profiles and
suggested the possibility of the routine use 
of the Mp scale in identifying fake-good 
profiles. 
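
The two fake-bad indicators described above reduce to simple arithmetic on raw scores. The following is a minimal sketch, assuming that a score strictly above the reported cutoff is what gets flagged (the text does not say whether the cutoff value itself counts) and that the K score entering the F-K index is a raw score. The class and function names are hypothetical and are not taken from Gendreau et al. (1973).

```python
from dataclasses import dataclass

# Cutoffs reported in the text for Gendreau et al. (1973).
F_RAW_CUTOFF = 34       # F scale raw score
F_MINUS_K_CUTOFF = 24   # F-K index (F raw score minus K raw score)


@dataclass
class ValidityScores:
    f_raw: int  # raw score on the F scale
    k_raw: int  # raw score on the K scale (assumed raw, not a T score)


def f_minus_k(scores: ValidityScores) -> int:
    """F-K index: the K scale score subtracted from the F scale raw score."""
    return scores.f_raw - scores.k_raw


def flag_fake_bad(scores: ValidityScores) -> dict:
    """Report which of the two indicators would flag a profile as fake-bad.

    Assumption: only scores strictly above a cutoff are flagged.
    """
    return {
        "f_scale": scores.f_raw > F_RAW_CUTOFF,
        "f_minus_k": f_minus_k(scores) > F_MINUS_K_CUTOFF,
    }
```

Keeping the two indicators as separate flags, rather than combining them into one verdict, leaves room for the cutoff readjustment that Gendreau et al. call for.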

As part of their larger study investigating 
racial differences on the MMPI, Costello 
et al. (1972) considered the effects that 
different validity criteria for the selection 
of protocols have on research outcomes. 
Their study analyzed the data by using all 
available protocols and then reanalyzed the 
data by using only valid protocols (? scale 
maximum T score of 100, L scale maximum 
T score of 70, F scale maximum T score of 
80, K scale maximum T score of 70, F-K 
maximum raw score of 9). They found that 
all significant differences in the initial analy- 
sis were eliminated in the reanalysis, and 
they felt that this was partially due to the 
fact that high F scale scores are associated 
with elevations on certain clinical scales 
(such as the Sc scale). They concluded that 
the employment of validity criteria similar 
to those described above restricted both 
profile variability and the detection of actual 
differences between groups. (They also noted 
that more black profiles were eliminated by 
those criteria than white profiles.) 
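
The analyze-then-reanalyze design used by Costello et al. can be illustrated with a short sketch. It assumes each protocol is a dictionary of scores and that a protocol counts as valid when none of the listed maxima is exceeded. The key names and the use of a simple group mean are illustrative assumptions, not the authors' actual procedure.

```python
from statistics import mean

# Maxima listed in the text for Costello et al. (1972).
# All are T scores except the F-K index, which is a raw-score difference.
MAX_ALLOWED = {
    "cannot_say_t": 100,  # the ? scale
    "l_t": 70,
    "f_t": 80,
    "k_t": 70,
    "f_minus_k_raw": 9,
}


def is_valid(protocol: dict) -> bool:
    """True when no validity indicator exceeds its maximum."""
    return all(protocol[name] <= limit for name, limit in MAX_ALLOWED.items())


def analyze_twice(protocols: list, scale: str) -> tuple:
    """Mean of one clinical scale for all protocols, then for valid ones only.

    Raises statistics.StatisticsError if no protocol passes the criteria.
    """
    valid = [p for p in protocols if is_valid(p)]
    all_mean = mean(p[scale] for p in protocols)
    valid_mean = mean(p[scale] for p in valid)
    return all_mean, valid_mean
```

Comparing the two returned means for each group shows directly how much of an apparent difference depends on the protocols that the validity criteria discard.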

The above studies offer several insights 
into what constitutes a valid MMPI inmate 
profile. Of primary importance is the finding 
that the conventional cutoff of the F scale
T score exceeding 80 should not be dog- 
matically employed as a criterion for profile 
invalidity; profiles with this F scale ele- 
vation may actually be valid, especially with 
a prison population. A high F scale score 
does not seem to necessarily indicate any 
single behavior or diagnosis but may be 
associated with a generally nonconforming, 
hostile, and aggressive approach to life. Al-
though examination of F scale item endorse-
ments on individual profiles may provide
useful clinical insights, the use of high F
scale scores in differential diagnosis does
not seem to be a promising possibility. Fac-
tors such as age and intelligence may also
affect F scale elevations, but more research
using prison populations specifically is needed,
especially to check for any interaction effect
resembling that found by Gynther and Shim-
kunas (1965a). Dependable indicators of
faked prisoner profiles are sorely needed,
but none have been identified to date.

Megargee (Note 3) described a complex
approach to the profile validity problem that
he and his associates have used in the de-
velopment of their classification system.
Reading levels for prisoners were not deter- 
mined through any routine testing methods, 
but those inmates who were found to have
reading difficulties were tested orally with
a tape-recorded version of the MMPI. The 
Spanish-speaking inmates were given the 
Spanish version of the MMPI. After each 
inmate completed the MMPI, he was re- 
quired to correctly identify his answers to 
six test items randomly chosen from his 
protocol by the examiner. Failure to pass 
this check necessitated a retest. Profiles with 
F scale T scores exceeding 100 were ex- 
amined clinically for their approximation to 
the mean random-response clinical-scale pat- 
tern, which consists of scores on each scale 
that are equivalent to the midpoint (i.e.,
half the items marked in the scorable direc- 
tion) of that scale. Only those profiles that 
deviated from this pattern on the validity 
scales and the clinical scales and that made 
“clinical sense” were retained as nonrandom. 

This approach was the best one encoun- 
tered by this author, but it still has some 
drawbacks. For one thing, Megargee states 
that "obviously some expertise with the 
MMPI is involved" in the latter clinical 
scrutinizations of suspect profiles; the valid- 
ity sections of Dahlstrom et al. (1972) may 
aid the researcher who lacks such expertise 
in coming to grips with this challenging in-
terpretive task. Contrary to the previously
mentioned IQ/reading-level recommendations,
no intelligence testing was done to screen in-
mates for the MMPI. Again, the testing of
subjects lacking the minimal reading-skills 
criteria has not yet been empirically demon- 
strated to be a sound methodological prac- 
tice, and the lack of preliminary reading-level 
screening cannot be sanctioned. Even so, this
approach is more acceptable overall than
most being employed today, and the combi-
nation of routine screening examinations with
Megargee’s guidelines is probably the best
approach to the issue of MMPI protocol
validity that can be realized presently. One
alternative that may prove to be a more 
economical means for the detection of ran- 
domly answered profiles (especially if pro- 
tocols are being drawn from an accumulated 
data bank) is the test-retest (TR) index 
of Buechley and Ball (1952). This index 
examines the 16 repeated items in the book- 
let and R forms of the MMPI for consist- 
ency of item endorsement and employs a cut- 
off of more than three disagreements to 
identify and discard randomly generated pro- 
files. Unfortunately, the lack of dependable 
fake-bad and fake-good indices cannot be 
satisfactorily compensated for at this time; 
the findings of Gendreau et al. (1973) must 
be cross-validated and built upon to fill this 
void. Future studies should also investigate 
the retesting approach for random profiles 
mentioned above and should consider the use 
of standardized instructions cautioning 
against any further attempts at deception. 
The comparability of such protocols obtained 
under added duress with protocols obtained 
in the standard manner would be difficult 
to assess, however. (See Dahlstrom et al., 
1972, pp. 105-106, for a discussion concern- 
ing the implications of the forced-choice 
change in administration; for a more thor-
ough treatment of all of the above validity
indicators as well as other validity indica- 
tors that have not yet been investigated in 
the prison population, see the validity sec- 
tions in Dahlstrom et al., 1972.) 
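
The TR index described above amounts to counting inconsistent answers across the repeated items. The sketch below assumes that responses are stored by item number and that the repeated-item pairs are supplied by the caller. The pair numbers in the usage note are placeholders, not the actual repeated items of the booklet or R form.

```python
TR_MAX_DISAGREEMENTS = 3  # more than three disagreements marks a profile as random


def tr_index(responses: dict, repeated_pairs: list) -> int:
    """Count the repeated-item pairs that were answered inconsistently.

    responses maps item number -> True/False answer.
    repeated_pairs lists (first_occurrence, second_occurrence) item numbers.
    """
    return sum(
        1 for first, second in repeated_pairs
        if responses[first] != responses[second]
    )


def looks_random(responses: dict, repeated_pairs: list) -> bool:
    """Apply the cutoff of more than three disagreements."""
    return tr_index(responses, repeated_pairs) > TR_MAX_DISAGREEMENTS
```

With placeholder pairs such as `[(1, 101), (2, 102), (3, 103), (4, 104), (5, 105)]`, a protocol that disagrees on four of the pairs would be discarded, whereas one that disagrees on three would be retained. Because the check needs only the stored item responses, it can be run over an accumulated data bank without retesting anyone.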

In summary, there is a strong possibility 
that a good deal of the MMPI research with 
prisoners has been adversely affected either
by the failure to employ any criteria for 
profile invalidity (causing a confounding of 
any actual differences between groups by 
random profiles) or by the overly rigid ap- 
plication of conventional validity criteria 
(causing a concealment of significant differ- 
ences between groups by restricting eleva- 
tion ranges on several scales). The validity 
criteria used in each study reviewed in the 
main body of this article will be indicated 
by a number after the date of each article 
the first time it is cited (e.g., Beall & Panton, 
1956; 1). This number corresponds to the 
number entered under the V column of Table 
2: 1 = no criteria employed; 2 = employed
standard cutoffs (L scale T score maximum
of 70, F scale T score maximum of 80, K
scale T score maximum of 70); 3 = detected
and discarded random profiles only. This is 
intended to aid in the consideration of the 
possible effects that selection of validity cri- 
teria may have had on the findings pre- 
sented. The present author advocates Alter- 
native 3 as the most appropriate procedure 
until further research on fake-bad and fake- 
good indices clarifies their proper applica- 
tion with inmate MMPI profiles. 

As mentioned in the introduction, special 
validity criteria were used in the selection of 
pre-1967 studies for review. Any such study 
that either employed standard cutoffs (usu- 
ally, L scale maximum T score of 70, F scale 
maximum T score of 80, and K scale maxi- 
mum T score of 70) or at least provided 
validity scale data (usually means and stan-
dard deviations) was included, since this
allows at least a rough assessment of the 
direction and extent to which the findings 
may be distorted. Pre-1967 studies that failed 
to meet these criteria provide no clues for 
such an assessment and were therefore 
omitted to keep the present review to a 
manageable length; the only exceptions made 
were for studies that generated original ex- 
perimental scales, for these studies were 
necessary for purposes of continuity in the 
review. (See Haven, Note 1, for a more 
comprehensive review of pre-1967 studies.) 


Methods of Profile Interpretation 


One crucial choice that any MMPI re- 
searcher must make is which of the several 
ways of viewing MMPI data will maximize 
the quality and quantity of useful informa- 
tion gained from the protocols. The studies 
reviewed in this article consider one or more 
of several protocol aspects, including con- 
ventional scale elevations, mean profile con- 
figurations for each group, actuarial high- 
point coding systems, a sequential linear- 
sums model, and experimental scales using 
cutoff scores. Costello et al.’s (1972) previ- 
ously reviewed study on racial differences 
(see the section on Sources of Variance and 
Their Effects on Test Results) examined the 
variations within the significant findings of 


942 


several different data assessment approaches 
that were applied to the same set of MMPI 
protocols. They found that consideration of 
isolated scale elevation means between groups 
produced significance on the F, K, Hs, Mf,
Sc, and Ma scales, whereas a high-point
coding approach identified the Mf, Pa, Sc, 
and Si scales, and a two-digit coding analy- 
sis identified the Hs, D, Pd, Pa, Pt, and Sc 
scales. They concluded that “inferences 
drawn from contrasting differences would de- 
pend on the particular dependent measure 
employed” (p. 167). Black’s (1967) study 
(see the section on Recidivism) serves as a 
valuable object lesson in this respect; even
though initial investigation of isolated scale 
elevations failed to produce significant dif- 
ferences between his recidivist and nonrecid- 
ivist groups, Black carried out further anal- 
yses in a highly resourceful manner until 
he obtained a combination of indices that 
correctly identified 90% of his overall sam- 
ple. Efficient maximization of profile data 
may demand the investigation of several 
different approaches simultaneously, for the 
above two studies suggest that not only the 
nature of interpretation of results but even 
the efficient detection of existing significant 
differences are a function of the type of 
MMPI dependent measure employed. 
Gregory's (1974; see the section on Classi-
fication of Psychopathologic Behavior)
unique sequential linear-sums approach also 
deserves consideration here. Essentially, this 
approach used a stepwise regression analysis 
that employed the conventional MMPI scales 
as predictor variables and the classifications 
based on checklist ratings of available case 
history data as criterion variables. His three 
index formulas successfully classified 63% of 
his overall sample as psychopathic, adjusted, 
or neurotic. Gregory’s perspective is that 


the evaluation of code type systems should be
pragmatic, that is, based on the proportion of target
population profiles that can be interpreted within
the system and on the degree to which stable and
useful personality correlates are generated . . .
utility must always have the final say. (p. 391,
italics in original)


The extent to which Gregory's approach 
lives up to this standard as compared with 
other approaches needs to be examined with
further comparison research in the prison
population.

In their important and extensive work on 
the efficiency of predictive psychometric in- 
dicators, Meehl and Rosen (1955) identified 
a critical methodological point not observed
by most of the predictive studies in this re-
view that used “cutting scores” on experi-
mental scales. Such studies usually isolate
those MMPI items that differentiate best be-
tween their research groups and use these
items to form an experimental scale. The 
distribution of total scores on the scale is 
examined for each of the two groups, and 
the single score that maximizes the differen- 
tiation of the two groups (i.e., the first group 
mostly falls on one side of this score, whereas 
the second group mostly falls on the opposite 
side of this score) is designated the "cutting 
score" for the scale. However, Meehl and
Rosen point out the necessity of considering 
the base rate of occurrence of a given cri- 
terion variable in the overall population 
under study in investigations of this
type. This is required because a psychomet- 
ric predictor such as an experimental MMPI 
scale that at first blush appears to make an 
impressive differentiation may actually cause 
more incorrect identifications than the base-
rate differentiation. This is particularly a
problem when the criterion variable normally
occurs in almost all or almost none of the
population under study, whereas it ceases to
be a problem when the criterion variable nor-
mally occurs in approximately 50% of this
population. 

Shupe and Bramwell (1963; this study 
did not meet the validity criteria and is not 
included in the Review section) constitute 
an example of this problem in their investi-
gation of the Prison Escape (Ec) scale.
Since the base rate of escape at their in-
stitution was about 5%, their sample results
indicated that the Ec scale could predict es-
cape risk in the overall population with 90%
accuracy. However, Shupe and Bramwell
realized that even though they could cor-
rectly predict escape risk in all prisoners
90% of the time by using the Ec scale, they
would correctly predict escape risk in all
prisoners 95% of the time if they predicted
that no inmates would escape. Another prob-
lem with the Ec scale is that it would pro-
duce about 7.6% false positives (nonescapees
incorrectly labeled as escapees) in the overall 
population, whereas base-rate prediction 
would produce no false positives, since it 
predicts that no one will escape. These find- 
ings are a powerful illustration of Meehl 
and Rosen’s disquieting assertion that “de- 
ciding on the basis of more information can 
actually worsen the chances of a correct de-
cision” (p. 202, italics theirs). In view of
this methodological revelation, Gough, Wenk, 
and Rozynko’s (1965; see the section on 
Recidivism) prescription seems most appro- 
priate: 


All claims as to predictive accuracy must be strin- 
gently verified by utility analyses. In essence, this 
means that in any prediction a chance level of 
accuracy must be defined, based on the observed 
frequency of the criterion, and then the diagnostic 
or forecasting technique must be contrasted with 
this chance level. (p. 433) 
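
The comparison that Gough et al. prescribe can be worked through with the Shupe and Bramwell figures quoted above. The sketch below assumes that every percentage is a proportion of the overall inmate population and that the scale's errors consist only of false positives and false negatives. The function name and the derived quantities are illustrative and do not appear in the original studies.

```python
def base_rate_comparison(base_rate: float,
                         scale_accuracy: float,
                         false_positive_rate: float) -> dict:
    """Contrast a scale's overall accuracy with base-rate prediction.

    All arguments are proportions of the whole population (an assumption).
    Base-rate prediction always predicts the more common outcome.
    """
    base_rate_accuracy = max(base_rate, 1.0 - base_rate)
    error_rate = 1.0 - scale_accuracy
    # Errors not accounted for by false positives must be false negatives.
    false_negative_rate = error_rate - false_positive_rate
    # Criterion cases that the scale actually identifies.
    true_positive_rate = base_rate - false_negative_rate
    return {
        "base_rate_accuracy": base_rate_accuracy,
        "false_negative_rate": false_negative_rate,
        "true_positive_rate": true_positive_rate,
        "scale_beats_base_rate": scale_accuracy > base_rate_accuracy,
    }


# Figures quoted in the text: 5% base rate, 90% accuracy, 7.6% false positives.
ec_scale = base_rate_comparison(0.05, 0.90, 0.076)
```

Under these assumptions the Ec scale falls short of the 95% accuracy obtained by predicting that no one escapes, and the remaining 2.4% of errors would be escapees the scale failed to identify. Whether that trade-off is acceptable depends on the relative costs of the two kinds of error, which the function deliberately does not weigh.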


Any researcher involved in predicting any- 
thing from MMPI inmate profiles should 
incorporate this crucial point and Meehl and 
Rosen’s other methodological requirements 
and helpful suggestions into their experi- 
mental design (see also Cronbach & Gleser, 
1965). Unfortunately, Gough et al.’s study 
is the only work of a predictive nature re- 
viewed in this article that followed Meehl 
and Rosen’s recommendations. The other 
predictive researchers reviewed here should 
obtain appropriate base-rate figures from 
their overall sample population and reevalu- 
ate their findings following Meehl and Ros- 
en’s prescriptions. The true utility of their 
findings could not be assessed adequately in 
every case in this article, since few studies 
provided the appropriate base-rate figures. 
Although Meehl and Rosen point out that 
certain special situations (e.g., see the dis-
cussion on suicide in the section on Institu-
tional Adjustment) mitigate the relevant
applicability of chance level prediction accu- 
racy, their guidelines should be considered 
nonetheless; the pros and cons of using base- 
rate prediction in any situation must be 
individually assessed for the particular situa- 
tion in question. 
Table 3
Experimental Scales Among the Significant
Results of the Studies Reviewed

------------------------------------------------------
Symbol  Scale
------------------------------------------------------
A       Anxiety
AI      Anxiety Index
Al      Alcoholism
Ap      Prison Adjustment
As      Asocial
Asx     Aggravated Sex
At      Anxiety
CI      Critical Item
CR      Conversion Reaction
DaS     Drug Abuser
Di      Defect of Inhibition Control
DH      Direction of Hostility
Dn      Denial
Ec      Prison Escape
Em      Emotional Immaturity
Eo      Ego Overcontrol
ES      Ego Strength
Ex      Extraversion
FTI     Frustration Tolerance Index
GH      General Hostility
HC      Habitual Criminal
He      Heroin
Hsx     Homosexual
Hy2     Need for Affection and Reinforcement subscale
Hy3     Lassitude-Malaise subscale
In      Inner Maladjustment
Jh      Judged Manifest Hostility
Ma1     Amorality subscale
Ma2     Psychomotor Acceleration subscale
Ma3     Imperturbability subscale
Ma4     Ego Inflation subscale
Mf1     Personal and Emotional Sensitivity subscale
Mf5     Denial of Masculine Occupations subscale
Mp      Positive Malingering
O-H     Overcontrolled Hostility
Pa1     Ideas of External Influence subscale
Pa2     Poignancy subscale
PAS     Prison Adjustment
PaV     Parole Violator
Pd1     Family Discord subscale
Pd2     Authority Conflict subscale
Pd4A    Social Alienation subscale
PM      Prison Maladjustment
Pq      Psychotic Tendency
Pr      Prejudice
R       Repression
Re      Responsibility
Rmn     Recidivism-Rehabilitation
R-S     Repression-Sensitization
Sc1A    Social Alienation subscale
Sc2A    Lack of Ego Mastery-Cognitive subscale
SD      Sensorimotor Dissociation
------------------------------------------------------

Table 3 lists all of the abbreviations and
corresponding names of the experimental 
scales and the conventional clinical scale sub- 
scales that were among the significant find- 
ings of the studies reviewed. 

(See Butcher & Tellegen, 1978, for addi- 
tional recommendations concerning the meth- 
odological design of MMPI research in gen- 
eral.) 


Review 


The preceding methodological evaluation 
revealed several methodological shortcomings 
within the designs of the studies to be re- 
viewed. As a result, the interpretation and 
generalization of the present findings are re- 
stricted. These studies can best be viewed as 
indicative of the potential general usefulness 
of the MMPI in many different areas of 
prison work rather than a final judgment 
of its value to corrections. The findings be- 
low will perhaps serve as starting points for 
better controlled studies, which should at- 
tempt to cross-validate and extend these 
results. 

The present section reviews the studies in 
the following three main categories: The 
MMPI As a Primary Differentiator of Devi- 
ant Behavior, the MMPI As a Predictor of 
Deviant Behavior, and MMPI Differences 
Across Race and Sex. The first category is 
broken down into five sections: hostile-as- 
saultive, first offenders versus recidivists, 
sexual deviancy, addictions, and classification 
of psychopathologic behavior. The second 
category is broken down into three sections: 
recidivism, prison escape, and institutional 
adjustment. Finally, the third category is
broken down into two sections: racial differ-
ences and sex differences.


The MMPI As a Primary Differentiator of
Deviant Behavior 


Hostile-assaultive. The most extensive 
work in this field has centered around the 
development and validation of the Over-
controlled Hostility (O-H) scale by Megar-
gee and his associates. Initially Megargee
and Mendelsohn (1962; 2) attempted to dif-
ferentiate between assaultive and nonas-
saultive criminals using 12 relevant experi-
mental scales. No scale correctly isolated the
assaultive group in the predicted manner,
but the authors noticed one surprising trend:
The protocols of the nonviolent group made
them appear less controlled or more hostile
than the aggressive groups. They suggested
the following hypothesis:


The extremely assaultive person is often a fairly
mild-mannered, long-suffering individual who buries
his resentment under rigid and brittle controls.
Under certain circumstances he may lash out and
release all his aggression in one, often disastrous,
act. Afterwards he reverts to his usual overcon-
trolled defenses. Thus he may be more of a menace
than the verbally aggressive, “chip-on-the-shoulder,”
type who releases his aggression in small doses.
(p. 437)


Megargee, Cook, and Mendelsohn (1967; 2)
explored this hypothesis in a complex study
consisting of the generation of the 31-item
O-H scale and two cross-validation attempts.
They found that the scale seemed to identify
the co-occurrence of two usually incompat-
ible personality constructs, impulse control
and hostile alienation. They felt that a high
O-H scale score indicated “a conflict be-
tween strong aggressive impulses and strong
inhibitions against the expression of aggres-
sion” (p. 528), which can manifest itself
either in explosive violent acts or psychosis.
They concluded that although the genera-
tion of an all-purpose assaultiveness scale
from the MMPI seems unlikely, the O-H
scale is capable of identifying a subgroup of
assaultive criminals who are of this overcon-
trolled type. Megargee et al. (1967) did not
advocate a specific cutting score. The best
cutting score for their data identified 85.7%
of the “extremely assaultive” type, producing
43.2% false positives.

Deiker (1974; 1) examined the O-H scale
along with the conventional scales and 20
other experimental aggression scales on the
MMPIs of four experimental groups: homi-
cide, battery, threat, and a control group.
Differences on 17 of the experimental scales
were significant, but only 4 scales showed
significance in the predicted direction: O-H,
Ego Overcontrol (Eo), Direction of Hos-
tility (DH), and Denial (Dn). Significant
differences were also observed on the F, K,
Pd, Pt, Sc, and Ma scales, with the control
group having the highest elevations on all
except the K scale. Deiker concluded that 
although his results seemed to support Me- 
gargee (1966), a negative response bias hy- 
pothesis accounted for the results equally well. 
(The O-H scale has 21 items keyed false and 
10 items keyed true.) Megargee and Cook 
(1975; 2) responded to this criticism by 
constructing two shortened and two length- 
ened O-H scales, all with equal numbers of 
scorable true and false items. They reanalyzed
their initial set of protocols and found
that one of the lengthened scales was actu- 
ally a better discriminator than the original 
O-H scale, thereby refuting the negative- 
response bias hypothesis. Neither of these 
studies used cutting scores; instead, they compared
group means.
Davis and Sines (1971; 1) have discovered 
a profile configuration that seems to be associated
with hostile-aggressive acting-out behavior.
(These 4-3 studies are the sole exceptions
to the prototypical research design
included in this review, since their approach
resembles that of Megargee et al., 1967.
However, the importance and relevance of 
these findings demanded their inclusion in 
this review.) They concisely defined the 4-3 
configurational prototype (profile peak on 
the Pd scale with the second highest elevation
on the Hy scale) and examined the differences
between these profiles and a control
group's profiles in each of three settings: a
state hospital, a prison, and a medical center. 
A behavioral pattern of hostile-aggressive 
outbursts in usually quiet men consistently 
emerged across the three settings. The 
authors pointed out that this behavioral pat- 
tern is similar to Megargee's O-H type and 
Gilberstadt and Duker's (cited in Davis and
Sines, 1971) “4-type,” and they concluded
that the consistency of their findings both
across their samples and with previous work
constitutes strong evidence of the prototype's
validity. They speculated that 4-3
profile types have a constitutional predisposition
toward this behavior pattern: 4-3 types
seem to be controlled by a cyclical internal
mechanism that periodically causes acute 
emotional and behavioral disturbances and 
seems impervious to conventional treatment 
methods. Persons and Marks (1971; 1) successfully
replicated these results, noting the
significantly higher incidence of violent
crimes by 4-3 types when compared to three
of the most common MMPI code types in
prison. Davis (1971; 1) also obtained similar
results with a female inmate population. 
The nature of these studies precluded the 
determining of false-positive rates. 

Two other studies were found that inves- 
tigated aggression in prisoners. Blackburn 
(1968; 1) found that his “extremely assaul- 
tive” group scored significantly higher than 
his “moderately assaultive" group on the 
L, K, R, Eo, and Dn scales and significantly 
lower on the F, Pd, Ma, Extraversion (Ex), 
and General Hostility (GH) scales. Even 
though the O-H scale was not yet available 
to him for this study, Blackburn concluded 
that his findings supported Megargee’s (Me- 
gargee, 1966) overcontrolled hostility hy- 
pothesis. Carroll and Fuller (1971; 1) com- 
pared nonviolent, violent, and sexual of- 
fenders and found that the nonviolent group 
appeared to be hostile and confused in think- 
ing and displayed the most deviant profiles. 
These nonviolent subjects were found to be 
significantly higher than the other two groups 
on the F, Sc, and Ma scales. Their findings 
would also seem to support Megargee's hy- 
pothesis; it is unfortunate that the O-H 
scale was not included in their analysis. 

The potential value of these important 
findings is self-evident. The identification of 
potentially violent inmates with the help of 
the MMPI would aid in treatment plans, 
administrative decisions, and parole con- 
siderations. As Megargee and Mendelsohn 
(1962) point out, the detection of assaultive- 
ness of the overcontrolled variety can be 
a difficult task, since these types usually ap- 
pear so passive and mild-mannered. The 
cyclical nature of the 4-3 prototype may also 
be initially deceptive. Even though not all 
violent criminals are identified by these two 
major indices, these indices seem to isolate 
those types who may not be seen as as- 
saultive until a harmful outburst occurs. 
Although further cross-validation is needed, 
the already demonstrated discriminative 
powers of both the O-H scale and the 4-3 
prototype are encouraging. However, investigation
of false-positive rates for both indices
must be pursued. Comparisons of high
O-H scale profiles and 4-3 profiles should be 
carried out to determine if they are measur- 
ing essentially the same thing. 

First offenders versus recidivists. A few 
studies have attempted to differentiate first 
offenders from recidivists with MMPIs ob- 
tained after the recidivists had returned to 
prison. Dunham (1954; 1) found that the D 
and Pd scales were significantly higher for 
recidivists, whereas Stanton (1956; 2) found 
that the Pd and Ma scales were higher. Pan- 
ton (1959a; 2) found no significant differ- 
ences between first offenders and recidivists. 
Others who have not restricted themselves to 
the conventional MMPI scales have come up 
with somewhat more meaningful results.
Panton (1962b; 1) found that recidivists
were significantly higher on the Pd, Ma, and
Prison Adjustment (Ap) scales and combined
the Pd and Ap scales to form the new Habit- 
ual Criminal (HC) scale. However, the re- 
sults of his cross-validation (which blocked 
on age and number of prior offenses) were 
not compelling, and he noted that the scale 
seemed to get less effective as the number of 
prior offenses decreased. Pierce’s (1972a; 1)
attempt at cross-validation of the HC scale 
succeeded in effectively differentiating first 
offenders from recidivists. (He did not select 
a cutoff score, however.) Adams (1976; 2) 
also reexamined the HC scale and found that 
it effectively differentiated recidivists from 
first offenders, as did its parent scales Pd and 
Ap. Flanagan and Lewis (1974; 2) found 
that offenders with juvenile records scored 
significantly higher than “absolute first of- 
fenders” on the F, Pd, Pa, Sc, and Ma scales 
and significantly lower on the Responsibility 
(Re) scale. Christensen and LeUnes (1974; 
1) used the Prison Adjustment Scale (PAS) 
in addition to the conventional scales, but 
they failed to achieve any significant differentiations
with respect to recidivism.

These results are of little direct predictive 
usefulness due to their post hoc nature.
Observed differences within this paradigm
cannot be equated with differences extant in
future recidivists before their initial release 
from prison. The most consistently signifi- 
cant findings across these studies showed 
recidivists to have relatively higher elevations
on the Pd and Ma scales, but these are
common peaks for prison populations that 
are frequently above a T score of 70. There- 
fore, these results seem to offer no dynamic 
insights that could be useful in treatment 
approaches. The only possible utility the HC 
scale might have is if it could predict recid- 
ivism before the fact. In short, little use- 
ful information seems to be contained in 
these studies. (More directly predictive 
studies are examined in the section on Recid- 
ivism.) 

Sexual deviancy. Work in this area has 
centered around the identification of homo- 
sexuals. Panton (1960b; 2) found the con- 
ventional Mf scale to be ineffective for this 
purpose, so he generated the Homosexual 
(Hsx) experimental scale. This scale identi- 
fied 81% of the homosexuals and 87% of 
the nonhomosexuals in his initial sample and 
86% of the homosexuals and 81% of the
nonhomosexuals in his cross-validation sam- 
ple. Pierce (1972b; 1) found that the Hsx 
scale identified 94% of his “active homosexual”
group, 100% of his “situational
homosexual” group (heterosexual before 
coming to prison), and 100% of his com- 
parison group. A second study by Pierce 
(1973; 1) that used the Hsx scale differen- 
tiated 81% of his active homosexual group 
and 81% of his situational homosexual group
with a cutoff score of 9.8, and scores on 
the scale remained relatively stable on a re- 
test 1 year later. Another study by Pierce 
(Note 4; 1) that used both the Hsx scale 
and the Mf scale found that mean scale
score differences for both scales successfully
differentiated his active homosexual group
from his situational homosexual and non- 
homosexual groups, whereas neither scale 
successfully differentiated the latter two 
groups from each other. However, Cubitt 
and Gendreau (1972; 1) failed to effectively 
differentiate their homosexual group with the 
Hsx scale, even though other scales did significantly
differentiate (i.e., the Hs, D, Hy,
Mf, Pa, and both of Manosevitz’s abridged 
Mf scales [Manosevitz, 1970]). They note 
that their homosexual group was significantly 
older and that Hsx scores demonstrated a
significant positive correlation with age; this
should have exaggerated the power of the
Hsx scale to detect homosexuals in their
sample. Cubitt and Gendreau conclude that
the validity of the Hsx scale is “limited” but 
that the Mf scale (and Manosevitz’s abridged 
versions) effectively discriminates homosexuals
from heterosexuals. Panton’s (1978; 1)
most recent study effectively discriminated 
prior-to-incarceration homosexuals from a 
baseline prison sample, finding that homo- 
sexuals scored significantly higher on the 
Mf, Pa, Sc, Ma, Hsx, and Frustration Tolerance
Index (FTI; Beall & Panton, Note 5)
scales. These results led Panton to conclude 
that homosexuals exhibit “an implied greater 
social alienation and weaker impulse control 
characterized by a more likely acting-out to 
stress and frustration than is indicated for 
the population sample as a whole” (p. 11). 
Panton also found significant differences 
among demographic variables and Sex Inventory
(Thorne, 1966) scales. He offered the
overall conclusion that the homosexual enter- 
ing prison will probably pursue his homo- 
sexual inclinations but will not necessarily 
become sexually assaultive in these pursuits. 
Two other studies were found that ex- 
amined group differences on conventional 
MMPI scales. Oliver and Mosher (1968; 1) 
compared a group of heterosexuals with 
homosexual “inserters” and homosexual “insertees.”
There were no significant differences
between the two homosexual groups
(possibly due to a small N), but the insertees
were significantly higher than the
heterosexuals on the Hs, Hy, Pd, Mf, and Pt
scales, whereas the inserters were significantly
higher on the F, Hs, D, Hy, Pd, Pa,
Pt, and Sc scales. McCreary (1975; 1) compared
child molesters with previous offenses
to first-offense child molesters and found that
previous offenders scored significantly higher
on the Pd, Pd2, Hs, Hy, and Sc scales.
Although the MMPI appears to be sensi- 
tive to differences between homosexuals and 
heterosexuals, no one specific indicator was 
consistently effective across all studies.
Cubitt and Gendreau’s (1972) study was 
conducted in a Canadian prison, and as a 
result cultural differences may partially explain
the failure of the Hsx scale to differentiate
in their sample. Manosevitz's abridged
Mf scales need to be examined further with
prison populations. A number of the previously
mentioned studies (Oliver & Mosher,
1968; Panton, 1978; Pierce, 1972) discuss 
the several types of disciplinary problems 
centering around the homosexual inmate (in- 
cluding fighting, homosexual seduction and 
rape, and attempted suicides or escapes to 
avoid homosexual demands), which drama- 
tize the need to identify types of homosexual 
inmates as soon as possible for effective and 
beneficial treatment approaches and admin- 
istrative decisions. Due to a lack of truly 
accurate figures for the incidence of inmate 
homosexuality as well as the possible human 
expense resulting from false negatives in 
chance prediction, Meehl and Rosen's (1955) 
base-rate prescriptions seem to be inappli- 
cable here. All of the above research was con- 
ducted after homosexual behavior had al- 
ready been detected in the criterion groups, 
which seems to be a sensible approach con- 
sidering the relative stability of homosexual 
behavior over time. However, more research 
of a longitudinal nature is needed, prefer- 
ably combining MMPI protocols obtained 
on admission with careful follow-up proce- 
dures to maximize the predictive utility of 
the MMPI in coping with problems caused 
by inmate homosexuality. Distinctions be- 
tween types of homosexuals such as those 
made by Oliver and Mosher could prove to 
be valuable in assisting future predictive 
studies to identify homosexual subtypes who 
may instigate some of the disciplinary prob- 
lems previously mentioned. Other types of 
sexual deviancy (such as sexual offenders) 
have yet to be explored to any significant 
extent, but the MMPI may also prove to be 
useful in these areas. 

Addictions. A few studies have examined 
drug addiction by limiting themselves to the 
conventional MMPI scales. Gendreau and 
Gendreau (1970; 1) failed to find any sig- 
nificant differences between their groups of 
heroin addicts and nonaddicts and concluded 
that the “addiction-prone” theory (which 
seeks to ascribe specific personality traits to 
addicts) is an inappropriate approach to the 
problem. Panton (1977; 2) compared drug 
dealers to drug abusers and found that deal- 
ers scored significantly higher on the Ma, 
Amorality (Ma1), Psychomotor Acceleration
(Ma2), and Ego Inflation (Ma4) scales and
significantly lower on the Hs, D, Hy, Pt, and 
Si scales. He concluded that drug dealers are 
a more difficult management and treatment 
problem than drug abusers. 

Cavior, Kurtzberg, and Lipton (1967; 1) 
compared heroin addicts to nonaddicts and 
generated the Heroin (He) experimental 
scale, which, when applied to two cross-validation
samples, correctly classified 75% of the
overall adult sample and 67% of the overall 
adolescent sample. As part of a larger study, 
Panton and Brisson (1971; 2) compared 
drug abusers with nonusers and found that 
the drug group scored significantly lower 
on the Aggravated Sex (Asx) scale, signifi- 
cantly higher on the Hy, Pd, Mf, Sc, Ma, 
Pa, Ec, and HC scales and significantly 
higher on the Need for Affection (Hy2), 
Lassitude-Malaise (Hy3), Familial Discord 
(Pd1), Authority Problems (Pd2), Social
Alienation (Pd4A), Personal and Emotional
Sensitivity (Mf1), Denial of Masculine Occupations
(Mf5), Social Alienation (Sc1A),
Lack of Ego Mastery-Cognitive (Sc2A), and
Imperturbability (Ma3) subscales. These results
tended to support the several findings
they had made through their examination 
of their other sources of data. Panton and 
Brisson also generated the Drug Abuser 
(DaS) experimental scale, which identified 
75.4% of the drug abusers and 81.4% of 
the nonusers in their initial sample and 
75.8% of both groups in a cross-validation 
sample. Panton and Behre (1973; 1) com- 
pared drug addicts to abusers without addic- 
tion on the conventional MMPI scales, 11 
experimental scales (not including He or 
DaS), and several demographic variables.
Contrary to Panton and Brisson's findings,
they found no significant differences on the
MMPI (several demographic variables did
differentiate, however). The authors concluded
that


the MMPI results . . . support the contention of 
Gendreau and Gendreau that there is no such diagnostic
identity as the addiction-prone personality,
and that imprisoned narcotics addicts do not necessarily
have unique personality traits, as measured
by the MMPI which predispose them toward the
special effects of heroin, or distinguish them from
non-addicted imprisoned drug abusers. (p. 416)

Panton (1972; 1) also investigated the
validity of three alcoholism scales in a prison
population. In comparing a prison alcoholic
group with a prison nonalcoholic group and
a normal nonalcoholic group, he found that
the Alcoholism (Al) scale successfully differentiated
the prison alcoholics (65.8% hit
rate) from the prison nonalcoholics (also
65.8% hit rate). The second Alcoholism
(Am) scale only differentiated the normal
group from the prison groups, and the third
Alcoholism (Ah) scale failed to achieve any
differentiation. Panton concluded that even
the Al scale seems to be affected by a more
general factor of sociopathy, however. As
part of his larger study, Stanton (1956; see
the section on First Offenders Versus Recidivists)
compared alcoholics with narcotics
addicts and nonaddicts and found that alco- 
holics scored significantly higher on the Pd 
scale than either addicts or nonaddicts. He 
found no differences between the addict and 
nonaddict groups. 

These sparse results are difficult to com- 
pare, since the criteria for formation of the 
criterion groups vary widely across the 
studies. Furthermore, little practically useful 
predictive potential is apparent, and the 
addiction-prone personality theory has not 
received any support. More reliable and ex- 
pedient means of identifying addicts are al- 
ready at the disposal of correctional person- 
nel, so the application of experimental MMPI 
scales seems unnecessary. The MMPI might 
be able to identify prognostically favorable 
signs that could aid in the treatment of 
addicts, but evidence of such indicators does 
not presently exist. In short, the MMPI has 
not been shown to offer any practically sig- 
nificant insights into the addiction problem 
that cannot be more reliably gained from 
other sources; a productive use of the MMPI
in this area has yet to be demonstrated.

Classification of psychopathologic behavior.
Some studies have investigated the extent to
which the MMPI can differentiate between
different types of psychopathologic behavior.
Gynther (1962; 1) examined the performance
of 12 crime category groups of court-referred
patients on the F scale and concluded
that larger F values indicate |
subjects. Craddick (1962; 1) extracted a
group of psychopaths and a group of nonpsychopaths
from a larger pool of prisoners
on the basis of scores on a checklist of psychopathic
characteristics that he filled out
for each prisoner. He found that the psychopaths
obtained significantly higher scores
on the Pd, Pt, and Ma scales. Johnston and 
Cooke (1973; 1) separated one pool of
prisoners into four pairs of mutually overlapping
groups (aggressive vs. nonaggressive,
maximum security vs. nonsecurity placement,
escape precaution vs. no precaution, and alcoholic
diagnosis vs. nonalcoholic diagnosis)
on the basis of behavioral records and clinical
judgments. They found no significant differences
between any of these pairs on the Ah,
Ec, HC, or Recidivism (Rc) experimental
scales.
Several studies have used and expanded 
on theoretical classifications like Lindesmith
and Dunham's (cited in Randolph, Richardson,
& Johnson, 1961) socialized versus individualized
juvenile delinquent types. Randolph
et al. (1961; 2) paraphrased Lindesmith
and Dunham's descriptions of these
types as follows: “The socialized criminal is
one who commits crimes that are supported
and prescribed by his culture, so that, by
committing a crime, the criminal gains in
status and recognition” (p. 293). On the
other hand, “the individualized criminal . . .
acts for reasons that are personal and private.
He commits his crimes alone and, in
theory, is a stranger to others who commit
similar crimes” (p. 293). Furthermore, “the
socialized criminal seems likely to be a rather
normal person,” whereas “the individualized
criminal, at odds with his own primary
group, seems likely to be an individual whose
criminality is merely symptomatic of deeper
psychological pressures” (p. 293). Randolph
et al. compared groups of these two types on
the conventional scales and found that although
profile configurations were similar,
the solitary delinquents scored significantly
higher on all clinical scales except the Ma
scale. Wilcock (1964; 1) added an aggressive
socialized group to his samples approximating
the above two groups (this new group
was described as combining certain elements
of the first two groups) but merely found
that the individualized group scored significantly
higher than the other two groups on
the Hs and Hy scales. (A small N may par- 
tially account for this lack of significance.) 

Shinohara and Jenkins (1967; 3) used 
Jenkins’ classifications (the first two of 
which are highly similar to Lindesmith and 
Dunham’s two basic types) of socialized ver- 
sus unsocialized aggressive versus runaway 
delinquents. This latter type was character- 
ized “by repeatedly running away from home 
overnight, by staying out late at night, by 
stealing in the home . . . and by stealing 
which is furtive rather than aggressive” 
(Shinohara & Jenkins, 1967, p. 157; italics 
theirs). They examined differences between 
these groups on the conventional scales and 
8 experimental scales. They found that the
unsocialized aggressive group scored signifi- 
cantly higher than the socialized group on 
the F, Hs, D, Pd, Pa, Sc, and Anxiety (At) 
scales, whereas the runaway group scored sig- 
nificantly higher than the socialized group on 
the F, Hs, D, Hy, Pd, Mf, Pt, Sc, At, and 
Asocial (As) scales. They also found that the
runaway group scored significantly higher 
than the unsocialized aggressive group on 
the Mf scale but scored significantly lower 
on the Pa scale. They concluded that the 
socialized group showed much less psycho- 
pathology and that their delinquency was 
“typically adaptive goal-oriented motivation
behavior," whereas the other two groups 
showed more psychopathology and their de- 
linquency was "frustration behavior rather 
than adaptive behavior" (pp. 161-162). 
Tsubouchi and Jenkins (1969; 3) attempted 
to validate and extend these findings using 
the conventional scales and 10 experimental 
scales. They found that their unsocialized 
aggressive group scored significantly higher 
than the socialized group on the F, Hs, D,
Pd, Pt, Emotional Maturity (Em), and HC 
scales, whereas the runaway group scored significantly
higher than the socialized group
on the F, Pd, Sc, Pa, and Em scales. They 
also found that the unsocialized aggressive 
group scored significantly higher than the 
runaway group on the Hs scale. They con- 
cluded that the findings of Shinohara and 
Jenkins were essentially supported. In addi- 
tion, they generated an experimental scale 
(unnamed) that separated their “motivation”
delinquents (the socialized group) from
their “frustration” delinquents (the other 
two groups). 

Gregory (1974; 1) took a unique ap- 
proach to this problem by generating sequen- 
tial linear-sums formulas to identify delin- 
quents who had been classified as psycho- 
pathic, adjusted, or neurotic on the basis of 
clinical graduate student checklist ratings of 
commitment summaries. Of Gregory's over- 
all sample, 63% was classifiable by these 
three formulas, which used T scores on standard
scales (e.g., Psychopathic Index [PI]
= F + 2Pd + Ma − 2K − Pt − 2Si − 40).
Gregory suggested that this approach to 
classification may be more efficient in gen- 
eral than the three other approaches to pro- 
file typing that he inspected and discarded 
as inadequate. (The three types are clinically 
derived code types, two-point techniques, 
and the “D² method.”) Although Gregory
emphasized the need for replication, no other 
studies were found that addressed his 
findings. 
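
The Psychopathic Index quoted above is a plain weighted sum of scale scores, which the following Python sketch spells out. It assumes the inputs are T scores keyed by the standard scale abbreviations and reads the coefficient on Si as 2; the function name and the sample profile are hypothetical, and no classification cutoff is shown because none is reported here.

```python
def psychopathic_index(t):
    """Weighted sum quoted for Gregory's (1974) Psychopathic Index.

    PI = F + 2*Pd + Ma - 2*K - Pt - 2*Si - 40
    `t` maps scale abbreviations to T scores (an assumption of this sketch).
    """
    return (t["F"] + 2 * t["Pd"] + t["Ma"]
            - 2 * t["K"] - t["Pt"] - 2 * t["Si"] - 40)


# Hypothetical profile, invented purely for illustration.
profile = {"F": 60, "Pd": 75, "Ma": 70, "K": 45, "Pt": 55, "Si": 40}
pi_value = psychopathic_index(profile)
print(pi_value)
```

Because Pd, K, and Si carry double weight, a profile that is flat at T = 50 on all six scales yields a strongly negative value, so the index rises mainly with elevated F, Pd, and Ma against low K, Pt, and Si.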

The work based on theoretical classifica- 
tions such as Lindesmith and Dunham’s 
(cited in Randolph et al., 1961) seems to be 
the most promising approach in this area at 
this point. Although criteria for inclusion 
into the different criterion groups varied 
across the studies, three out of four studies 
found extensive significant differences that 
supported their theoretical hypotheses. The 
pursuit of these findings could be profitable 
in maximizing the effectiveness of differential 
treatment approaches to these various groupings.
More stable prima facie indicators derived
from the MMPI for these criminal
types would be a desirable extension of these 
findings; the investigation of Tsubouchi and 
Jenkins’ (1969) experimental scale and the 
generation of other experimental scales to- 
ward this end seems to be the most advisable 
approach. The possible presence of prog- 
nostic MMPI indicators for these groups 
should also be examined. Gregory’s sequential 
linear-sums approach demands further in- 
vestigation, both with respect to his specific 
findings and to the general utility of his 
linear-sums methodology. The work of Me- 
gargee (1977) and his associates appears 
to be a more extensive and comprehensive
system, but perhaps Megargee's empirical
effort at this type of classificatory
approach could be productively interfaced
with the theoretical and empirical approach
previously described.


The MMPI As a Predictor of 
Deviant Behavior 


Recidivism. These studies are distinguished
from those in the First Offenders
Versus Recidivists section in that they compare
the MMPIs of recidivists and nonrecidivists
that were administered before
each subject’s initial release from prison.
Instead of a post hoc analysis seeking possible
treatment approaches, these studies
concentrate on using the MMPI as a reliable
predictor of future recidivists. The base-rate
predictive guidelines of Meehl and Rosen
(1955) are particularly relevant here. Frank
(1971; discussed later) cites the following
Task Force on Corrections (1967) findings:
“The best current estimates indicate that
among adult offenders, 35 to 45 per cent of
those released on parole are subsequently
returned to prison” (p. 3). Therefore, any
study in this area that attempts to claim
any practical significant value must comfortably
exceed the 65% overall hit rate
that could be realized by merely predicting
that no released inmate will return to prison.
The rate of false positives (those identified
by the MMPI as recidivists who are actually
nonrecidivists) must also be weighed against
any gains over this chance expectancy figure
(which creates false negatives only).
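
The base-rate argument above can be made concrete with a short Python sketch. It assumes a recidivism base rate of 35% (the lower end of the quoted range, which yields the 65% figure) and assumes that false positives are expressed as a share of the whole sample; the function names and the 80%/80% scale are hypothetical illustrations, not figures from any study reviewed here.

```python
def overall_hit_rate(base_rate, sensitivity, specificity):
    """Share of all released inmates who are classified correctly."""
    return base_rate * sensitivity + (1 - base_rate) * specificity


def false_positive_share(base_rate, specificity):
    """Nonrecidivists labeled as recidivists, as a share of all cases."""
    return (1 - base_rate) * (1 - specificity)


BASE_RATE = 0.35  # assumed: lower end of the 35-45% range quoted above

# Predicting that no one returns: no recidivist is caught (sensitivity 0),
# every nonrecidivist is called correctly (specificity 1).
baseline = overall_hit_rate(BASE_RATE, 0.0, 1.0)

# Hypothetical scale that identifies 80% of each group correctly.
scale_hits = overall_hit_rate(BASE_RATE, 0.80, 0.80)
scale_fp = false_positive_share(BASE_RATE, 0.80)

print(f"baseline hit rate:    {baseline:.3f}")
print(f"scale hit rate:       {scale_hits:.3f}")
print(f"scale false positives: {scale_fp:.3f}")
```

Under these assumptions the hypothetical scale gains 15 percentage points over the baseline, but it does so at the cost of mislabeling 13% of all released inmates, a cost the blanket "no one returns" prediction never incurs.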

Only one study was found that confined
itself to the examination of conventional
MMPI scales. Mack (1969; 3) found no
significant differences between his group of
recidivists and parole successes. Other studies
have attempted to generate or employ experimental
scales as predictive indicators of
recidivism. As part of a larger study, 6 
et al. (1965; 1) examined the conventional
scales and the Anxiety (A), Repression (R),
and Ego Strength (ES) scales, and found
that only the Ma scale was significantly
higher for recidivists. Panton ino 
generated the 26-item Parole Violator (PaV)
scale that identified 80.5% of the violators
and 80.5% of the nonviolators in his initial
sample (an overall hit rate of 80.5%, with
12.7% false positives) and 78.6% of a cross-validation
sample consisting solely of violators.
He also found that nonviolators scored
significantly lower than violators on the Hs,
D, Hy, Pd, Pa, Pt, Sc, and Ma scales and
significantly higher on the Mf scale.

A major work in this area is the dissertation
of Black (1967; 2). He examined 15
experimental scales as well as the conventional
scales and found that no single scale
achieved a significant differentiation between
his groups of recidivists and nonrecidivists.
However, he took two of the scales (Si and
HC) that had the highest correlations with
the criterion, selected items that the two
groups had responded to differentially five
or more times, eliminated overlapping items,
and called the results the Recidivism-Rehabilitation
scale (Rmn; inspection reveals
that this scale does not share any items with
Panton's PaV scale). This 22-item scale initially
identified 88% of the recidivists and
84% of the nonrecidivists in his sample.
Attempting to improve on these results,
Black then constructed the Rmn index,
which yields a score of 1 point for each of
the following scale elevation criteria: Si
scale T score less than 54, HC scale T score
greater than 58, Rmn scale T score greater
than 50. Scores of 0 and 1 on this index
were found to predict rehabilitation, and a
score of 3 was found to predict recidivism,
with a score of 2 being indecisive. However,
Black found that the arithmetical difference
between the A and R experimental scales successfully
identified 80% of the two-score
recidivists with differences of 8 or less and
75% of the two-score nonrecidivists with
differences greater than 8. This combined
system achieved an overall predictive accuracy
of 90%, with only a 7.8% rate of false
positives. Black organized all of his findings
into the Recidivism-Rehabilitation Inventory,
which presents the following scales in
a format modeled after the conventional
MMPI profile sheet: L, F, K, Si, HC, Rmn,
A, R, PaV, Pd, Ma, Ec, and ES.
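
The two-stage decision rule described for Black's Rmn index can be sketched in Python as follows. This is a reading of the rule as summarized above, not Black's own scoring procedure: the function names are hypothetical, the difference is taken as A minus R, and the metric of the A and R scores (raw or T) is not stated here, so it is left to the caller.

```python
def rmn_index(si_t, hc_t, rmn_t):
    """One point for each elevation criterion met (index ranges 0 to 3)."""
    return int(si_t < 54) + int(hc_t > 58) + int(rmn_t > 50)


def predict_outcome(si_t, hc_t, rmn_t, a_score, r_score):
    """Apply the index, then the A minus R tiebreak for index scores of 2."""
    index = rmn_index(si_t, hc_t, rmn_t)
    if index <= 1:
        return "rehabilitation"
    if index == 3:
        return "recidivism"
    # Index of 2 is indecisive; fall back on the A minus R difference.
    return "recidivism" if (a_score - r_score) <= 8 else "rehabilitation"


# Hypothetical cases, invented for illustration only.
print(predict_outcome(50, 60, 55, a_score=12, r_score=10))  # all three criteria met
print(predict_outcome(60, 50, 45, a_score=12, r_score=10))  # no criteria met
print(predict_outcome(50, 60, 45, a_score=20, r_score=10))  # index 2, difference 10
print(predict_outcome(50, 60, 45, a_score=15, r_score=10))  # index 2, difference 5
```

Written this way, the rule makes clear that the A minus R difference matters only for the indecisive middle group, which is also where the later cross-validation located most misclassifications.
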
A dissertation by Frank (1971; 2) sought
to cross-validate these findings. He found
that the Rmn scale alone was the best predictor
for his sample, identifying 75.5% of
the parole successes and 68% of the recidivists
(a 73.1% overall hit rate, with 15.9%
false positives). The Rmn index identified
80.0% of the successes and 54.4% of the
recidivists (an overall hit rate of 71.0%
with 13% false positives), and Frank found 
that the major cause of misclassifications 
seemed to be the A scale minus R scale 
difference for Rmn index scorers of two. 
Frank points out that Black’s subjects were 
tested just prior to release, whereas his sub- 
jects were tested on their initial admission 
to prison, and he admits that the failure 
of his testing procedure to include the effects 
of incarceration may have compromised the 
predictive powers of the Rmn scale and in- 
dex. He found that no single item of the 
Rmn scale was a more efficient discriminator 
than the scale as a whole, and this led him to 
tentatively conclude that “the Recidivism- 
Rehabilitation scale taps a recidivist ‘syn- 
drome,’ a personality dimension with overt 
MMPI response tendencies” (pp. 37, 39). 

Two studies reviewed elsewhere in this 
article make passing reference to the pre- 
diction of recidivism. Wattron (1963; see 
the section on Institutional Adjustment) 
found that his 72-item Prison Maladjust- 
ment Scale (PM) identified 68% of his 
recidivists and 69% of his successful parolee 
group (an overall hit rate of 68.7%, with 
20.1% false positives. Inspection reveals only 
a four-item overlap between the PM and 
Rmn scales: Items 56, 118, 216, and 469 all 
scored true). Davis and Sines (1971; see
the Hostile-Assaultive section) noted that the 
12 men in their sample with 4-3 profiles 
who were subsequently paroled all violated 
parole within 1 year of release. 

The only study that showed an appreci- 
able gain over chance prediction in percent- 
age of overall hit rate with a corresponding 
smaller percentage of false positives was 
Black's (1967) work on the Rmn scale and 
index. The potential value of these findings 
cannot be assessed at this time, but they 
demand attempts at cross-validation, since 
even the shrinkage in accuracy that is ex- 
pected would still allow a significant im- 
provement over chance prediction. Black’s 
work also includes an extensive review of the 
literature on recidivism, and he effectively
integrates his findings with the theories and 
research findings of others in the field. The 
implications for both the successful identifi- 
cation of future recidivists and the formula- 
tion of more effective treatment approaches 
for this particular group are far-reaching 
indeed. Frank's (1971) attempted cross- 
validation of Black's findings was disap- 
pointing, but it was compromised by the 
difference from Black’s procedure in the 
timing of MMPI administration; it seems 
reasonable to assume that MMPIs adminis- 
tered just prior to release would probably be 
better discriminators of recidivism than 
MMPIs given on initial admission to prison.
(In fact, Table 2 reveals that Black's study 
was the only study reviewed that adminis- 
tered all MMPIs just prior to release.) Also, 
success by Frank would not have greatly 
extended the generalizability of the Rmn 
scale and index anyway because he conducted 
his replication in the same state as Black's 
original work (Oklahoma). Further cross- 
validations that replicate Black's methodol- 
ogy more precisely must be conducted in 
other parts of the country before the value 
of Black's findings can be adequately as- 
sessed. 

Panton's PaV scale showed some gain in 
percentage of accurate prediction while pro- 
ducing a corresponding smaller percentage 
of false positives; future studies may also 
profit by considering the PaV scale along 
with Black's Rmn scale and index. Attempts
at cross-validation must incorporate the vital 
requirement of adequate follow-up to mini- 
mize contamination of their nonrecidivist 
sample with “recidivists-to-be.” Frank cites 
evidence that indicates that over 80% of 
recidivists return to prison within 2 years 
of release, so it would seem that a thorough 
2-year follow-up would be a minimum
requirement for such studies. In summary, the
predictive powers of the MMPI for the
identification and treatment of potential
recidivism, using the Rmn scale and index,
appear promising but need further study.

Prison escape. Once again, Meehl and 
Rosen’s (1955) criteria constitute the focal 
point of this evaluation. Shupe and
Bramwell’s (1963; see the section on Methods of
Profile Interpretation) 5% escape figure will
be used to estimate the effectiveness of
predicting escape risk, since the studies
discussed in this section did not supply escape
base rates for their particular prison
populations. Therefore, a successful escape
predictor should exceed a 95% overall hit rate
with a minuscule incidence of false positives.
This is a tall order indeed.

All work in this area has centered around
the 42-item Ec scale that was generated by
Beall and Panton (1956; 1) from a sample
tested after escape attempts had already
occurred. This scale identified 76.7% of the
escapees and 73% of the nonescapees (an
overall hit rate of 73.2%, with 25.6% false
positives) in the original sample and 77.2%
of the escapees and 78.3% of the
nonescapees (an overall hit rate of 78.2%, with
20.6% false positives) in the cross-validation
sample. Pierce (19715; 1) employed
alternative cutoff scores for two pairs of
groups, the first pair with a criterion group
having a record of escape prior to the MMPI
administration and the second pair with a
criterion group having attempted escape
subsequent to MMPI administration. Using the
cutoffs that achieved the greatest dichotomy,
the Ec scale identified 80% of the prior
escapees and 82% of the nonescapees (an
overall hit rate of 81.9%, with 17.1% false
positives) in the first pair, whereas in the
second pair the Ec scale identified 50% of
the future escapees and 76% of the
nonescapees (an overall hit rate of 74.7%, with
22.8% false positives).
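
The overall hit rates and false-positive figures quoted above appear to follow from each group's hit rate weighted by the assumed 5% escape base rate, with false positives expressed as a proportion of the whole inmate population. The sketch below is a minimal illustration of that arithmetic under those assumptions; the function names and the dictionary of results are illustrative only, and small rounding differences from the printed figures are possible.

```python
# Illustrative arithmetic only: the function names are hypothetical and the
# 5% base rate is the Shupe and Bramwell figure adopted in the text.

def overall_hit_rate(base_rate, hit_criterion, hit_comparison):
    """Proportion of the whole population classified correctly."""
    return base_rate * hit_criterion + (1.0 - base_rate) * hit_comparison


def false_positive_rate(base_rate, hit_comparison):
    """Proportion of the whole population wrongly flagged as escape risks."""
    return (1.0 - base_rate) * (1.0 - hit_comparison)


BASE_RATE = 0.05

# label: (proportion of escapees identified, proportion of nonescapees identified)
EC_RESULTS = {
    "Beall and Panton, original sample": (0.767, 0.73),
    "Beall and Panton, cross-validation": (0.772, 0.783),
    "Pierce, prior escapees": (0.80, 0.82),
    "Pierce, future escapees": (0.50, 0.76),
}

for label, (hit_escapees, hit_nonescapees) in EC_RESULTS.items():
    overall = overall_hit_rate(BASE_RATE, hit_escapees, hit_nonescapees)
    false_pos = false_positive_rate(BASE_RATE, hit_nonescapees)
    print(f"{label}: overall {overall:.1%}, false positives {false_pos:.1%}")

# Comparison point: predicting that no inmate escapes is correct for every
# nonescapee, i.e. for 1 - base rate of the population.
print(f"Predict-no-escape baseline: {1.0 - BASE_RATE:.1%}")
```

Under these assumptions every reported Ec result falls well short of the 95% that would be obtained by simply predicting that no one escapes.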

As part of a larger study, Stump and
Gilbert (1972; 1) obtained mean scores on
the Ec scale for a group attempting escape
prior to the MMPI and a group attempting
escape subsequent to the MMPI and
compared these means to the mean Ec scale
score obtained by the general prison
population. The group attempting escape prior
to the MMPI had an Ec scale mean that
was significantly higher than the general
population mean, but the other
group was not significantly different. Adams
and West (1976; 2) found no significant
differences between group Ec scale means
for either of their two comparisons, one
involving inmates with one escape attempt
subsequent to the MMPI versus a no-escape
group and the other involving inmates with
two or more escape attempts subsequent to
the MMPI versus a no-escape group. Panton
(Note 6; 1; Note 7; 2) ran two comparisons
with admissions MMPIs, one involving
escapees versus nonescapees and the other
involving groups with three or more escapes,
two escapes, one escape, and no escapes. In
the first comparison, the Ec scale identified
74% of the escapees and 70% of the
nonescapees (an overall hit rate of 70.2%, with
28.5% false positives), whereas in the second
comparison the scale identified 90.4% of
the three-or-more-escapes group, 86.7% of
the two-escapes group, 80.3% of the
one-escape group, and 71.4% of the no-escape
group (an overall hit rate of 72%, with
27.2% false positives).

Even if the corresponding base escape 
rate was much higher than the 5% rate as- 
sumed here, the Ec scale would not appre- 
ciably improve prediction over chance ex- 
pectancy. Unless a high rate of false posi- 
tives is acceptable for security grade assign- 
ment, no benefit is apparent from the scale. 
Meehl and Rosen (1955) suggest that if a 
subpopulation can be isolated that has an 
appreciably higher base rate of the criterion
than the overall population in question,
indices that lack utility in this overall
population may prove to be useful. The
classifications of Megargee (1977) and his
associates might help in this respect. Perhaps
examining the number of escape attempts 
would be more appropriate than examining 
the number of actual escapes; this would 
enlarge the base rate and possibly make the 
Ec scale a more useful tool. However, the 
Ec scale does not seem to have any practical
utility at the present time. 
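
One way to quantify how much higher the base rate would need to be is to ask when a scale's overall hit rate first exceeds that of simply predicting that no one escapes. With base rate P, valid-positive rate p1 (proportion of escapees identified), and false-positive rate p2 (proportion of nonescapees wrongly flagged), the scale wins only when P × p1 > (1 − P) × p2, that is, when P > p2 / (p1 + p2). The sketch below works this through for the Beall and Panton cross-validation figures; the function names are hypothetical, and the comparison assumes a base rate below 50%, so that "no escape" is the more frequent outcome.

```python
# Hypothetical helper names; note that false_positive_rate here is the
# proportion of NONESCAPEES wrongly flagged (1 - hit rate for nonescapees),
# not the proportion of the whole population used in the text's figures.

def accuracy_with_scale(base_rate, valid_positive_rate, false_positive_rate):
    """Overall hit rate when every inmate is classified by the scale."""
    hits_on_escapees = base_rate * valid_positive_rate
    hits_on_nonescapees = (1.0 - base_rate) * (1.0 - false_positive_rate)
    return hits_on_escapees + hits_on_nonescapees


def break_even_base_rate(valid_positive_rate, false_positive_rate):
    """Base rate above which the scale beats predicting 'no escape' for all."""
    return false_positive_rate / (valid_positive_rate + false_positive_rate)


# Ec scale, cross-validation sample: 77.2% of escapees and 78.3% of
# nonescapees identified, so 21.7% of nonescapees are false positives.
p1, p2 = 0.772, 0.217
print(f"Break-even base rate: {break_even_base_rate(p1, p2):.1%}")

for base_rate in (0.05, 0.15, 0.30):
    with_scale = accuracy_with_scale(base_rate, p1, p2)
    baseline = 1.0 - base_rate
    print(f"base rate {base_rate:.0%}: scale {with_scale:.1%}, baseline {baseline:.1%}")
```

On these figures the break-even point lies at roughly a 22% base rate, more than four times the 5% assumed here.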

Institutional adjustment. Several studies 
have examined groups of disciplinary prob- 
lem inmates by focusing primarily on the 
conventional MMPI scales. Driscoll (1952; 
1) created four groups of prisoners ranging 
on a continuum from “most maladjusted” to 
“most adjusted.” He found that the most 
adjusted group was significantly higher than 
any of the other groups on the D, Mf, and 
Pa scales and that the most maladjusted 
group presented the most normal MMPI
profiles. This led Driscoll to speculate that 
the prison environment fosters modes of 
adaptive behavior that are viewed as mal- 
adaptive behaviors outside of prison. Erik- 
son and Roberts (1966; 3) initially com- 
pared a maladjusted juvenile delinquent 
group to an adjusted group, and they found 
that the maladjusted group scored signifi- 
cantly higher on the Pd scale. They gener- 
ated a 19-item scale that differentiated the 
two groups, but two replication attempts 
failed to support any of their initial findings. 
Lefkowitz (1966; 1) compared adjustment 
“failures” with adjustment "successes" and 
found that the failure group scored signifi- 
cantly higher on the Ma scale. Twomey and 
Hendry (1969; 2) compared discipline prob- 
lem inmates with a comparison group and 
found that the discipline problem group 
scored significantly higher on the L, F, Hs,
D, Hy, Mf, Pa, Pt, Sc, and Si scales. Snor- 
tum, Hannum, and Mills (1970; 1) rated a 
group of women offenders on a continuum 
representing frequency of rule violations and 
discovered that frequency of rule violations 
positively correlated with elevations on the 
Pd and Ma scales. Two of these studies 
(Driscoll, 1952, and Snortum et al., 1970) 
obtained their MMPIs before the disciplin- 
ary problems occurred, whereas the rest were 
obtained after the research groups had been 
formed. 

None of the predictive studies discussed 
later considered the base-rate probability of 
their criterion variable, which was most 
commonly some indicator of disciplinary 
actions taken. Although these base-rate sta- 
tistics are probably not routinely gathered, 
Meehl and Rosen (1955) suggest that a 
simple analysis of available records would 
produce a suitable base rate for comparative 
purposes. It is recommended that future 
studies rectify this observed methodological 
flaw. 

Predictive studies have investigated the 
ability of experimental scales to identify po- 
tential disciplinary problems. Panton (1958; 
2) generated the 36-item Prison Adjustment 
Scale (Ap) and examined its discriminative 
powers on the profiles of two adjusted in- 
mate groups, two nonadjusted groups and 
one severely nonadjusted group. The scale
correctly identified 82% of each adjusted 
group, 87% and 85% of the two nonad- 
justed groups, and 93% of the severely non- 
adjusted group. Pierce (1972c; 1) compared 
inmates with two or more infractions to a 
comparison group and found that Panton’s 
cutoff scores identified 88% of the malad- 
justed inmates but produced a false-positive 
rate of 50%. Edwards (1963; 1) compared 
groups of first-offender juvenile “successes,” 
first-offender juvenile “failures,” and prison 
inmate "failures" on the conventional scales, 
the Harris-Lingoes subscales (Harris &
Lingoes, Note 8), 14 experimental scales
(including Ap), and mean number of
high-point elevations. He found that the prison
inmate failure group scored significantly 
higher than the juvenile groups on the Sc 
scale and the Sc2A subscale but found no 
other differences. (This may be partially 
due to a small N.) 

Wattron (1963; 1) compared parole and 
maladjusted inmates and generated the 
Prison Maladjustment Scale (PM). This 
scale successfully identified 82% of the mal- 
adjusted group and 84% of the parolees in 
his cross-validation sample. Stump and Gil- 
bert’s (1972; see the section on Prison Es- 
cape) previously cited study compared the 
Ap scale scores of a group of inmates re- 
peatedly disciplined by solitary confinement 
with the Ap scale scores of a group that 
had spent no nights in solitary confinement, 
but this produced no significant
differentiation. Panton (1973; 2) compared a group
of management problem inmates with a large
baseline inmate sample and examined the
conventional scales, his Prison Classification
Inventory (PCI; Panton, Note 9), the
Harris-Lingoes subscales, and five additional
experimental scales. The PCI, modeled
after the conventional MMPI profile sheet,
consisted of the following conventional
and experimental MMPI scales: L, F, K;
Ap, Ec, HC, PaV, Hsx, A, R, Pd, Defect
of Inhibition Control (Dc&i), Sensorimotor
Dissociation (SD), and Asx. He found that
the management problem inmates scored
significantly higher than the baseline sample
on the F, Pd, Pd2, Mf, Ma, Ma1, Ap, Ec, HC,
and Re scales, and these inmates scored
significantly lower on the Hs, Pt, Si, A, and R
scales. Panton noted that the management
problem inmates exhibited uniformly poor
prognostic MMPI signs (high scores on the 
Pd, Ma, and HC scales and low scores on 
the Hs, D, Pt, and A scales), and he con- 
cluded that their susceptibility to rehabilita- 
tive efforts “appears limited.” 

Sutker and Moan (1973; 1) examined 
differences on the conventional scales and 
the Ap and PM experimental scales between 
two successive pairings of groups of “bad
actors” (severe discipline-problem inmates) 
and “no disciplines.” Their first group of 
bad actors scored significantly higher than 
the no disciplines on the F, Ma, and PM 
scales, whereas the second group of bad actors 
scored significantly higher on the F, Hy, and 
Ma scales. They noted that both no-discipline
groups actually scored higher on the Ap 
scale than their bad-actors counterparts, al- 
though the differences were not significant. 
Two of these studies (Panton, 1973; Sutker 
& Moan, 1973) obtained MMPI profiles 
after the criterion and comparison groups 
were formed, whereas the rest of the studies 
obtained their MMPIs before the discipline 
problems occurred. 

One study was found that examined self- 
mutilating prisoners. Panton (1962a; 1) 
compared a group of self-mutilators with a 
group of model prisoners and a group of 
infraction-nonmutilators on the conventional
scales and eight experimental scales. The
infraction-nonmutilators were matched with
the self-mutilator group “as to number and
type of infraction and degree of exposure to
custodial stress and pressure” (p. 63). The
infraction-nonmutilator group scored
significantly higher than the model prisoners group
on the Ap scale only. The self-mutilator
group scored significantly higher than the
other two groups on the F, Pa, Pt, Sc,
Anxiety Index (AI), Critical Item (CI), Index of
Maladjustment (Zn), Judged Manifest
Hostility (Jh), and Psychotic Tendency (Pz)
scales and significantly lower on the
Conversion Reaction (CR) scale. The
self-mutilator group also scored significantly higher
than the model prisoners group on the [illegible]
and Ap scales. Panton concluded that the
self-mutilators “were more inclined to
compulsive outbursts of hostility, appeared
more anxious, expressed a greater inner
turmoil, and appeared more inclined toward 
bizarreness in their overt resistance to strin- 
gent attempts to control their aggressiveness” 
(p. 66). Panton then generated the Self- 
Mutilator scale (SM) by adding the T 
scores of the F, Pa, and Sc scales together 
and determining a cutoff score. This scale 
identified 83.8% of the self-mutilators,
75.6% of the model prisoners, and 81.2% of
the infraction-nonmutilators in his original
sample. 
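
The construction of the SM scale described above (a sum of three T scores compared against a cutoff) can be sketched as follows. This is only an illustration: the cutoff value is not reported in this review, so the number used below is a hypothetical placeholder, as are the function names and the example profile, and treating scores at or above the cutoff as flagged is an assumption inferred from the self-mutilators' higher F, Pa, and Sc elevations.

```python
# Illustrative sketch of a summed-T-score index. HYPOTHETICAL_CUTOFF and the
# example profile are placeholders, not values taken from Panton (1962a).

HYPOTHETICAL_CUTOFF = 200


def sm_score(t_scores):
    """Sum the T scores of the F, Pa, and Sc scales."""
    return t_scores["F"] + t_scores["Pa"] + t_scores["Sc"]


def exceeds_cutoff(t_scores, cutoff=HYPOTHETICAL_CUTOFF):
    """Return True when the summed score is at or above the cutoff (assumed direction)."""
    return sm_score(t_scores) >= cutoff


example_profile = {"F": 70, "Pa": 65, "Sc": 75, "Pd": 80}  # extra scales are ignored
print(sm_score(example_profile), exceeds_cutoff(example_profile))
```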

In this latter study, Panton (1962a) notes 
that with respect to the self-mutilators, 
“none of the group claimed they were
actually attempting to destroy themselves nor
did the psychiatric examinations reveal any
evidence in support of suicidal intentions”
(p. 63). No other studies were found that
even made reference to suicidal prisoners.
This lack of suicide research in prisons is
difficult to explain, especially since attempted
suicides constitute a dangerous problem for
both administrative and treatment personnel.
Generalization of suicide research results
from other populations (e.g., mental
hospitals) is an unwarranted procedure, and in
any case such generalization is no substitute
for findings that are based on prison
populations. This important issue demands
exploratory research on the MMPI profiles of
suicidal prisoners, since the ability to
effectively identify suicidal risks in an economical
manner would obviously be of great value to
prison personnel. Although the overall
incidence of suicidal attempts is probably small,
Meehl and Rosen (1955) point out that their
base-rate considerations sometimes cease to
be relevant, especially in certain
“life-and-death” situations; surely a moderate rate of
false positives in suicidal risk prediction is
an acceptable price to pay for the trade-off
in preserved human lives.

Those studies that focused on the con- 
ventional MMPI scales have only come up 
with inconsistent and inconclusive results. 
This may be partially due to different cri- 
teria for the formation of criterion groups, 
different timing of MMPI administrations, 
and the different populations that were
examined. More well-controlled MMPI
research may yet offer valuable dynamic
insights into the reasons behind maladjusted
behavior in some prison inmates. Panton’s 
Ap scale has not yet proven useful in other 
parts of the country, although he has con- 
sistently obtained positive results with this 
scale in North Carolina; again, differences 
across studies with respect to timing of 
MMPI administration, criteria for the for- 
mation of criterion groups, and different 
types of prison populations may partially 
account for this lack of consistency. Wat- 
tron’s PM scale needs further research, since 
the only attempted cross-validation obtained 
significant results in just one of two com- 
parisons. Other potentially valuable indices 
such as the Harris-Lingoes subscales, Pan- 
ton's PCI, and Panton's SM scale also need 
further investigation before any conclusions 
about their worth can be drawn. This entire 
area merits further study, since effective pre- 
dictors of potential prison adjustment prob- 
lems would be invaluable to both administra- 
tive and treatment personnel. Groups differ- 
entiated along such dimensions could be 
placed within the prison so as to minimize 
potential disturbances, and treatment plans 
individually geared to such differentiated 
groups could bring an increase in positive 
rehabilitation results. Future research should 
attempt to cross-validate existing MMPI in- 
dicators and should consider the generation 
of new experimental scales that may prove 
to be more effective than existing ones. The 
lack of research on the MMPIs of suicidal 
risk prisoners constitutes a glaring omission 
in this research area that should be rectified 
immediately. 


MMPI Differences Across Race and Sex 


Racial differences. Several studies have 
examined the differences in conventional 
scale elevations produced by groups of black 
and white inmates. Stanton's (1956) previ- 
ously cited study (see the section on First 
Offenders Versus Recidivists) initially com- 
pared groups of black and white inmates and 
found no significant differences. Caldwell 
(1959; 2) pursued a similar design and 
found that black inmates obtained signifi- 
cantly higher scores on the Hs, D, Mf, Pa, 
and Ma scales and significantly lower scores
on the Pd scale. Panton's (1959a) previously 
cited study (see the section on First Offend- 
ers Versus Recidivists) found that black in- 
mates scored significantly higher than white 
inmates on the F, Pa, Sc, and Ma scales and
significantly lower on the Hy scale. As part 
of a larger study, Costello, Fine, and Blau 
(1973; 1) found no significant differences 
between their samples of black and white 
inmates. Elion and Megargee (1975; 3) in- 
vestigated the validity of the Pd scale among 
black males using groups of prisoners and 
groups of college students. They found that 
black inmates scored significantly higher on 
the Pd scale than both a group of “culturally 
deprived black male university students" 
and a group of white inmates. They con- 
cluded that elevations on the Pd scale validly 
express levels of social deviance among young 
black males but that the present scale norms 
appear to show racial bias. 

A few studies have examined black-white 
differences with respect to experimental 
scales. Panton (1959b; 2) compared black
and white inmates on the Harris-Lingoes 
subscales and found that black inmates 
scored significantly higher on the Ideas of 
External Influence (Pa1), Sc1A, and Ma4
subscales and significantly lower on the Pd2 
subscale. Haven (Note 10; 1) investigated 
racial differences on Megargee’s O-H scale 
and found that black inmates scored sig- 
nificantly higher than white inmates, sug- 
gesting either that black inmates as a group 
experience more feelings of social alienation 
or that they have been shaped by societal 
pressure to more actively inhibit aggression. 
(Haven favored the latter alternative.) As 
part of a larger study, Fisher (1969; 1) 
investigated racial differences on the Re- 
pression-Sensitization (R-S) scale and found 
that white inmates scored significantly higher 
than their black counterparts, suggesting that 
the black inmates showed evidence of more 
repression. 

More recent studies have directly con- 
fronted the issue of racial and cultural bias 
in the MMPI with their comparisons of pri- 
soners from different racial and cultural 
groups. McCreary and Padilla (1977)
compared 40 black, 36 Mexican American, and 267
white male misdemeanor offenders who had
been convicted and were awaiting sentencing.
They compared individual scale elevations as
well as scores on Goldberg’s (1965) linear
classification system and ran unmatched
comparisons as well as comparisons in which
subjects were matched on educational level
and occupation. They hypothesized that
differences due to socioeconomic factors would
appear only in the unmatched comparisons,
whereas differences due to cultural factors
would appear in both comparisons. Analyses
were conducted using only valid profiles (fewer
than 30 items left blank, F scale less than a
raw score of 23, and the F-K index less than
11) and using all profiles, but there were
virtually no differences in the results of these
two analyses, so valid profile results were
used to report most findings. Mexican
Americans had significantly less education than
whites, but no other significant differences
emerged with respect to educational level
or occupation. In the unmatched
comparisons, blacks were significantly higher on the
Ma scale and significantly lower on the Hy
and Mf scales than whites. Mexican
Americans were significantly higher on the L, Hs,
and O-H scales than whites. (They were also
higher on the K scale when all profiles were
used.) In the matched comparisons, blacks
scored significantly higher on the K and Ma
scales and significantly lower on the Hy scale
than whites. (Only the Ma scale difference
remained significant when all profiles were
used.) The Mexican Americans scored higher
on the L, K, and O-H scales than
whites. The only significant difference on
the Goldberg indices showed that on the
psychiatric-sociopathic index, the Mexican
Americans scored in the psychiatric range,
whereas the whites scored in the sociopathic
range (unmatched condition). McCreary and
Padilla concluded that both cultural and
socioeconomic factors seemed to contribute
to the observed MMPI differences between
these three groups.
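
The validity rules McCreary and Padilla applied can be expressed as a simple screening function. The sketch below is a minimal illustration, assuming that the F-K index is computed in the usual way as the raw F score minus the raw K score; the class, field, and function names are hypothetical, and the example profiles are invented for demonstration.

```python
from dataclasses import dataclass

# Thresholds as described in the text: fewer than 30 items left blank,
# F raw score below 23, and F-K index below 11.
MAX_BLANK_ITEMS = 30
MAX_F_RAW = 23
MAX_F_MINUS_K = 11


@dataclass
class Profile:
    """Hypothetical container for the three quantities the rules need."""
    items_blank: int
    f_raw: int
    k_raw: int


def is_valid(profile):
    """Apply the three validity rules; F-K is assumed to be raw F minus raw K."""
    f_minus_k = profile.f_raw - profile.k_raw
    return (
        profile.items_blank < MAX_BLANK_ITEMS
        and profile.f_raw < MAX_F_RAW
        and f_minus_k < MAX_F_MINUS_K
    )


# Invented example profiles
profiles = [
    Profile(items_blank=2, f_raw=8, k_raw=14),    # passes all three rules
    Profile(items_blank=35, f_raw=8, k_raw=14),   # too many blank items
    Profile(items_blank=0, f_raw=22, k_raw=9),    # F-K index of 13
]
valid_profiles = [p for p in profiles if is_valid(p)]
print(len(valid_profiles))
```

Running the analysis once on `valid_profiles` and once on all `profiles` mirrors the two-pass comparison the authors report.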
Rosenblatt and Pritchard (1978)
compared 104 black and 191 white male inmates
on the MMPI with respect to differences in
full-scale WAIS IQ scores. They rejected
number of years in school as an appropriate
indicator of educational achievement because
most of their sample of Mississippi inmates
had been educated in segregated schools that
were not considered to be equivalent in
educational quality. They divided their sample
into four subgroups with respect to race and
IQ (using the overall sample's mean IQ of
93 to divide the groups). They found no
racial differences between the high-IQ groups
but found that the low-IQ blacks (N = 81)
scored significantly higher on the Hs, Sc, and
Ma scales and significantly lower on the Hy
scale than the low-IQ whites (N = 70).
Successive applications of more stringent 
validity rules did not affect these findings. 
Rosenblatt and Pritchard concluded that 
racial differences on the MMPI seem to be 
limited to low-IQ subjects. 

The findings on the conventional scales are 
not definitive at this time. The most con- 
sistent finding appears to be a higher ele- 
vation for blacks than whites on the Ma 
scale (five studies, including Panton's Har- 
ris-Lingoes subscales findings), with a similar 
but weaker trend on the Sc and Pa scales 
(three studies each, all of which included 
Panton’s Harris-Lingoes subscale findings) 
and the Hs scale (two studies). Blacks also 
seem to occasionally score lower than whites
on the Hy scale (three studies) and the Mf
scale (two studies). Only Elion and Me- 
gargee (1975) found significantly higher 
elevations for black inmates on the Pd scale, 
whereas significant differences in two other 
studies (including Panton’s Harris-Lingoes 
subscale findings) showed white inmates 
scoring significantly higher on the Pd scale. 
None of the findings for the experimental 
scales have been replicated. The most cru- 
cial findings here appear to be the results 
of Rosenblatt and Pritchard (1978), which 
tend to support the position on racial bias in 
the MMPI (discussed in the Sources of Var- 
iance and Their Effects on Test Results 
section). Apparent racial bias in the MMPI
may actually be due to educational factors, 
since more intelligent blacks do not seem to 
display the differences in MMPI perform- 
ance that less intelligent groups of blacks 
display. More research is definitely needed 
to resolve the issue of racial and cultural 
bias in the MMPI, and such efforts should 
investigate the differences between black and 
white MMPI inmate protocols with respect
to education and/or IQ. 

Sex differences. Research with the MMPI 
in prison populations has been conducted al- 
most completely with male inmates. Only two 
of the studies reviewed previously (Davis, 
1971; Snortum et al., 1970) used female in- 
mate populations. Although there are far 
fewer female inmates than male inmates in 
the United States today, the female inmate 
population “presents a significant minority 
whose needs in terms of proper classification, 
treatment and training are as great as the 
needs of their more numerous male counter- 
parts” (Panton, 1974). Two studies were 
found that compared samples of male and 
female prison inmates. Panton (1974; 2) 
examined differences on conventional scales 
and the Harris-Lingoes Pa subscales and 
found that female inmates scored signifi- 
cantly higher than male inmates on the Poig- 
nancy (Pa2) and Si scales and scored sig- 
nificantly lower on the Hs and D scales. 
Panton concludes that male inmates “ap- 
pear to be more anti-social with neurotic 
overlays,” whereas the female inmates
“appear more asocial than anti-social with
overlays of greater emotional sensitivity” (p.
332). Joesting, Jones, and Joesting (1975; 
1) examined differences on the conventional 
scales and seven experimental scales and 
found that female inmates scored signifi- 
cantly lower than male inmates on the F, Hs, 
D, Hy, Pd, Mf, Pa, Sc, Ma, Si, Ec, A, R, 
Pd1, Dc&i, and SD scales, and scored sig- 
nificantly higher on the K and Ap scales. 
They concluded that the males appeared 
more emotionally disturbed than the females. 
Although certainly lacking in consistency, 
both of these studies suggest significant 
MMPI protocol differences between male and 
female prison inmates that preclude the gen- 
eralizability of the research findings in this 
article to female inmate populations. At- 
tempted replication of these male inmate 
findings with female inmate populations is 
the only possible solution to this problem. 

Panton (1976; 2) compared a group of 
male prisoners admitted in 1966 to a group 
of male prisoners admitted in 1971 to in- 
vestigate any population changes over time 
on the Mf scale and the Pepper and Strong 
(Note 11) Mf subscales. He found that the
1971 group scored significantly higher on
the Mf, Mf1, and Mf5 scales, and concluded
that the 1971 sample showed “a greater
personal and emotional sensitivity and a more
frequent rejection of masculine occupations 
and avocations" (p. 606) than their 1966 
counterparts. This finding needs replication 
but suggests that considerable caution should 
be employed in applying any of the findings 
concerning the Mf scale that were discussed 
in this review. Although the essential mean- 
ings of these findings are probably not 
affected, any T-score levels employed for 
interpretive purposes may be anachronistic 
and therefore misleading. 


Conclusions and Recommendations 


The studies covered in this review can 
most accurately be described as a beginning 
effort in the attempt to maximize the
usefulness of the MMPI in corrections. As
mentioned in the introduction, these findings
are best viewed as indicative of the potential 
of the MMPI in prison work, not as a final 
judgment of its worth. Since the overall rep- 
resentativeness of the experimental samples 
employed was often restricted, and several 
methodological shortcomings were apparent, 
the generalizability of the present findings is 
limited. The issue of racial and cultural bias 
in the MMPI with respect to prison popula- 
tions in particular is a vital question that 
demands extensive research in its own right,
since at present one cannot determine with 
an acceptable degree of certainty just what 
a minority group member's MMPI means
and does not mean. No legitimate inferences 
can be generalized from the present findings 
to women inmates, and the effects of age 
and IQ will probably modify interpretations 
somewhat when the nature of these effects 
is more precisely defined. A strong note of 
caution is necessary with respect to the 
appropriate use of this test: The MMPI 
serves as a source of probabilistic clinical 
statements that are hypotheses for further 
exploration, and employment of the MMPI 
as the sole basis for any kind of decision 
affecting the life of a subject constitutes a
serious abuse of the test. This is because 
the “noise” inherent in any test of this type
always creates the danger of false positive
conclusions, and as a result the MMPI
should be used in conjunction with other
sources of data to form the basis of
important decisions affecting a subject’s welfare
if it is to be employed in an enlightened and
ethical manner. In any event, these initial
research findings do appear to be promising.
Further research is necessary for a
conclusive assessment of the MMPI's potential for
becoming an important and valuable aid in
correctional practice that would benefit the
inmate as well as the correctional personnel
responsible for effective program functioning.

The MMPI shows potential in several 
areas. The findings concerning the O-H 
experimental scale and the 4-3 prototype 
possess a strong preliminary research base 
that needs cross-validation; however, evi- 
dence is sufficiently strong at this point to 
warrant the provisional application of these 
indicators in correctional practice as warn- 
ing signs of potentially dangerous violent 
outbursts. Several other findings appear 
promising but need further study. For example, 
the identification of homosexuality 
with the MMPI could be profitably employed 
in administrative and treatment decisions 
after some further exploration and 
refinement of the present findings. The 
classifications of psychopathologic behavior 
described previously could also be of great use 
to administrative and treatment personnel 
if expanded by further study. The Rm 
experimental scale and index demand instant 
examination, for they could prove to be 
a highly effective predictor of recidivism that 
could substantially improve hit rates over 
chance prediction. Finally, the pursuit of 
the findings of Megargee (1977) and his 
associates may prove to be more fruitful than 
can be presently imagined. Inconclusive 
findings exist on indicators of institutional 
adjustment, but the MMPI could yet prove 
to be worthwhile in this area. The identification 
of suicide risks in prison has been 
inexplicably ignored to date and demands 
immediate exploration. Areas investigated in 
which the MMPI appears to be of little 
present worth include the post hoc comparison 
of first offenders versus recidivists 
and the identification of addictions and escape 
risks. In short, further research on the 
MMPI’s use in several of these areas could 
produce results that would prove to be in- 
valuable to correctional personnel. 

In 1970, Haven (Note 1) stated that “an 
overview of all the research reviewed gives 
the rather discouraging picture of a hodge- 
podge of one-shot investigations. There were 
few follow-ups and little cross-evaluation of 
previous findings" (p. 39). Unfortunately, 
this still seems to be largely true 7 years 
later. With the two outstanding exceptions 
of Panton and Megargee, there is no evi- 
dence of any concerted effort at comprehen- 
sive longitudinal study with thorough follow- 
up procedures for the MMPI's use in prison 
work. Such efforts are sorely needed and 
can be instigated by the routine testing of 
prison admissions with the full 566-item 
MMPI to build up a data bank. However, in- 
spection of Table 2 reveals that approxi- 
mately two-thirds of the studies reviewed 
seem to have drawn their MMPIs from just 
such a data bank; the lack of well-controlled 
longitudinal research becomes increasingly 
difficult to justify in the light of this fact. 
A few selected areas such as recidivism may 
be more profitably investigated with MMPIs 
administered long after initial admission, yet 
the comparison of such protocols with the 
corresponding admissions protocols across 
the criterion and comparison groups could 
in itself provide dependable indices of spe- 
cific inmate behaviors. In summary, more 
longitudinal research with thorough follow-up 
procedures is needed if effective investiga- 
tion of the MMPI’s potential in corrections 
is to be realized. 

Past methodological shortcomings must 
be eliminated if future results are to prove 
fruitful. Future research must observe either 
standard random sampling procedures or 
standard matching procedures if its findings 
are to be representative; the presence 
of a large MMPI data bank should facilitate 
the meeting of these requirements. Investi- 
gators must also instigate better controls or 
post hoc checks for variables such as age, 
race, and IQ to maximize the representativeness 
and generalizability of their results. In 
fact, these and several other design aspects 
(such as F scale elevations, fake-good and 
fake-bad indicators, and the several remain- 
ing subject characteristics listed in Table 2) 
demand further research in their own right 
to produce methodological refinements that 
should increase the MMPI’s sensitivity to 
significant differences. One of the most im- 
portant improvements that needs to be made 
over previous predictive research is the in- 
corporation of Meehl and Rosen’s (1955) 
essential guidelines prescribing the considera- 
tion of base-rate probabilities. The different 
approaches to MMPI data interpretation 
discussed in the Methodology section should 
also be kept in mind, since the simultaneous 
investigation of more than one of these ap- 
proaches may produce unexpectedly valuable 
findings. Once again, longitudinal research 
with thorough follow-up procedures is a 
must. Future researchers who incorporate 
these important guidelines into their experi- 
mental designs will maximize both their 
ability to detect actual significant differences 
and the generalizability of their findings. 

The central purpose of this review has 
been to stimulate needed cross-validation of 
existing findings as well as to encourage 
further exploratory research with the MMPI 
in prison work. In addition, it is hoped that 
the interfacing of these findings with the 
ongoing exploration of the new classification 
system of Megargee (1977) and his associates 
will enhance the precision and productivity 
of the MMPI's employment in corrections. 
The full extent of the MMPI’s ultimate 
usefulness in this respect cannot yet be 
fully appreciated, but its potential should 
not be underestimated at this point. The 
MMPI may someday prove to be an indis- 
pensable factor in the creation of more 
effective rehabilitative approaches to correc- 
tional practice. 


Validity Conditions in Repeated Measures Designs 


Huynh Huynh and Garrett K. Mandeville 
University of South Carolina 


This article has two objectives. The first is to present necessary and sufficient 
conditions for the validity of traditional within-subject F tests in repeated 
measures designs. It is shown that the Mauchly sphericity criterion (W) and 
possibly the Box test for the equality of covariance matrices are appropriate 
to judge the validity of these conditions. Valid applications of both tests are 
conducted on sets of orthogonal normalized variables that are associated with 
each cluster of within-subject mean square ratios. The second objective of the 
article is to present empirical results on the appropriateness of using the W 
criterion when the variates are not normally distributed. For light-tailed dis- 
tributions, the W criterion was shown to be moderately conservative, whereas 
for heavy-tailed distributions, empirical Type I error rates exceeded nominal 
alpha. Since most social science applications typically involve light-tailed rather 
than heavy-tailed distributions, the W criterion should provide useful results 
in most cases. 


Traditional univariate analyses of vari- 
ance for repeated measures (or mixed model) 
designs are used extensively in educational 
and psychological research (Kirk, 1968; 
Winer, 1971). In most situations observa- 
tions are made on each subject in the sample 
under each combination of conditions (the 
within-subject factors). For example, a de- 
velopmental psychology study may require 
measurements at several time intervals, with 
alternate forms used at each point in time. 
Thus, Time and Form would constitute two 
within-subject factors. If no factors that 
differentiate subjects are included, the design 
implies the testing of three within-subject hy- 
potheses dealing with the effects of Time, 
Form, and the Time x Form interaction. 
Corresponding to each of these effects is a 
(nonunique) set of orthogonal normalized 
(orthonormal) variables. 

On the other hand, if the subjects are 
stratified on one or more independent 
(between-subjects) factors, the interaction of 
these factors and the repeated measures factors 
provides more within-subject hypotheses 
that are subject to testing. In general, however, 
all within-subject hypotheses may be 
grouped into clusters that are tested using 
a common error term. In the example, if subjects 
were categorized according to age level, 
the mean squares associated with Time and 
the Age × Time interaction would be tested 
against the Time × Subject-Within-Age error 
term. The second cluster of hypotheses deals 
with Form and the Age × Form interaction, 
and the last group focuses on the interactions 
Time × Form and Time × Form × Age. As 
in the case of no independent factors, each of 
these clusters of effects and the error terms 
against which they are tested are associated 
with a set of orthonormal variables. These 
variables remain unchanged regardless of 
whether the subjects are subdivided into 
independent categories or not. As is discussed 
later, within normal distributions only the 
orthonormal variables will play a role in 
the validity of the traditional univariate F 
tests for repeated measures designs similar 
to the layout in the developmental psychology 
example. 

Confusion exists regarding the validity 
conditions for the within-subject F tests in 
repeated measures designs. Originally these 
tests were derived from models implying 
equal variances and equal covariances (the 
compound symmetry condition) for the re- 
peated measures (Scheffé, 1959). Several 
textbooks and articles written in the 1960s 
or earlier (Collier, Baker, Mandeville, & 
Hayes, 1967; Kirk, 1968) tended to treat 
compound symmetry as the required (i.e., 
necessary and sufficient) condition for the 
validity of the within-subject F tests. Huynh 
and Feldt (1970) and Rouanet and Lepine 
(1970), however, have shown that compound 
symmetry is only a sufficient condition. Al- 
though counterexamples are illustrated in 
Huynh and Feldt (1970), authors of more 
recent textbooks (Ferguson, 1976; Keppel, 
1973), articles (Davidson, 1972; Poor, 
1973), and computer manuals (Nie & Hull, 
Note 1) still mistake compound symmetry 
as a sine qua non assumption for the F tests. 

Confusion also persists in statements re- 
garding the validity conditions for designs 
with between-subjects (independent) factors. 
Huynh and Feldt (1970, Theorems 2 and 4) 
prove that these conditions involve only the 
orthonormal variables and give a counterexample 
showing that it is unnecessary to 
assume equality of covariance matrices for 
the repeated measures across the independent 
factors. The latter condition is quoted in 
several design textbooks (Keppel, 1973; 
Winer, 1971, p. 523). To resolve this confusion, 
the next section of this article discusses 
the conditions under which the traditional 
within-subject F tests in repeated measures 
designs are valid. 


Description of the Validity Conditions 


The validity conditions under normality 
for the within-subject F tests have been in- 
vestigated by Huynh and Feldt (1970) for 
randomized block (one-factor repeated measures) 
designs and for simple split-plot designs 
(two-factor designs with repeated measures 
on one factor), by Rouanet and Lepine 
(1970) for randomized block designs, by 
Mendoza, Toothaker, and Cain (1976) for 
three-factor designs with repeated measures 
on two factors, and by Huynh (1978) for 
complex designs involving both independent 
and repeated factors. In the most general 
terms, the mean square ratios in each cluster 
of hypotheses follow exact F distributions if 
and only if (a) the covariance matrices for 
the associated set of orthonormal variables 
are identical across all levels of the inde- 
pendent factors, and (b) the common co- 
variance matrix has a sphericity pattern (i.e., 
equal variances and zero covariances). If 
there are no independent factors, then the 
first condition does not apply. It may be 
noted that both conditions are based on the 
orthonormal variables and not on the original 
repeated measures. Therefore, they are more 
general than the requirements of equality of 
the covariance matrices for the original re- 
peated measures and of compound symmetry 
for the common matrix. The remainder of 
this section displays the necessary and suf- 
ficient conditions for four typical situations. 


Case 1: One-Factor Designs With Repeated 
Measures on the Factor 


Assume that all subjects are measured 
under the $b$ levels of the repeated factor B, 
and let the vector $X = (X_1, X_2, \ldots, X_b)'$ 
be the observation vector. Let $M$ be any 
$(b - 1) \times b$ matrix of $(b - 1)$ orthonormal 
row vectors. Then $Y = MX$ transforms the 
$b$ original variables to $(b - 1)$ orthonormal 
variables. For example, if $b = 4$, the matrix 
$M$ may be taken as follows: 

\[
M = \begin{bmatrix}
1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0 \\
1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12}
\end{bmatrix}
\]

Let $\Sigma(X)$ be the covariance matrix of $X$. 
Then the covariance matrix of $Y$ is $\Sigma(Y) = 
M\Sigma(X)M'$. Within normal distributions, the 
mean square ratio for the B effects follows 
an exact F distribution if and only if $\Sigma(Y) 
= \lambda I_{b-1}$, $I_{b-1}$ being the identity matrix of 
order $(b - 1)$. In other words, the $(b - 1)$ 
orthonormal variables $Y$ are independent and 
equally variable, that is, the sphericity 
pattern holds. 
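
The Case 1 condition can be checked numerically. The following is a minimal sketch, not the authors' program: it builds the $b = 4$ matrix $M$ shown above, forms $M\Sigma(X)M'$, and tests for the sphericity pattern. The function names and the three example covariance matrices are illustrative assumptions; in particular, the matrix with entries $\alpha_i + \alpha_j + \lambda\delta_{ij}$ is a hypothetical example of a matrix that is not compound symmetric yet still yields spherical orthonormal variables.

```python
import math

def helmert_contrasts(b):
    """(b - 1) x b matrix whose rows are orthonormal and each sum to zero."""
    rows = []
    for j in range(1, b):
        norm = math.sqrt(j * (j + 1))
        # j leading ones, then -j, then zeros, all divided by the row norm
        rows.append([1 / norm] * j + [-j / norm] + [0.0] * (b - j - 1))
    return rows

def matmul(a, c):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*c)] for row in a]

def transformed_cov(m, sigma):
    """Sigma(Y) = M Sigma(X) M'."""
    m_t = [list(col) for col in zip(*m)]
    return matmul(matmul(m, sigma), m_t)

def is_spherical(cov, tol=1e-9):
    """True if cov equals lambda * I for some lambda (within tol)."""
    lam = cov[0][0]
    p = len(cov)
    return all(abs(cov[i][j] - (lam if i == j else 0.0)) <= tol
               for i in range(p) for j in range(p))

def compound_symmetric(b, variance, covariance):
    return [[variance if i == j else covariance for j in range(b)] for i in range(b)]

def additive_pattern(alphas, lam):
    """Hypothetical matrix with entries alpha_i + alpha_j + lam * [i == j]."""
    b = len(alphas)
    return [[alphas[i] + alphas[j] + (lam if i == j else 0.0) for j in range(b)]
            for i in range(b)]

m = helmert_contrasts(4)
cs = transformed_cov(m, compound_symmetric(4, 4.0, 1.0))
ap = transformed_cov(m, additive_pattern([1.0, 2.0, 3.0, 4.0], 2.0))
unequal = transformed_cov(m, [[1.0, 0, 0, 0], [0, 2.0, 0, 0],
                              [0, 0, 3.0, 0], [0, 0, 0, 4.0]])
print(is_spherical(cs), is_spherical(ap), is_spherical(unequal))
```

Under these assumptions the compound symmetric matrix and the additive-pattern matrix both pass the check, whereas independent measures with unequal variances do not, which illustrates that compound symmetry is sufficient but not necessary.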


Case 2: Two-Factor Designs With Repeated 
Measures on One Factor 


Suppose that the subjects in Case 1 are 
categorized into $a$ levels of an independent 
(between-subjects) factor A. The within-subject 
hypotheses regarding the B and AB effects 
share the same error term. Thus, they 
belong to one cluster, and the corresponding 
orthonormal variables are defined by the $Y$ 
vector of Case 1. Let $\Sigma_i(Y)$, $i = 1, \ldots, a$, 
be the covariance matrices of $Y$ for the $a$ 
levels of Factor A. The B and AB mean 
square ratios follow exact F distributions if 
and only if (a) the $\Sigma_i(Y)$ matrices are identical, 
and (b) the common covariance matrix 
has the sphericity pattern. 


Table 1 
Matrices Defining the Orthogonal Normalized Variables for the B, C, and BC Mean 
Square Ratios 

                                  Repeated measures 
Matrix  B1C1    B1C2    B1C3    B2C1    B2C2    B2C3    B3C1    B3C2    B3C3 
B within-subject effects 
M_B     1/√6    1/√6    1/√6   -1/√6   -1/√6   -1/√6    0       0       0 
        1/√18   1/√18   1/√18   1/√18   1/√18   1/√18  -2/√18  -2/√18  -2/√18 
C within-subject effects 
M_C     1/√6   -1/√6    0       1/√6   -1/√6    0       1/√6   -1/√6    0 
        1/√18   1/√18  -2/√18   1/√18   1/√18  -2/√18   1/√18   1/√18  -2/√18 
BC within-subject interaction effects 
M_BC    1/2    -1/2     0      -1/2     1/2     0       0       0       0 
        1/√12  -1/√12   0       1/√12  -1/√12   0      -2/√12   2/√12   0 
        1/√12   1/√12  -2/√12  -1/√12  -1/√12   2/√12   0       0       0 
        1/6     1/6    -2/6     1/6     1/6    -2/6    -2/6    -2/6     4/6 


Case 3: Two-Factor Designs With Repeated 
Measures on Both Factors 


Consider now a B × C two-factor design in 
which both B and C are repeated factors. 
This corresponds to the developmental 
psychology study described previously in which 
all subjects are pooled together in one group. 
There are three within-subject mean square 
ratios: one for the B effects, one for the C 
effects, and one for the BC interactions. Each 
corresponds to a set of orthonormal variables, 
namely $Y_B$ for B, $Y_C$ for C, and $Y_{BC}$ for BC. 
(See Table 1 for an illustration.) Mendoza 
et al. (1976) show that each mean square 
ratio has an exact F distribution if and only 
if the corresponding orthonormal variables are 
independent and equally variable. Hence, the 
F test for B is valid if and only if $\Sigma(Y_B) = 
\lambda_B I_{b-1}$. The F test for C requires that $\Sigma(Y_C) 
= \lambda_C I_{c-1}$, and the F test for BC demands 
that $\Sigma(Y_{BC}) = \lambda_{BC} I_{(b-1)(c-1)}$. In these 
conditions, the constants $\lambda_B$, $\lambda_C$, and $\lambda_{BC}$ need 
not be equal. 
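
For a 3 × 3 layout with the columns ordered B1C1, B1C2, ..., B3C3, the three coefficient matrices can be generated from two one-factor contrasts. The sketch below assumes the Kronecker-product construction (it is one convenient way to obtain such matrices, not necessarily how Table 1 was produced); the helper name `kron_row` is hypothetical.

```python
import math

def kron_row(u, v):
    """Kronecker product of two row vectors (first factor varies slowest)."""
    return [x * y for x in u for y in v]

# Orthonormal contrasts for a three-level factor and the normalized unit vector.
contrasts = [
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
]
unit = [1 / math.sqrt(3)] * 3

# B contrasts average over C, C contrasts average over B,
# and the interaction rows are products of a B and a C contrast.
M_B = [kron_row(row, unit) for row in contrasts]
M_C = [kron_row(unit, row) for row in contrasts]
M_BC = [kron_row(rb, rc) for rb in contrasts for rc in contrasts]

def gram(rows):
    """Matrix of inner products between all pairs of rows."""
    return [[sum(x * y for x, y in zip(r, s)) for s in rows] for r in rows]

all_rows = M_B + M_C + M_BC          # 2 + 2 + 4 = 8 orthonormal variables
g = gram(all_rows)
orthonormal = all(abs(g[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                  for i in range(8) for j in range(8))
print(orthonormal)
```

The eight rows are mutually orthonormal and each sums to zero, so together with the grand-mean direction they span the nine repeated measures. The interaction rows come out in a different order from the rows of a printed table, which does not affect the sphericity condition.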


Case 4: Three-Factor Designs With Repeated 
Measures on Two Factors 


Finally, let A × B × C be a three-factor 
design in which A is the independent 
(between-subjects) factor and both B and C are 
repeated (within-subject) factors. The within-subject 
mean square ratios are then grouped 
into the following three clusters: the B and 
AB effects, the C and AC effects, and the BC 
and ABC effects. Each cluster corresponds to 
a set of orthonormal variables defined as in 
Case 3 (i.e., as if there were no independent 
factor). Let $\Sigma_i(Y_B)$, for example, be the 
covariance matrix of the $Y_B$ orthonormal variables 
at the $i$th level of factor A. The 
two mean square ratios for B and AB follow 
exact F distributions if and only if (a) the 
$\Sigma_i(Y_B)$ matrices are identical, and (b) the 
common covariance matrix has a sphericity 
pattern. Similar necessary and sufficient 
conditions hold for the other two clusters of 
mean square ratios. 


Testing for Sphericity 


As indicated in each case of the previous 
section, the validity of the within-subject F 
tests requires sphericity for some covariance 
matrix. The matrix may be either the co- 
variance matrix of a suitably chosen set of 
orthonormal variables or the common co- 
variance matrix, if the design has some be- 
tween-subjects factor(s). Thus, a test for 
sphericity is required if preliminary testing 
is to be considered. 

Let the p-component vector $Y$ be normally 
distributed with unknown mean vector and 
covariance matrix $\Sigma(Y)$. The assumptions of 
independence and variance homogeneity for 
the $p$ components of $Y$ are equivalent to the 
condition that $\Sigma(Y) = \lambda I_p$, where $I_p$ is the 
identity matrix of order $p$. Our interest is in 
testing the above condition as a null hypothesis 
($H_0$) against the alternative hypothesis 
$H_1$: $\Sigma(Y) \neq \lambda I_p$. Let $\Sigma(Y)$ be estimated by 
the sample covariance matrix $S$ based on a 
random sample of $n$ vectors. Then the likelihood 
ratio test for $H_0$ against $H_1$ is of the 
form 

\[
W = \frac{|S|}{(\operatorname{trace} S / p)^{p}}
\]

(Mauchly, 1940). 
The exact sampling distribution of W has 
been provided by Consul (1967, 1969), Pillai 
and Nagarsenker (1971), Mathai and Rathie 
(1970), and Nagarsenker and Pillai (1972, 
1973; these also provide tables of critical 
values for W). 

In the context of repeated measures de- 
signs, the matrix S is the sample covariance 
matrix of each set of orthonormal variables 
if there are no independent factors (Cases 1 
and 3). For designs with independent fac- 
tor(s), S is taken as the pooled covariance 
matrix if the assumption of equal covariance 
matrices for the orthonormal variables is ten- 
able (Cases 2 and 4). 
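
As a rough illustration of the criterion, the sketch below computes W as the determinant of S divided by the pth power of the mean of its diagonal elements. The function names are illustrative, and the determinant routine is a plain Gaussian elimination included only to keep the example self-contained. Critical values still have to come from the published tables; none are computed here.

```python
def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def mauchly_w(s):
    """W = |S| / (trace S / p)^p; equals 1 when S is exactly spherical."""
    p = len(s)
    mean_variance = sum(s[i][i] for i in range(p)) / p
    return determinant(s) / mean_variance ** p

print(mauchly_w([[2.0, 0.0], [0.0, 2.0]]))   # spherical matrix
print(mauchly_w([[1.0, 0.5], [0.5, 1.0]]))   # correlated components
```

W lies between 0 and 1, and values near 1 support sphericity. In a repeated measures analysis, `s` would be the covariance matrix of the orthonormal variables (or the pooled matrix when there are between-subjects factors), not the covariance matrix of the original measures.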


Testing for Equality of Covariance 
Matrices 


As illustrated in Cases 2 and 4, suitably 
chosen sets of orthonormal variables must 
share the same covariance matrix across the 
independent factors for the corresponding 
within-subject mean square ratios to follow 
exact F distributions. Within normal 
distributions, equality of covariance matrices may 
be tested via the Box modified likelihood ratio 
criterion M (Morrison, 1976; Timm, 1975; 
Winer, 1971). Let $p$ be the number of 
orthonormal variables and $k$ be the number of 
levels of the independent factor. Let $S_i$ be 
the traditional unbiased estimate of $\Sigma_i$, 
associated with the $i$th level of the independent 
factor and based on $n_i$ degrees of freedom. 
Then the hypothesis $\Sigma_1 = \cdots = \Sigma_k$ may be 
tested via the Box criterion: 

\[
M = \Bigl(\sum_{i=1}^{k} n_i\Bigr) \ln|S| - \sum_{i=1}^{k} n_i \ln|S_i|,
\]

where 

\[
S = \sum_{i=1}^{k} n_i S_i \Big/ \sum_{i=1}^{k} n_i
\]

is the pooled estimate of the common 
covariance matrix. If 

\[
\rho = 1 - \frac{2p^{2} + 3p - 1}{6(p + 1)(k - 1)}
\Bigl(\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{i=1}^{k} n_i}\Bigr),
\]

then $\rho M$ approximately follows a chi-square 
distribution with $(k - 1)p(p + 1)/2$ degrees 
of freedom if the condition of equality of the 
covariance matrices holds. Tables of critical 
values of M at the .05 level may be found in 
Korin (1969) or Pearson and Hartley (1972) 
for the case of equal sample size. 
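
The pieces of the Box procedure can be sketched as follows. The function names are illustrative assumptions, and the determinant routine is a plain Gaussian elimination included only for self-containment. The check at the end uses 9 and 11 degrees of freedom, which is an assumption corresponding to two groups of 10 and 12 subjects.

```python
import math

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def box_m(cov_list, df_list):
    """Return (M, pooled S) for covariance estimates S_i with n_i degrees of freedom."""
    total = sum(df_list)
    p = len(cov_list[0])
    pooled = [[sum(n * s[i][j] for n, s in zip(df_list, cov_list)) / total
               for j in range(p)] for i in range(p)]
    m = total * math.log(determinant(pooled)) - sum(
        n * math.log(determinant(s)) for n, s in zip(df_list, cov_list))
    return m, pooled

def box_rho(p, df_list):
    """Scaling factor rho such that rho * M is approximately chi-square."""
    k = len(df_list)
    c = (2 * p * p + 3 * p - 1) / (6 * (p + 1) * (k - 1))
    return 1 - c * (sum(1 / n for n in df_list) - 1 / sum(df_list))

def box_df(p, k):
    """Degrees of freedom (k - 1) p (p + 1) / 2 of the chi-square approximation."""
    return (k - 1) * p * (p + 1) // 2

print(round(box_rho(2, [9, 11]), 4), box_df(2, 2))
print(round(box_rho(4, [9, 11]), 4), box_df(4, 2))
```

With these inputs the scaling factors round to .8902 for two orthonormal variables and .7821 for four, and the degrees-of-freedom formula gives 3 and 10, respectively. The pooled matrix returned by `box_m` is the one that would be passed on to the sphericity test.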

To test the validity conditions in the pres- 
ence of independent factors, a two-step pro- 
cedure is suggested. First, the Box test may 
be carried out for suitably chosen sets of 
orthonormal variables. If equality of covari- 
ance matrices is tenable, then the Mauchly 
test for sphericity may be carried out on the 
pooled estimate S of the common covariance 
matrix. If $\alpha$ is the (joint) probability of the 
Type I error of the two-step procedure, then 
each step may be performed at the $\alpha/2$ level 
of significance. The two following sections 
present numerical illustrations for Cases 3 
and 4. 




Numerical Examples 
Example 1 


Table 2 presents the basic data for 22 subjects 
in a B × C design with repeated measures 
on both B and C. Table 1 displays the 
coefficients defining the orthonormal variables 
associated with the within-subject effects B, 
C, and BC. Let $S(X)$ be the unbiased estimate 
of the covariance matrix of the nine 
repeated measures. Then the sample covariance 
matrices of the orthonormal variables 
associated with B, C, and BC are, respectively, 
$S(Y_B) = M_B S(X) M_B'$, $S(Y_C) = M_C S(X) M_C'$, 
and $S(Y_{BC}) = M_{BC} S(X) M_{BC}'$. The corresponding 
sphericity criteria are $W_B$ = .9921, 
$W_C$ = .9923, and $W_{BC}$ = .5055. At the .05 
level of significance, the critical values are 
.7411, .7411, and .4173, respectively. Since 
large values of W support the sphericity 
assumption, the data indicate that the validity 
conditions for the B, C, and BC traditional 
F tests are tenable. 


Table 2 
Basic Data for Numerical Examples 1 and 2 

                         Repeated measures 
Subject  B1C1  B1C2  B1C3  B2C1  B2C2  B2C3  B3C1  B3C2  B3C3 
1        53    20    12    14    42    30    10    5     63 
2        23    55    77    10    2     30    56    30    50 
3        20    30    50    43    12    30    53    21    20 
4        3     20    77    45    53    32    65    30    20 
5        23    22    21    12    32    30    3     54    33 
6        33    89    53    65    45    42    2     10    23 
7        30    33    55    42    87    30    2     10    30 
8        36    56    32    3     65    86    54    23    30 
9        23    3     78    63    68    68    54    12    39 
10       98    65    63    32    45    75    86    63    21 
11       53    65    86    96    63    32    12    45    25 
12       33    22    42    21    3     35    63    32    54 
13       10    30    53    65    43    20    32    45    65 
14       32    35    63    65    66    33    53    63    32 
15       12    30    56    22    30    56    42    30    12 
16       56    33    89    65    78    99    63    30    24 
17       53    30    36    65    22    33    22    54    33 
18       32    30    36    65    63    32    30    36    65 
19       30    36    65    66    33    22    12    30    32 
20       98    65    63    65    45    12    2     36    30 
21       00    22    11    42    53    63    32    53    3 
22       30    35    63    66    33    22    56    52    21 


Example 2 


The 22 subjects of Table 2 are now assigned 
to two levels of the independent factor A. 
The first level consists of the first 10 subjects, 
and the second level has 12 subjects. The 
orthonormal variables associated with the 
clusters of within-subject effects (B, AB), 
(C, AC), and (BC, ABC) are defined by the 
matrices $M_B$, $M_C$, and $M_{BC}$ as in Numerical 
Example 1. The Box criteria for equality of 
covariance matrices are, respectively, $M_B$ = 
2.4626, $M_C$ = 1.7064, and $M_{BC}$ = 10.2912. 
The critical values at the .025 level of 
significance are $\rho_B \chi^2(3)$ = .8902 × 9.3484 = 8.3219 
for $M_B$ and $M_C$, and $\rho_{BC} \chi^2(4)$ = .7821 × 
11.1433 = 8.7152 for $M_{BC}$. The Box criteria 
are thus small enough to support the 
assumption of equal covariance matrices over 
the two levels of A for each of the three sets 
of orthonormal variables $Y_B$, $Y_C$, and $Y_{BC}$. 
The preliminary testing may now proceed 
to the hypothesis of sphericity for the common 
covariance matrix. The Mauchly criteria 
for the clusters (B, AB), (C, AC), and 
(BC, ABC) are .9912, .9780, and .4840, 
respectively. It may be noted that under the 
assumption of equality of the covariance 
matrices, each pooled sample covariance matrix 
with 20 degrees of freedom may be considered 
to be based on a random sample of 21 subjects. 
By entering the value of N = 21 in the 
tables of Nagarsenker and Pillai (1972), the 
interpolated critical values at the .025 level 
are .6776 for the first two clusters and .3542 
for the last cluster. Hence, the assumption 
of sphericity holds for each set of orthonormal 
variables. 

In summary, the data analysis (at the joint 
.05 level of significance) indicates that, assuming 
normality, the validity conditions for the 
traditional within-subject F tests hold for the 
data of this numerical example. 


Effect of Nonnormality on the 
Mauchly Test 


As indicated previously, the Mauchly test 
is appropriate in all cases to check the validity 
of the traditional testing procedure. Being 
by construction a likelihood ratio (LR) criterion 
test, W may be suspected to be oversensitive 
to departure from normality. This is known to 
occur for various LR tests of variance homogeneity 
such as the Bartlett test and two of its 
competitors, the Hartley Fmax and the Cochran 
criteria (Scheffé, 1959; also see Games, Winkler, 
& Probert, 1972, for a list of references). 
On the other hand, there are indications that 
least squares procedures (which are equivalent 
to LR methods in some situations) are fairly 
robust when the populations involved have 
light tails (Hogg, 1974; Tukey & McLaughlin, 
1963, p. 332). As noted by Hogg, there 
are many practical situations in which 
distributions are inherently light tailed. This is 
particularly true in the social sciences if the 
measuring instrument has modest floor and 
ceiling effects. However, data may contain 
outliers that tend to shift the density toward 
the tails, thus creating situations in which 
tails are heavier than that of the normal 
distribution. Since the F test for means is fairly 
insensitive to the shape of the parent distributions 
and since the W criterion may be used 
to determine the tenability of the F test, it 
is desirable to explore the effect of nonnormality 
on W. The remainder of this article 
focuses on the Type I error rate associated 
with the W criterion under several instances 
involving light-tailed distributions, heavy-tailed 
distributions, and mixtures of normal 
distributions. These results were obtained using 
the technique of computer simulation for 
situations in which the p components of the 
vector X were independent and had the same 
distribution. For normal distributions, the 
standardized fourth moment ($\beta_2$, a measure 
of kurtosis) equals 3. Light-tailed distributions 
correspond to $\beta_2 < 3$ and heavy-tailed 
distributions to $\beta_2 > 3$. 


Selection of Common Distributions 
(Component Distributions) 


Eight component distributions were selected 
to represent a variety of departures from 
normality. The three light-tailed distributions 
with bounded range that were chosen were 
the uniform distribution on the 0 to 1 interval 
($\beta_2$ = 1.8), the convolution of two such 
uniform distributions (a triangular distribution 
with $\beta_2$ = 2.4), and the convolution of 
three such uniform distributions (a trapezoidal 
distribution with $\beta_2$ = 2.6). They are 
subsequently denoted as U1, U2, and U3, 
respectively. 

Five heavy-tailed component distributions 
were also selected. The first represents the 
distribution of the product of two mutually 
independent variables, one having the 0 to 1 
uniform distribution and the other being 
normally distributed with zero mean and unit 
variance [i.e., N(0, 1)]. This distribution 
($\beta_2$ = 5.4) is labeled UN. The second 
heavy-tailed distribution was chosen to be the 
Laplace (or double exponential) distribution for 
which $\beta_2$ = 6. The remaining three 
heavy-tailed distributions were mixtures of two 
normal distributions, each with mean zero and 
variances of 1 and 9, respectively [i.e., N(0, 
1) and N(0, 9)]. The mixtures were denoted 
as $(1 - \lambda)N(0, 1) + \lambda N(0, 9)$, and the mixing 
proportions $\lambda$ were set at 5% ($\beta_2$ = 
7.653), 10% ($\beta_2$ = 8.333), and 20% ($\beta_2$ = 
7.544). 
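
The kurtosis values quoted for the component distributions follow from their second and fourth moments. The short sketch below recomputes them; the function names are illustrative, and the moment identities used are stated in the comments.

```python
def mixture_kurtosis(lam, var1=1.0, var2=9.0):
    """Kurtosis of (1 - lam) N(0, var1) + lam N(0, var2).

    A zero-mean normal with variance v has fourth moment 3 v^2, and the
    moments of a mixture are the weighted moments of its components.
    """
    variance = (1 - lam) * var1 + lam * var2
    fourth = 3 * ((1 - lam) * var1 ** 2 + lam * var2 ** 2)
    return fourth / variance ** 2

def uniform_sum_kurtosis(m):
    """Kurtosis of a sum of m independent uniform variables.

    Excess kurtosis of a sum of m iid variables is the single-variable
    excess divided by m; for the uniform distribution the excess is -1.2.
    """
    return 3 - 1.2 / m

def uniform_times_normal_kurtosis():
    """Kurtosis of U * Z with U uniform on (0, 1), Z standard normal, independent.

    E[U^2] = 1/3, E[U^4] = 1/5, E[Z^2] = 1, E[Z^4] = 3, and the product has mean 0.
    """
    second = (1 / 3) * 1
    fourth = (1 / 5) * 3
    return fourth / second ** 2

for lam in (0.05, 0.10, 0.20):
    print(lam, round(mixture_kurtosis(lam), 3))
```

Note that the 10% mixture has the largest kurtosis of the three, since the ratio $(3 + 240\lambda)/(1 + 8\lambda)^2$ is not monotone in $\lambda$.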


Simulation Process 


A computer program was written to simulate 
n independent vectors, each having p 
components drawn independently from the 
same component distribution. A sample 
covariance matrix S based on these n vectors 
was then obtained, and the criterion W was 
derived from S. The appropriate critical value 
for W was retrieved from tables in Nagarsenker 
and Pillai (1973). The empirical proportion 
of Type I errors was obtained by dividing 
the number of times that W fell below 
the given critical value (small values of W 
being evidence against sphericity) by the 
number of data sets simulated. Five thousand 
replications were made for each combination 
of n, p, and component distribution to estimate 
the true Type I error at nominal alpha values 
of 10%, 5%, 2.5%, and 1%. To check the 
accuracy of the simulation process, initial 
computer runs were made using the normal 
distribution. As may be seen in Table 3, the 
discrepancies between the empirical Type I 
errors and their respective true values for the 
normal case are within 
2.3 standard errors for proportions. 
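
A scaled-down version of such a simulation can be sketched as follows. This is not the authors' program: the sample size, number of replications, seed, and the three component distributions are arbitrary choices, and, as a loudly labeled substitution, the critical value is estimated by simulation under normality rather than read from the Nagarsenker and Pillai tables.

```python
import random

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def mauchly_w_from_sample(rows):
    """W = |S| / (trace S / p)^p from a list of n observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    s = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
          for j in range(p)] for i in range(p)]
    mean_variance = sum(s[i][i] for i in range(p)) / p
    return determinant(s) / mean_variance ** p

def simulate_w(draw, n, p, reps, rng):
    """W values for reps data sets of n vectors with p iid components each."""
    return [mauchly_w_from_sample([[draw(rng) for _ in range(p)] for _ in range(n)])
            for _ in range(reps)]

def run_simulation(n=20, p=3, reps=2000, alpha=0.05, seed=12345):
    rng = random.Random(seed)
    draws = {
        "normal": lambda r: r.gauss(0.0, 1.0),
        "uniform": lambda r: r.random(),
        # difference of two unit exponentials has a Laplace distribution
        "laplace": lambda r: r.expovariate(1.0) - r.expovariate(1.0),
    }
    # ASSUMPTION: critical value taken as the empirical alpha-quantile of W
    # under normality, standing in for the tabled critical value.
    reference = sorted(simulate_w(draws["normal"], n, p, 2 * reps, rng))
    critical = reference[int(alpha * len(reference))]
    rates = {}
    for name, draw in draws.items():
        ws = simulate_w(draw, n, p, reps, rng)
        rates[name] = sum(w < critical for w in ws) / reps   # reject for small W
    return critical, rates

if __name__ == "__main__":
    critical, rates = run_simulation()
    print(round(critical, 4), rates)
```

Because the critical value here is itself estimated, the empirical rate for the normal case only approximates the nominal level. The sketch is meant to show the direction of the departures for light-tailed and heavy-tailed components, not to reproduce the tabled results.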


Results 


Though the simulation was conducted for 
p = 2, 3, 4, and 5, combined with several 
levels of sample size n, only the data for p = 
5 are reported in Table 3. The patterns 
displayed by the empirical Type I error for p = 
2, 3, and 4 are virtually identical to those 
observed for the case p = 5. 

The following trends may be deduced from 
Table 3. 

1. The Mauchly W criterion tends to err 
on the conservative side for light-tailed com- 
ponent distributions, and the discrepancy be- 
tween the empirical Type I error and nominal 
alpha is greater for large samples. 

2. With heavy-tailed component distribu- 
tions, the empirical Type I error rates are 
much larger than the corresponding nominal 
values. As in the previous case, the differences 
become more visible as the sample size in- 
creases. 

3. In all situations under consideration, the 
ratio of the empirical Type I error to the 
posted alpha deviates further from unity at 
smaller alpha values. This trend has also been 
noticed in most studies regarding the robust- 
ness of the F test under nonnormality and/or 
variance heterogeneity (Collier et al., 1967; 
Glass, Peckham, & Sanders, 1972; Huynh, 


Discussion of Simulation Study 

In the simulation study, the behavior of 
Mauchly's sphericity criterion W was 
documented for a number of nonnormal 
conditions. For light-tailed distributions, W 
errs on the conservative side in terms of Type 
I error. This is partially because W is a 
decreasing function of the variability of the 
eigenvalues of the sample covariance matrix 
S. A light-tailed distribution usually is more 
dense near the mean than is true of a normal 
distribution. Thus, the eigenvalues calculated 
from a covariance matrix based on a 
light-tailed distribution would be expected to 
display less heterogeneity than those calculated 
from a covariance matrix based on a normal 
distribution with the same variance. Hence, 
the critical values under normality for W would 
be somewhat larger than are appropriate for 
values of W calculated from light-tailed 
distributions. 

The simulation previously reported also 
indicates that W does not behave badly as 
long as the component distribution has tails 
lighter than that of the normal distribution. 
Fortunately, most data in the social sciences 
are obtained from measuring instruments with 
a limited score range. The corresponding 
distributions are hence likely to be light-tailed, 
although probably not as extreme as the 
uniform distribution. It would be expected that 
W, along with the normal distribution critical 
values, should do a reasonable job in terms 
of Type I error in checking the sphericity 
assumptions in designs involving repeated 
measures. 

The study also points out the fallibility of 
the Mauchly criterion in the case of 
heavy-tailed component distributions. The authors 
are not aware of a method of overcoming this 
deficiency, although nonparametric or [...] 
procedures could conceivably be developed. 
Though W may be relied on in preliminary 
testing for the validity of the traditional F 
tests, it may simply be skipped in many 
instances. In previous articles on approximate 
tests for repeated measures designs, Huynh 
and Feldt (1976) and Huynh (1978) have 
shown that an adjustment in the degrees of 
freedom would be sufficient to accommodate 
most examples of departure of the covariance 
matrix from sphericity. If heavy-tailed 
component distributions are suspected, simply 
discard the Mauchly sphericity criterion and 
conduct an approximate test. In most 
instances the true probability of the Type I 
error will not be far from the posted nominal 
alpha. For readers familiar with multivariate 
analysis, of course, alternative testing 
procedures such as Hotelling's $T^2$ and many others 
based on the union-intersection principle are 
available (Morrison, 1976). 


Conclusions 


Details regarding the necessary and suffi- 
cient conditions for traditional tests of the 
within-subject mean square ratios in repeated 
measures designs are presented in this article. 
It is shown that these conditions are based 
on the orthogonal normalized variables asso- 
ciated with each cluster of within-subject 
mean square ratios based on the same error 
term. They are far more general than the as- 
sumptions of equality of covariance matrices 
and compound symmetry of the common co- 
variance matrix for all repeated measures. It 
is shown that appropriate preliminary testing 
for the mentioned conditions may be carried 
out within normal distributions via the Box 
modified likelihood ratio test and the Mauchly 
sphericity criterion. Both tests are to be con- 
ducted on suitably chosen sets of orthonormal 
variables. 

Furthermore, the behavior of the Mauchly 
criterion for nonnormal data was investigated. 
It was shown to provide a conservative test- 
ing procedure for light-tailed component dis- 
tributions and to produce more than the nomi- 
nal percentage of Type I errors for heavy- 
tailed distributions. Since many nonnormal 
distributions that occur in the social sciences 
are light-tailed, the W criterion should be use- 
ful for assessing the validity of the applica- 
tion of the traditional tests of repeated mea- 
sures data. 

The other test, the Box test, has also been 
suspected to be sensitive to nonnormality. 
Further studies are needed to determine the 
seriousness of this problem and possibly to 
identify procedures to overcome it. 
Published formulas for the large sample variance of the kappa statistic that are
appropriate for the case of different sets of raters for different subjects, when 
each set of raters is selected at random from a larger pool of available raters, 
are determined to be incorrect. New formulas are derived and checked by 
Monte Carlo simulation. Kappa is shown to be identical, except for terms that 
go to zero as the number of subjects increases, to the intraclass correlation 
coefficient resulting from applying a one-way analysis of variance to the data. 


Many human endeavors have been cursed 
with repeated failures before final success is 
achieved. The scaling of Mount Everest is 
one example. The discovery of the Northwest
Passage is a second. The derivation of a 
correct standard error for kappa is a third. 

Cohen (1960, 1968) presented kappa and 
weighted kappa as chance-corrected measures 
of agreement between two raters, each of 
whom independently classifies each of a sample 
of subjects into one of k mutually exclusive
and exhaustive categories. The standard error 
formulas that he presented as well as formulas 
published by Everitt (1968) were shown by 
Fleiss, Cohen, and Everitt (1969) to be 
incorrect. The formulas presented in the latter 
article have been confirmed analytically by 
Landis and Koch (1977a) and have been 
confirmed by means of Monte Carlo simulation 
by Cicchetti and Fleiss (1977) and Fleiss and 
Cicchetti (1978). 

Fleiss (1971) extended kappa to the case in 
which each of a sample of subjects is rated on 
a nominal scale by the same number of raters 
but in which the raters rating one subject are 
not necessarily the same as those rating
another. The standard error formulas presented
in that article are incorrect. Landis and Koch
(1977b) derived similar statistics for the case
of possibly varying numbers of ratings per
subject by applying a one-way analysis of
variance model to the data. The method they
suggested for calculating standard errors yields
results appropriate for the nonnull case but
overestimates the standard error appropriate
for testing the null hypothesis that the
parameter is zero.

In this article, formulas for the standard
error of kappa in the case of different sets of
equal numbers of raters that are valid when
the number of subjects is large and the null
hypothesis is true are derived. The results of
some Monte Carlo simulations confirm that
the formulas are correct.


Notation 


(1971). Let N represent the total number of
subjects, n the number of ratings per subject,
and k the number of categories into which
assignments are made. Let the subscript i,
where i = 1, ..., N, represent the subjects
and the subscript j, where j = 1, ..., k,
represent the categories of the scale.
Define $n_{ij}$ to be the number of raters who
assigned the ith subject to the jth category,
and define

$$p_j = \frac{1}{Nn} \sum_{i=1}^{N} n_{ij}. \tag{1}$$

The quantity $p_j$ is the proportion of all assign-
ments that were to the jth category. Since
$\sum_j n_{ij} = n$, $\sum_j p_j = 1$.
A motivation of the following formulas,
which are simplifications of the expressions
presented there, can be found in Fleiss (1971).
The measure of the extent of agreement
beyond chance in assigning subjects to
category j (j = 1, ..., k) is

$$\kappa_j = 1 - \frac{\sum_{i=1}^{N} n_{ij}(n - n_{ij})}{Nn(n - 1)\,p_j q_j}, \tag{2}$$

where $q_j = 1 - p_j$. If there is perfect agree-
ment in the assignments to category j (i.e.,
if each $n_{ij} = 0$ or $n$), then $\kappa_j = 1$. If, on the
other hand, the $n_{ij}$ vary as binomial random
variables with parameters $n$ and $p_j$, then the
expected value of $\kappa_j$ is 0. The minimum value
of $\kappa_j$ is $-1/(n - 1)$.

The overall measure of agreement beyond
chance is a weighted average of the $\kappa_j$s,

$$\kappa = \frac{\sum_j p_j q_j \kappa_j}{\sum_j p_j q_j} = 1 - \frac{Nn^2 - \sum_i \sum_j n_{ij}^2}{Nn(n - 1)\sum_j p_j q_j}. \tag{3}$$

The overall measure also varies from a min-
imum of $-1/(n - 1)$ for poorer than chance
agreement through 0 for just chance agreement
to unity for perfect agreement.
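Equations 1 through 3 can be illustrated with a short computation. The following is a minimal sketch rather than the authors' program: the function name `fleiss_kappa` and the layout of the input (one row of category counts per subject) are assumptions made for the example, and it presumes that every category is used at least once but not exclusively, so that no $p_j q_j$ is zero.

```python
from typing import List, Tuple


def fleiss_kappa(counts: List[List[int]]) -> Tuple[List[float], float]:
    """Return ([kappa_j], kappa) for counts[i][j] = n_ij.

    counts[i][j] is the number of raters who assigned subject i to
    category j; every row must sum to the same n >= 2 (hypothetical
    input layout chosen for this sketch).
    """
    N = len(counts)
    k = len(counts[0])
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Equation 1: proportion of all assignments made to category j.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    q = [1.0 - pj for pj in p]

    # Equation 2: agreement beyond chance for each category.
    kappa_j = []
    for j in range(k):
        disagreement = sum(row[j] * (n - row[j]) for row in counts)
        kappa_j.append(1.0 - disagreement / (N * n * (n - 1) * p[j] * q[j]))

    # Equation 3: overall agreement beyond chance.
    sum_sq = sum(x * x for row in counts for x in row)
    sum_pq = sum(pj * qj for pj, qj in zip(p, q))
    kappa = 1.0 - (N * n * n - sum_sq) / (N * n * (n - 1) * sum_pq)
    return kappa_j, kappa
```

With two raters who always disagree on a two-category scale, the sketch returns the stated minimum of $-1/(n - 1) = -1$, and with complete agreement it returns 1.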


Large Sample Standard Errors 


The error committed by Fleiss in his 1971
article was to ignore the fact that the
denominators of $\kappa$ and $\kappa_j$ are subject to the
same order of random variation as are their
numerators. The results below are derived by
taking proper account of the variation in both
the numerators and denominators.

It is convenient to define

$$s_j = \frac{1}{Nn} \sum_{i=1}^{N} n_{ij}^2 \tag{4}$$

so that kappa (see Equation 3) may be
reexpressed as

$$\kappa = 1 - \frac{n - \sum_j s_j}{(n - 1)\sum_j p_j q_j}. \tag{5}$$

Consider the hypothesis that the ratings are
purely random in the sense that for each
subject, the frequencies $n_{i1}, n_{i2}, \ldots, n_{ik}$ are
a set of multinomial frequencies with param-
eters $n$ and $(P_1, P_2, \ldots, P_k)$, where $\sum_j P_j = 1$.
Using known results about moments of powers
and products of multinomial variables (the
moments given by Fleiss, 1971, in Equations
12-15 are correct, except that the term $n_{ij}$ in
Equation 14 should be squared), it may be
checked that the covariance matrix $\Sigma$ of
$(p_1, s_1, p_2, s_2, \ldots, p_k, s_k)'$ is given by the
following expression:


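$$\Sigma = \begin{pmatrix} \Sigma_{11} & \cdots & \Sigma_{1k} \\ \vdots & & \vdots \\ \Sigma_{k1} & \cdots & \Sigma_{kk} \end{pmatrix}, \tag{6}$$

where, letting $Q_j = 1 - P_j$ and

$$F_j = 1 + 2(n - 1)P_j, \tag{7}$$

$$\Sigma_{jl} = -\frac{P_j P_l}{Nn} \begin{pmatrix} 1 & F_l \\ F_j & F_j F_l - 2(n - 1)P_j P_l \end{pmatrix} \tag{8}$$

for $j \neq l$, and

$$\Sigma_{jj} = \frac{P_j Q_j}{Nn} \begin{pmatrix} 1 & F_j \\ F_j & F_j^2 + 2(n - 1)P_j Q_j \end{pmatrix}. \tag{9}$$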


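Under the hypothesis of randomness, the
expected value of $p_j$ is $P_j$, and that of $s_j$
is $P_j[1 + (n - 1)P_j]$. The vector $v$ of partial
derivatives of $\kappa$ with respect to each of its
components $(p_1, s_1, \ldots, p_k, s_k)$, evaluated at
the parameter values, is

$$v = \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix}, \tag{10}$$

where

$$v_j = \frac{1}{\sum_l P_l Q_l} \begin{pmatrix} Q_j - P_j \\ 1/(n - 1) \end{pmatrix}. \tag{11}$$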


According to standard large sample theory
(Rao, 1973), the approximate variance of $\kappa$
when N is large may be found by replacing
the unknown parameters by their sample
estimates in $v'\Sigma v$. It may be checked by
straightforward algebra, then, that the es-
timated large sample variance of $\kappa$ is

$$\widehat{\mathrm{Var}}(\kappa) = \frac{2}{Nn(n - 1)\left(\sum_j p_j q_j\right)^2} \left[ \left(\sum_j p_j q_j\right)^2 - \sum_j p_j q_j (q_j - p_j) \right]. \tag{12}$$

The variance of $\kappa_j$ may be found either by
applying the same kind of algebra as above or,
more directly, by specializing Equation 12
to the binomial case where k = 2 by consider-
ing only ratings into or not into category j.
In either case, the large sample variance of
$\kappa_j$ is

$$\mathrm{Var}(\kappa_j) = \frac{2}{Nn(n - 1)}, \tag{13}$$

independent of the proportions.

For large values of N and under the hypoth-
esis of randomness, $\kappa$ and the separate $\kappa_j$s
are approximately normally distributed with
variances given by Equations 12 and 13.
This result follows from the multivariate
central limit theorem (Rao, 1973) applied to
the average values $\{p_j\}$ and $\{s_j\}$. The kappas
are slightly negatively biased, however. This
follows from the identity

$$\kappa_j = \frac{X_j^2 - N}{N(n - 1)}, \tag{14}$$

where

$$X_j^2 = \sum_{i=1}^{N} \frac{(n_{ij} - np_j)^2}{np_j q_j} \tag{15}$$

is the chi-square statistic for testing the homo-
geneity of N binomial samples. Because the
expected value of $X_j^2$ under the hypothesis is
approximately equal to N − 1, that of $\kappa_j$
(and of $\kappa$) under the hypothesis is approx-
imately equal to $-1/[N(n - 1)]$.

Thus, when N is large, the statistical
significance of $\kappa_j$ may be tested by referring
the quantity

$$z_j = \left[ \kappa_j + \frac{1}{N(n - 1)} \right] \sqrt{\frac{Nn(n - 1)}{2}} \tag{16}$$

to the standard normal distribution. The
statistical significance of $\kappa$ may be tested by
referring the quantity

$$z = \left[ \kappa + \frac{1}{N(n - 1)} \right] \frac{\left(\sum_j p_j q_j\right) \sqrt{Nn(n - 1)}}{\sqrt{2\left[ \left(\sum_j p_j q_j\right)^2 - \sum_j p_j q_j (q_j - p_j) \right]}} \tag{17}$$

to the standard normal distribution.

The empirical distributions of $z$ and of $\{z_j\}$
were obtained by Monte Carlo simulation for
a number of combinations of parameter values.
The empirical standard deviations were all
close to the theoretical value of unity, and if
N is at least 25 or 30, then testing the statistical
significance of $\kappa$ and $\{\kappa_j\}$ by referring the
values of $z$ and $\{z_j\}$ to the standard normal
distribution seems safe.
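The test statistics of Equations 16 and 17 follow directly from Equations 12 and 13 once the negative bias of $-1/[N(n - 1)]$ is added back. The sketch below is illustrative only; the function name `kappa_significance`, the dictionary keys, and the input layout (one row of category counts per subject, equal row sums) are assumptions made for the example, and it presumes that no $p_j$ equals 0 or 1.

```python
import math
from typing import Dict, List


def kappa_significance(counts: List[List[int]]) -> Dict[str, object]:
    """Large sample z tests for kappa_j and kappa under the null hypothesis."""
    N = len(counts)
    k = len(counts[0])
    n = sum(counts[0])
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    q = [1.0 - pj for pj in p]
    sum_pq = sum(pj * qj for pj, qj in zip(p, q))

    # Point estimates (Equations 2 and 3).
    kappa_j = [
        1.0 - sum(row[j] * (n - row[j]) for row in counts)
        / (N * n * (n - 1) * p[j] * q[j])
        for j in range(k)
    ]
    sum_sq = sum(x * x for row in counts for x in row)
    kappa = 1.0 - (N * n * n - sum_sq) / (N * n * (n - 1) * sum_pq)

    # Null-hypothesis variances (Equations 13 and 12).
    var_kappa_j = 2.0 / (N * n * (n - 1))
    bracket = sum_pq ** 2 - sum(pj * qj * (qj - pj) for pj, qj in zip(p, q))
    var_kappa = var_kappa_j * bracket / sum_pq ** 2

    # Correct for the approximate bias of -1/[N(n - 1)] (Equations 16 and 17).
    bias = 1.0 / (N * (n - 1))
    z_j = [(kj + bias) / math.sqrt(var_kappa_j) for kj in kappa_j]
    z = (kappa + bias) / math.sqrt(var_kappa)
    return {"kappa_j": kappa_j, "kappa": kappa,
            "var_kappa_j": var_kappa_j, "var_kappa": var_kappa,
            "z_j": z_j, "z": z}
```

For a two-category scale the bracketed term in Equation 12 reduces to $(\sum_j p_j q_j)^2$, so the overall variance equals that of each $\kappa_j$, which gives a quick check on the arithmetic.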

The incorrect formulas given in Equations
16 and 23 of Fleiss (1971) overestimate the
variance, as do the formulas proposed by
Landis and Koch (1977b) for the nonnull
case. The use of these formulas, therefore,
together with a failure to take account of the
negative bias in the kappas, leads to conserva-
tive tests of significance.


Kappa As an Approximate Intraclass
Correlation Coefficient

Landis and Koch (1977b) approach the
problem of measuring the degree of agreement
on the jth category by applying the algebra
of a one-way analysis of variance to the data
resulting from coding assignments to the jth
category as 1 and assignments to another
category as 0. The mean square within subjects
is equal to

$$\mathrm{WMS}_j = \frac{1}{Nn(n - 1)} \sum_{i=1}^{N} n_{ij}(n - n_{ij}), \tag{18}$$

and the mean square between subjects is
equal to

$$\mathrm{BMS}_j = \frac{1}{n(N - 1)} \sum_{i=1}^{N} (n_{ij} - np_j)^2. \tag{19}$$

They take as the measure of agreement on
the jth category the sample intraclass correla-
tion coefficient

$$r_j = \frac{\mathrm{BMS}_j - \mathrm{WMS}_j}{\mathrm{BMS}_j + (n - 1)\mathrm{WMS}_j} \tag{20}$$
and as the overall measure of agreement a
weighted average of these coefficients,

$$r = \frac{\sum_j p_j q_j r_j}{\sum_j p_j q_j}. \tag{21}$$

It is easily checked that

$$r_j = \frac{1 + \kappa_j(Nn - 1)}{Nn - n + 1 + \kappa_j(n - 1)}, \tag{22}$$

which is always larger than $\kappa_j$. If N is large,
however, $r_j$ and $\kappa_j$ are virtually equal. In
fact, if $\mathrm{BMS}_j$ is redefined to have N instead
of N − 1 in its denominator, then $r_j$ and $\kappa_j$
are identical.
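The relation in Equation 22 can be checked numerically from the mean squares of Equations 18 and 19. The following sketch is offered only as an illustration; the function name `category_icc` and the input layout (one row of category counts per subject, equal row sums) are assumptions, and the category proportion is presumed to lie strictly between 0 and 1.

```python
from typing import List, Tuple


def category_icc(counts: List[List[int]], j: int) -> Tuple[float, float]:
    """Return (r_j, kappa_j) for category j from a one-way ANOVA on 0/1 codes."""
    N = len(counts)
    n = sum(counts[0])
    x = [row[j] for row in counts]          # n_ij for the chosen category
    p = sum(x) / (N * n)
    q = 1.0 - p

    # Equations 18 and 19: within- and between-subjects mean squares.
    wms = sum(v * (n - v) for v in x) / (N * n * (n - 1))
    bms = sum((v - n * p) ** 2 for v in x) / (n * (N - 1))

    r_j = (bms - wms) / (bms + (n - 1) * wms)        # Equation 20
    kappa_j = 1.0 - wms / (p * q)                    # Equation 2, rewritten
    return r_j, kappa_j
```

Because $\mathrm{WMS}_j = p_j q_j (1 - \kappa_j)$, Equation 2 can be written as in the last line of the function, and the returned pair should satisfy Equation 22 up to rounding error.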

A major contribution by Landis and Koch 
(1977b) to the measurement of agreement 
was their consideration of the case of varying 
numbers of ratings per subject. In addition, 
they indicated how large sample variances
appropriate to the nonnull case could be 
calculated. 
The variance portion of Games's three-factor model of inference on independent
groups is extended. Six procedures that convert tests of spread into tests of
location are reviewed and explored in a Monte Carlo study of how to test
variances in a factorial design. The statistic ln s² is shown to be a slight
improvement over Overall and Woodward's procedure. The dependence of these
two tests on the normality condition is illustrated. Four robust alternatives of
somewhat lower power are contrasted. The jackknife test is the most powerful
and is only slightly sensitive to leptokurtosis if the ns are equal. The Brown-
Forsythe median test is acceptable but it uses average deviations rather than
variances. The Box-Scheffé test is always robust. No single test is ideal. A
two-stage process is recommended.


Behavioral investigators often attend only 
to central tendency, even when interesting 
trends in variability exist in the data. In 
education, it is desirable to reduce the variance 
when working with hierarchical tasks. Students
will show little divergence on entry behaviors 
for the next task in the hierarchy, if we have 
been able to keep the variance small in the 
first task. Skinner (1958) suggested reduced 
variances as a desirable consequence from 
programmed learning, and Block (1973) 
recognized in his discussions of mastery 
learning that this outcome was desirable. 
Unfortunately, most investigators have failed 
to attend to variances except as a “nasty” 
assumption with respect to the analysis of 
variance (anova) on means. Two exceptions to 
this trend are Birch and Lefford (1967) and 
Johnson and Baker (1973). 

This neglect of variances is probably related 
to the fact that typical statistics texts teach 
only the classical omnibus tests on homogeneity 
of variance that (a) ignore the logical structure 
of factorial designs and (b) are extremely 
sensitive (nonrobust) to violations of the 
normality assumption. The present article
presents six tests that can be used in factorial
designs of independent groups in a fashion 
similar to that of the familiar tests on means. 
Three of the six tests prove to be robust. 


Tests Requiring Normality 


Overall and Woodward (1974) proposed the
Z-variance test, a clever extension and applica-
tion of the Fisher and Yates (1963) z-score
transformation for chi-square statistics. There
are problems of clarity in the formulation of
the statistic and in the article, however.
Given K samples of sizes $n_k$ from independent
populations, Overall and Woodward define

$$Z_k = \sqrt{\frac{c_k(n_k - 1)s_k^2}{MS_w}} - \sqrt{c_k(n_k - 1) - 1}, \tag{1}$$

where $c_k = 2 + 1/n_k$ and $MS_w$ is mean square
within cells. Overall and Woodward then define

$$F_{(K-1,\infty)} = \sum_{k=1}^{K} Z_k^2/(K - 1). \tag{2}$$

They note that $(K - 1)F_{(K-1,\infty)} = \sum_k Z_k^2$ and
refer to Marascuilo's (1966) statistic
$U_0' = \sum_k [(\hat\theta_k - \hat\theta_0)^2/\mathrm{var}(\hat\theta_k)]$ that is distributed as
$\chi^2_{(K-1)}$. $\hat\theta_k$ is a normally distributed unbiased
estimate of the parameter $\theta_k$, $\theta_0$ is the common
value of $\theta_k$ under the hypothesis of equality,
where $\hat\theta_0$ is an estimate of $\theta_0$. Assuming
$\mathrm{var}(Z_k) = 1$ and $\theta_0 = 0$, Overall and Woodward
derive Equation 2. However, they later
recommend, “The z-transformed sample var-
iances can be analyzed by ANOVA for factorial
design with one observation per cell” (1974,
p. 313). The standard ANOVA will compute
deviations squared from the grand mean, in
this case, $\bar{Z}$. This procedure is used in their
illustrative example, hence they have used a
different procedure than in their definition in
Equation 2. This problem becomes particularly
severe in unequal n factorial designs, in which
there are different ways that the row and
column marginal means and the grand mean
can be defined. Their procedure is analogous
to the unweighted means modification of
ANOVA. Carlson and Timm (1974) and Apple-
baum and Cramer (1974) illustrate the
complexities of the more desirable least squares
analysis of nonorthogonal designs. To avoid
these complexities, in the present article we
stick with the equal n case.

Bartlett and Kendall (1946) proposed an
alternative transformation and illustrated its
usage in a two-factor design. They propose
using $v_k = \ln s_k^2$ as the transformed value to
compute on each cell and to enter into an
ANOVA. (Either $\ln s^2 = \log_e s^2$ or $\log_{10} s^2$ could be
used, since one is a linear transform of the
other. The $\ln s^2$ usage simplifies later results.)

Overall and Woodward (1974) discard this
statistic because Bartlett and Kendall “them-
selves conclude that the ANOVA of log variances
is inferior to the more frequently used Bart-
lett’s test (1937) in situations where the two
can be compared” (p. 311). However, the
inferiority of v was demonstrated when equal
ns below 12 were used, values of n that were
smaller than those used by Overall and
Woodward or Levy (1975) in their Monte
Carlo studies of $Z_k$. On such small samples, we
also can expect the Z-variance test to be
inferior to the traditional one-factor Bartlett
test. Unfortunately, neither the Overall and
Woodward nor the Levy studies contrasted
the two.

On any given set of data, $MS_w$ will be a
constant, so the definition can be rewritten as

$$Z_k = \sqrt{\frac{c(n - 1)}{MS_w}}\,\sqrt{s_k^2} - \sqrt{c(n - 1) - 1} = a\sqrt{s_k^2} - b. \tag{3}$$
Thus $Z_k$ is a linear transform of $s_k$, the square
root of $s_k^2$. The square root transformation is
appropriate for stabilizing the variance of the
statistic, if the mean of the statistic is propor-
tional to the variance of the statistic; but
the ln transformation is appropriate if the
mean is proportional to the standard deviation
(Bartlett, 1947). The latter is the case with $s^2$:
$E(s^2) = \sigma^2$, and $\sigma_{s^2} = \sigma^2[(2/(n - 1)) + (\gamma_2/n)]^{1/2}$,
where $\gamma_2$ is the kurtosis index of the population.
Thus, for the equal n case, use of $v_k = \ln s_k^2$
will stabilize the variance of the ANOVA
entries better than use of $Z_k$. When $H_0$ is
true, and the $s_k^2$ statistics are reasonably
closely clustered, the results of the two
transformations will be similar, but as the
$s_k^2$ statistics become more divergent, the
difference increases. Thus the effect on power
should be greater than the effect on the
familywise risk of Type I errors (FWI; Games,
1971).

The disadvantage of both above formula-
tions is that they are exceedingly sensitive to
violations of the normality assumption, as are
all other classical tests on variances (Box,
1953). Scheffé (1959) shows that $E(v) \approx \ln \sigma^2$,
and $\mathrm{var}(v) \approx (2/(n - 1)) + (\gamma_2/n)$. If a good
estimate of the common $\gamma_2$ is available, it
could be used. Box and Anderson (1955)
build robust tests for variances by use of an
estimate of $\gamma_2$. Such estimates require large
ns for stability, however, and add great
complexity to the computation, while the
statistics that Box and Anderson proposed do
not readily extend to multifactor cases.

The Scheffé (1959) formulation makes it
obvious why the $v_k$ test is so sensitive to
nonnormality. By assuming normality, $\gamma_2$ is
set to 0 to yield $MS_w = 2/(n - 1)$. If the
populations are platykurtic, then $\gamma_2 < 0$, and
this theoretical $MS_w$ is too large, which
results in a conservative test. With leptokurtic
populations, $\gamma_2 > 0$, so use of the theoretical
$MS_w$ results in an excess of Type I errors.
Games, Winkler, and Probert (1972) showed
that only the latter condition is a problem.
Platykurtic populations yield greater power
on the classical tests than do normal popula-
tions, despite the conservatism at the null
hypothesis point. Why object to a reduced
risk of Type I error if there is no corresponding
power loss?
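The size of the distortion can be gauged from Scheffé's approximation, $\mathrm{var}(v) \approx 2/(n - 1) + \gamma_2/n$. The sketch below simply evaluates that expression; the function name is hypothetical, and the kurtosis indices used in the example are standard values supplied for illustration (−1.2 for a uniform population, 6 for chi-square with two degrees of freedom), not figures taken from the article.

```python
def approx_var_log_variance(n: int, gamma2: float = 0.0) -> float:
    """Approximate variance of ln s^2 for samples of size n.

    gamma2 is the kurtosis index of the population (0 for a normal one).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return 2.0 / (n - 1) + gamma2 / n


# Theoretical error term assumed by the classical tests (normality).
normal_theory = approx_var_log_variance(16)

# Ratio of the approximate true variance to the normal-theory value.
# A ratio below 1 implies a conservative test; above 1, excess Type I errors.
ratio_uniform = approx_var_log_variance(16, gamma2=-1.2) / normal_theory
ratio_chi2_2df = approx_var_log_variance(16, gamma2=6.0) / normal_theory
```

The ratios only restate the argument above in numerical form: a negative $\gamma_2$ shrinks the true variance below the assumed $2/(n - 1)$, and a positive $\gamma_2$ inflates it.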
Tests Robust to Nonnormality


Both Box (1953) and Scheffé (1959) suggest
breaking each cell into random subsamples
and computing $\ln s^2$ on each subsample. If
each cell has n cases, then I subsamples of m
cases each (n = mI when possible) are deter-
mined, and values of $v_{ijk} = \ln s_{ijk}^2$, i = 1, ..., I,
are computed in each cell. These values are
used as input to an ANOVA with I observations
per cell rather than with one observation per
cell as in the prior tests. Now $MS_w$ may be
computed and is an unbiased estimate of
var(v), whether Y is normally distributed or
not. This technique has been labeled the
Scheffé test (Winer, 1971), the Box-Scheffé
test (Levy, 1975), and the Bartlett and
Kendall test (Games et al., 1972; Gartside,
1972). We use the Box-Scheffé label here to
avoid confusion with the prior Bartlett and
Kendall suggestion of using a single value of
$\ln s^2$ per cell.
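The subsampling step for a single cell can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the function name `box_scheffe_values` is hypothetical, the random partition is done by shuffling, and any observations left over when n is not a multiple of m are simply dropped.

```python
import math
import random
import statistics
from typing import List, Sequence


def box_scheffe_values(cell: Sequence[float], m: int,
                       rng: random.Random) -> List[float]:
    """Split one cell into random subsamples of size m and return ln s^2 of each."""
    if m < 2:
        raise ValueError("subsamples need at least 2 cases to have a variance")
    shuffled = list(cell)
    rng.shuffle(shuffled)                 # random assignment to subsamples
    n_sub = len(shuffled) // m            # I subsamples; leftovers are dropped
    values = []
    for i in range(n_sub):
        sub = shuffled[i * m:(i + 1) * m]
        # statistics.variance is the unbiased sample variance (divisor m - 1);
        # a subsample with zero variance would make the logarithm undefined.
        values.append(math.log(statistics.variance(sub)))
    return values
```

The returned values for every cell would then be entered into an ordinary ANOVA with I observations per cell. Passing the random number generator explicitly makes the partition reproducible, which bears on the concern about different analysts obtaining different outcomes from the same data.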

Games et al. (1972), Games (1975), and
Levy (1975) pointed out that the Box-Scheffé
test can be used on multifactor designs and is
robust to violations of the normality assump-
tion. Levy compared the power of the Overall
and Woodward (1974) Z-variance test and the
Box-Scheffé test for single-factor designs with
equal n and K = 3, when Y is normally
distributed. Levy concluded, “For all sample
sizes, one can plainly see that the Z-variance
test is vastly superior to the Box-Scheffé
procedure with respect to power” (p. 521).
Levy’s conclusion is due, however, to an
exceptionally poor choice of subsample size
for the computation of the Box-Scheffé test.
Levy used m = 2 for ease of computation in
his Monte Carlo study. However, Gartside
(1972) and Games et al. demonstrated that
the use of subsamples of only two cases
produces power far lower than use of inter-
mediate subsample sizes. Martin and Games
(Note 1) further investigated the desirable
subsample size and concluded that the use of
$m \approx \sqrt{n}$ (or the nearest whole divisor of n,
if any) results in optimum power in the Box-
Scheffé test. Games et al. report a procedure
for a rough estimation of the power of the
Box-Scheffé test. When appropriate values of
m = 3 for n = 12, m = 5 for n = 26, and
m = 6 for n = 40 are used, the estimated
powers are uniformly higher than the powers
that Levy reports. In the trade-off between
robustness and power, the cost of the Box-
Scheffé, properly used, is far less than implied
by Levy's results or conclusions.

The major disadvantage of the Box-Scheffé 
test is that the use of random subsamples 
makes it possible for different data analysts 
to obtain different outcomes from the same 
data. This is unlikely with clear-cut data but 
might be a problem with borderline data. 
Scheffé (1970) reports that some users random-
ized and rerandomized until they obtained the
results they wanted on a related test using 
subsample randomization. Brown and For- 
sythe (1974) rejected the Box-Scheffé test out 
of hand for this reason. 

Fortunately, Brown and Forsythe (1974)
reported a robust alternative test for spread
that was not subject to the randomization
problem. In their procedure, transformed ob-
servations are defined as $X_{ijk} = |Y_{ijk} - \mathrm{mdn}_{jk}|$,
where i = 1, ..., n, and $\mathrm{mdn}_{jk}$ is the cell
median. Then conventional ANOVA procedures
are applied on the $X_{ijk}$. The Brown and
Forsythe technique is more closely related
to the average deviation, defined as
$AD_{jk} = \sum_i |Y_{ijk} - \mathrm{mdn}_{jk}|/n_{jk}$, than to the variance.
As such, it should be less influenced by the
presence of a few outliers. Unfortunately,
derivations of power curves and other proper-
ties of the Brown and Forsythe procedure are
mathematically intractable.
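The transformation itself is simple to state in code. The sketch below covers only the transformation for one cell, with hypothetical function names; the subsequent ANOVA on the transformed scores is not shown.

```python
import statistics
from typing import List, Sequence


def brown_forsythe_scores(cell: Sequence[float]) -> List[float]:
    """Absolute deviations of each observation from its cell median."""
    mdn = statistics.median(cell)
    return [abs(y - mdn) for y in cell]


def average_deviation(cell: Sequence[float]) -> float:
    """Mean of the absolute deviations from the cell median (AD_jk)."""
    scores = brown_forsythe_scores(cell)
    return sum(scores) / len(scores)
```

For the cell (1, 2, 3, 4, 10) the median is 3, the transformed scores are (2, 1, 0, 1, 7), and the average deviation is 2.2. The single outlying value contributes 7 to the sum of the scores, whereas it would contribute its squared deviation from the mean to a variance.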

Another alternative suggested by Brown
and Forsythe (1974) is the use of absolute
deviations from a trimmed mean; here
$X_{ijk} = |Y_{ijk} - \bar{Y}_{t,jk}|$ is used in a conventional
ANOVA, where $\bar{Y}_{t,jk}$ is the trimmed mean for
that cell. This is similar to the Levene (1960)
test recommended by Glass (1966), except
that Brown and Forsythe use a 10% trimmed
mean for which the highest 5% and lowest
5% of the cases are dropped from each sample
when computing $\bar{Y}_{t,jk}$.

Another alternative is the jackknife test
(Miller, 1968), which also subdivides the data
into subgroups but has the virtue that all
users will obtain the same results, since the
subgroups are exhaustive. Each cell is
divided into n subgroups of n − 1 observations,
that is, one observation is dropped in each
subgroup. Then pseudovalues
$P_{ijk} = n \ln s_{jk}^2 - (n - 1) \ln s_{(i)jk}^2$ are defined, where $s_{(i)jk}^2$ = the
unbiased sample variance with the ith observa-
tion dropped. These pseudovalues are entered
into the two-factor ANOVA as the raw data.
Prior literature (Brown & Forsythe, 1974;
Layard, 1973; Miller, 1968; Martin & Games,
Note 1) suggests, however, that on leptokurtic
populations, the jackknife test has a Type I
error slightly in excess of the nominal alpha.
Values of 8%-10% are encountered when
α = .05.
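The pseudovalues for one cell can be computed as in the following sketch. The function name `jackknife_pseudovalues` is hypothetical, and the code assumes at least three observations per cell and nonzero variances so that every logarithm is defined.

```python
import math
import statistics
from typing import List, Sequence


def jackknife_pseudovalues(cell: Sequence[float]) -> List[float]:
    """Pseudovalues n*ln(s^2) - (n - 1)*ln(s^2 with observation i dropped)."""
    data = list(cell)
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations per cell")
    log_full = math.log(statistics.variance(data))   # unbiased variance, all n cases
    pseudo = []
    for i in range(n):
        reduced = data[:i] + data[i + 1:]            # drop the ith observation
        log_reduced = math.log(statistics.variance(reduced))
        pseudo.append(n * log_full - (n - 1) * log_reduced)
    return pseudo
```

Because every observation is dropped exactly once, the n pseudovalues are fully determined by the data, which is the property that distinguishes this test from the random subsampling of the Box-Scheffé test.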
A virtue of all the present formulations is
that they permit multifactor designs, trend
tests, or the use of multiple comparisons that 
are familiar to most ANOVA users. By trans- 
forming variance problems into problems of 
location, they permit a great increase in the 
type of hypotheses that may be tested (Games, 
1978). 


Monte Carlo Study


A computer program¹ was written that
computed each of the six tests on main and
interaction hypotheses on a two-factor in-
dependent-groups design. The design was
specified as an independent-groups factorial
design with four levels of A and three levels
of B. The Overall and Woodward (OW) test
was formulated using conventional two-factor
ANOVA logic, thus taking Z deviations from
the observed grand mean rather than assuming
$\theta_0 = 0$, since all the other tests would also be
using such conventional ANOVA logic. For each
design, 16 samples were drawn for each cell
using the pseudorandom number generation
“shuffle procedure” of Marsaglia, MacLaren,
and Bray (1964). Each such design was
replicated 5,000 times, yielding an estimate of
FWI or empirical power for α = .05 and for
α = .01. This procedure was repeated using
a normally distributed population and also
using a population of chi-square values with
two degrees of freedom ($\chi^2_2$). The $\chi^2_2$ values
were obtained by adding the squares of two
independent unit-normal variates.
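The construction of the skewed, leptokurtic population can be sketched in a few lines. This is an illustration only: it uses Python's built-in generator rather than the shuffle procedure of Marsaglia, MacLaren, and Bray (1964), and the function names are hypothetical.

```python
import random
from typing import List


def chi_square_2df(rng: random.Random) -> float:
    """One chi-square variate with 2 df: the sum of two squared unit normals."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1 * z1 + z2 * z2


def draw_cell(n: int, rng: random.Random) -> List[float]:
    """Draw one cell of n independent chi-square (2 df) observations."""
    return [chi_square_2df(rng) for _ in range(n)]
```

A chi-square variate with two degrees of freedom has mean 2 and is strongly skewed to the right, so a large sample drawn this way should average close to 2 and contain no negative values.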

When the probability of an FWI was
assessed, all population variances were set
equal to one. For the power comparisons, the
variances were specified as in Table 1. In
Table 1, the null is false for both main effects,
but the null is true for the interaction of the
variances. However, due to the use of cur-
vilinear transformations, the interaction null
Table 1
Values of the Cell Variances ($\sigma_{jk}^2$), Row
Variances ($\sigma_{j.}^2$), and Column Variances ($\sigma_{.k}^2$)
for a Two-Factor Independent-Groups Design

                  Column factor B
Row factor      B1     B2     B3     Row variance
A1              10      8      6      8
A2               9      7      5      7
A3               7      5      3      5
A4               6      4      2      4
Column variance  8      6      4

is false for all but the Brown and Forsythe 
(BF) tests. This is an intrinsic consequence of 
curvilinear transformations: If additivity is 
present in the original data, it usually will not 
be present in the transformed data. Thus the 
interactions will be excluded when comparing 
the six tests on power. 

The values obtained for the .05 level of 
significance and the .01 level of significance 
showed comparable results, thus only the .05 
results are presented in Table 2. 


Familywise Risk of Type I Error 


The FWI values for the .05 alpha are shown
in Table 2. With normally distributed popula-
tions, all of the tests show values of FWI
that are reasonably close to alpha. The
Bartlett and Kendall (BK) and Brown and
Forsythe absolute deviation from the trimmed
mean (BF_tm) yield FWIs slightly larger than
alpha, with mean FWI values of .064 and
.057, respectively. The jackknife (JK) and
Brown and Forsythe absolute deviation from
the median (BF_mdn) are slightly conservative
with mean FWI values of .042 and .036,
respectively.

Under the $\chi^2_2$ distribution, the BK and OW
tests show the expected extreme FWI values
over .05. The BF_tm test also shows an inflated
FWI approximately three times α, whereas
the JK again shows a slight inflation to an
FWI ≈ .089. Only the Box-Scheffé (BS) and
BF_mdn tests show excellent control of FWI
when the populations are leptokurtic and
skewed.


Power 


Since all the tests show reasonable control 
of FWI when the populations are normally 
Table 2
Empirical Type I Error Probabilities for a Two-Factor Independent-Groups Design

              Normal, N(0, 1)                               Skewed leptokurtic, chi-square (2 df)
Effect   BK     OW     JK     BF_tm  BF_mdn BS        BK     OW     JK     BF_tm  BF_mdn BS
n_jk = 16
A        .066*  .052   .042*  .056   .032*  .047      .499*  .474*  .089*  .145*  .044   .056
B        .062*  .053   .041*  .056   .039*  .047      .403*  .383*  .085*  .122*  .046   .051
AB       .071*  .046   .036*  .059*  .031*  .047      .699*  .664*  .102*  .193*  .048   [...]
n_jk = 25
A        .055   .049   .044   .057*  .036*  .046      .507*  .498*  .086*  .149*  .048   .053
B        .060*  .053   .044   .054   .038*  .047      .404*  .394*  .077*  .130*  .043   .047
AB       .069*  .057*  .047   .059*  .039*  .049      .731*  .710*  .095*  .208*  .045   .051
Mean familywise risk of Type I error
         .064   .052   .042   .057   .036   .047      .540   .520   .089   .158   .046   .052

Note. BK = Bartlett and Kendall; BS = Box-Scheffé; BF_tm = Brown and Forsythe absolute deviation
from the trimmed mean; BF_mdn = Brown and Forsythe absolute deviation from the median; JK = jack-
knife; OW = Overall and Woodward. When n_jk = 16, the subsample size for the BS test was 4, and the
BF_tm test used a 12.5% trimmed mean. When n_jk = 25, the subsample size for the BS test was 5, and the
BF_tm test used an 8.0% trimmed mean. For all probabilities, α = .05.
* p < .05 for deviation from the expected familywise risk of Type I error.

distributed, all can be compared for power.
As expected, the two tests based on classical
theory, the BK and OW, are more powerful
than any of the more robust tests. As predicted,
the BK is always more powerful than the OW.
Of the two tests that always provide good
control of FWI, the BF_mdn is always more
powerful than the BS.

Under the $\chi^2_2$ populations, only tests that
provide reasonable control of FWI are in-
cluded. Although the JK is of borderline status
in that FWI was slightly inflated, which
inflates power also, it was included in Table 3
for comparison to the BS and BF_mdn. The
power differences between the three tests are
relatively small, with the largest difference
only .065.


Discussion 


If the experimenter is confident that the 
underlying data are normally distributed, any 
of the six tests covered could be considered 
adequate in terms of practical control of FWI. 
The choice between tests would then be based 
on power. The tests (under the Normal 
distribution) are listed from the most powerful 


to the least powerful (from left to right, 
respectively) in Tables 2 and 3. There is a 
gradual reduction as you go from left to right 
in each row, though many adjacent differences 
are certainly not significant. Thus, if the 
experimenter is confident of normality, the 
BK test is best, although the difference 
between the BK and the OW test is not large. 

However, the experimenter often has little 
grounds for confidence in the shape of the 
underlying distributions, particularly with 
small to medium ns. If the populations are 
leptokurtic, the BK and OW tests can have 
absurdly large FWI values that far exceed 
the nominal alpha. Similarly, this study has
confirmed Brown and Forsythe’s (1974) finding
that their BF_tm test is not robust when skewed
populations are encountered. Only three tests
remain relatively robust under all forms of
populations studied to date. Of these, the JK
is the most powerful, but it is accompanied by
a slight rise in FWI under leptokurtosis,
sometimes reaching an FWI as large as two
alpha. The BF_mdn test is next in power but
has the disadvantage that it is a test of average
deviations rather than variances, and Fellers
(Note 2) suggests that it is erratic for ns as
Table 3
Empirical Power Probabilities for a Two-Factor Independent-Groups Design

              Normal, N(0, 1)                               Skewed leptokurtic, chi-square (2 df)
Effect   BK     OW     JK     BF_tm  BF_mdn BS        JK     BF_mdn BS
n_jk = 16
A        .672   .590   .546   .538   .441   .378      .281   .237   .216
B        .755   .686   .656   .642   .562   .483      .307   .291   .259
n_jk = 25
A        .874   .818   .817   .756   .694   .649      .366   .368   .330
B        .938   .898   .910   .852   .817   .756      .429   .465   .418

small as 5. The BS test deals more directly
with variances as such and always has an
FWI ≈ α, but it is lowest in power and has
the disadvantage that different random sub-
samples might yield different outcomes.

Thus there is no one test that can be 
universally recommended without qualifica- 
tion. O'Brien (1978, in press) gives reasons 
why the power relations of the several tests 
vary with the kurtosis of the populations.
O'Brien recommends the BFmdn test and the
use of the JK on s² directly (rather than on
ln s² as in the present study). The authors note
that platykurtosis is rare in the behavioral
data we have seen; skewed and/or
leptokurtic data are more common. If the
experimenter is reasonably confident that 
the data are not platykurtic, the minimal 
computations needed for the BK test are a 
reasonable first step. 

If this test is not significant, the null
hypothesis is retained. However, if the BK 
is significant, there is the possibility that the 
result is a Type I error due to leptokurtosis in 
the population. Thus it is desirable to also 
reject the null hypothesis by one of the robust 
tests before making strong interpretations 
about heterogeneous spread. This two-step 
process is not ideal, but neither are any of the 
tests investigated to date. 
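
The two-step process described above can be sketched as a small decision rule. This is only an illustrative sketch, not the authors' program: the function name, argument names, and returned labels are hypothetical, and the p values are assumed to have been computed elsewhere (first from the BK test, then from one of the robust tests such as the JK, BFmdn, or BS).

```python
def two_step_spread_decision(p_bk, p_robust=None, alpha=0.05):
    """Sketch of the two-step process for testing heterogeneous spread.

    p_bk     -- p value from the BK test (assumed computed elsewhere)
    p_robust -- p value from a robust test (JK, BFmdn, or BS), if available
    alpha    -- nominal significance level
    All names and return labels are hypothetical.
    """
    # Step 1: a nonsignificant BK test means the null hypothesis is retained.
    if p_bk >= alpha:
        return "retain"
    # A significant BK result may be a Type I error due to leptokurtosis,
    # so a robust test is needed before strong interpretation.
    if p_robust is None:
        return "needs_robust_test"
    # Step 2: strong interpretation only if a robust test also rejects.
    if p_robust < alpha:
        return "reject"
    return "inconclusive"


if __name__ == "__main__":
    print(two_step_spread_decision(0.20))        # BK not significant
    print(two_step_spread_decision(0.01, 0.03))  # both tests significant
    print(two_step_spread_decision(0.01, 0.30))  # BK alone significant
```

The "inconclusive" label reflects the caution in the text: a significant BK result that is not confirmed by a robust test does not by itself justify strong statements about heterogeneous spread.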


¹ A FORTRAN computer program for the JK, BFmdn,
or BS tests may be obtained by sending a computer
tape to the first author.


Note. BK = Bartlett and Kendall; BS = Box-Scheffé; BFM = Brown and Forsythe absolute deviation
from the trimmed mean; BFmdn = Brown and Forsythe absolute deviation from the median; JK = jackknife;
OW = Overall and Woodward. When njk = 16, the subsample size for the BS test was 4, and the
BFM test used a 12.5% trimmed mean. When njk = 25, the subsample size for the BS test was 5, and the
BFM test used an 8.0% trimmed mean. For all probabilities, α = .05.

The Role of Fear in Theories of Avoidance
Learning, Flooding, and Extinction 


Susan Mineka 
University of Wisconsin—Madison 


The course of fear conditioning and extinction in the avoidance-learning con- 
text is complex. This article summarizes the major lines of evidence that dem- 
onstrate a dissociation or desynchrony between measures of fear and avoidance 
responding. The evidence bearing on the role of fear in theories of avoidance 
learning and extinction is reviewed and critically evaluated. In addition, re- 
search is discussed regarding the determinants of fear over the course of avoid- 
ance acquisition, flooding, and extinction. Particular emphasis is placed on 
discussing the extent to which fear extinction is necessary and/or sufficient 
for avoidance response extinction both with conventional extinction procedures 
and with response prevention techniques.

In animals and humans fear has long been
assumed to play an important role in the
mediation of avoidance behaviors that in 
turn have often been assumed to underlie 
a variety of neurotic behaviors. The avoid- 
ance behavior that frequently accompanies 
a state of fear has actually been considered 
by some theorists to be one of the response 
systems inherent in our definition of fear 
itself. Lang (1968, 1971), for example, has
argued that fear is a complex construct that
in humans includes at least three different
response systems: verbal/cognitive (subjective),
motor (behavioral avoidance), and
psychophysiological. These three response 
systems do not always covary together, and 
treatments designed to reduce so-called fear 
may, at least initially, affect one system but 
not the others. (See also Hodgson & Rachman,
1974; Rachman & Hodgson, 1974.)
Other theorists, whose primary interest has 
been avoidance behavior in infrahuman organisms,
have attempted to understand the
role that fear plays in mediating the acquisi- 
tion, maintenance, and extinction of learned 
avoidance responses. Over the past 15 years 
or so, it has become increasingly apparent 
that the role of fear in mediating any of 
these facets of learned avoidance behavior 
is at best not a simple one. In particular, 
there is often a marked dissociation between 
fear and avoidance responding (Riccio & Sil- 
vestri, 1973) that has led a number of the- 
orists to question whether fear plays any 
role at all in mediating avoidance responding 
(e.g., Herrnstein, 1969; Hineline, 1977). 
This article has three goals. First, after a 
discussion of the measurement of fear in 
animals, there is a brief review of the evi- 
dence on dissociation or desynchrony be- 
tween fear and learned avoidance behavior. 
Given this background, the second goal is to 
evaluate comprehensively and critically the 
role that fear plays in various theories of 
avoidance acquisition, maintenance, and extinction.
Because the majority of the work in
this area has centered on the question of 
the role that fear extinction plays in mediat- 
ing avoidance response extinction through 
response prevention or flooding techniques, 
particular emphasis is placed on work in 
that area. The third goal of this article is to 
review, where pertinent evidence is available,
what determines the course of fear
acquisition and extinction in avoidance-learn- 
ing contexts. In particular, attention is fo- 
cused on how the avoidance context influ- 
ences the dynamics of fear conditioning and 
extinction. 


Measurement of Fear 


Conditioned fear emerges as the result of 
pairing a neutral stimulus with an aversive 
or noxious unconditioned stimulus. Over the 
past 50 years, numerous response systems 
have been shown to be sensitive to the effects 
of aversive conditioning procedures: defeca- 
tion, heart rate, suppression of ongoing con- 
summatory or operant appetitive behavior, 
learning a response to escape from a fearful 
stimulus, passive avoidance of a fearful stim- 
ulus, facilitation of ongoing operant avoid- 
ance behavior, and so forth. Each one of 
these indices of conditioned fear has been 
more or less extensively validated by deter- 
mining whether the degree of fear as mea- 
sured by that index is sensitive to condition- 
ing parameters that would be expected on 
some a priori basis to affect the level of fear 
that is conditioned (e.g., intensity of the 
aversive stimulus, number of conditioned 
stimulus – unconditioned stimulus [CS-US] 
pairings, etc.). These validation procedures 
have, of course, each assumed that there is 
some degree of correlation between the mag- 
nitude of the observable response and the 
internal state of fear itself (McAllister & 
McAllister, 1971). 

Experiments in the area of fear and avoid- 
ance have tended to use one of four of the 
indices of conditioned fear mentioned previ- 
ously. Probably the most widely used index 
of the results of an aversive conditioning 
procedure has been the conditioned emotional
response (CER) index, first developed
by Estes and Skinner (1941). They observed 
that hungry rats trained to barpress for food 
reinforcement decreased their rate of respond- 
ing when a warning signal was presented for 
an impending aversive event (usually electric 
shock). Estes and Skinner labeled this pattern
of suppression of ongoing operant appetitive
behavior "conditioned anxiety," and
over the past 35 years, this phenomenon has
been shown in numerous studies to be a
reliable and sensitive index of aversive conditioning
in a variety of different species
(Davis, 1968). The term "conditioned emotional
response" was first used by Hunt and
Brady (1951) to describe this phenomenon, 
and it is now the term most widely used to 
describe this index of conditioned fear or 
anxiety, two terms that are most often used 
interchangeably by experimental psycholo- 
gists. Although others have used the term 
CER in a more general way to describe the
state that has presumably been conditioned
in an aversive conditioning procedure, no
matter how it is being measured (e.g., heart 
rate, suppression of operant appetitive be- 
havior, facilitation of avoidance behavior, 
etc.), in this article the term will be used 
only to refer to this one particular index of 
conditioned fear, that is, suppression of operant
appetitive behavior. It should be emphasized
that in spite of the wide use of the
CER as an index of fear or aversive condi- 
tioning, there is as yet no consensus as to 
why positively reinforced behavior is sup- 
pressed during a stimulus paired with an 
aversive stimulus such as shock. (See Black- 
man, 1977, for one current discussion of this 
topic.)
A second and closely related index of 
aversive conditioning used in some of the
more recent experiments described later is
that developed by Sidman, Herrnstein, and
Conrad (1957), Herrnstein and Sidman
(1958), and Rescorla and LoLordo (1965). 
These investigators found that when stimuli 
paired with shock were presented to monkeys 
or dogs who were responding on a Sidman 
(unsignaled) avoidance baseline, the rate of
avoidance responding increased dramatically 
for the duration of the warning stimulus. 
This facilitation or energization of an operant
maintained by negative reinforcement has
subsequently been used extensively as an index
of conditioned excitatory and inhibitory
states based on aversive reinforcers (e.g.,
Rescorla & Solomon, 1967; Weisman & Litner,
1969, 1972). Scobie (1972) and Morris
(1974) have even suggested that it may be
a more sensitive index of fear or aversive
conditioning than the CER because it can
sometimes detect evidence of conditioning
when the CER does not.

It should be noted here that the same
warning stimulus can be used to produce 
either suppression or facilitation of ongoing 
operant behavior, depending on whether that 
operant is maintained by positive or nega- 
tive reinforcement. For example, Sidman 
(1958) reported that monkeys that were 
maintained on a concurrent chain-pulling re- 
sponse for food reinforcement and lever- 
press response for shock avoidance showed 
suppression of the chain-pull response and
facilitation of the lever-press response during
a warning stimulus for shock. Such results
present one of the strongest lines of
evidence that there is indeed some central 
mediating state that is being conditioned in 
an aversive conditioning procedure. Out of 
convenience, many experimental psychologists
have chosen to use the word fear to describe
that state, which may manifest itself
through a facilitation or suppression of on- 
going behavior (Rescorla & Solomon, 1967). 
In addition, such results underscore the point 
that it cannot be suppression or facilitation 
per se that is being conditioned in such experiments;
rather, it seems that some central
state is being conditioned, the motivating
properties of which manifest themselves
differently according to which other motivating
state is maintaining the operant behavior.

A third index of fear used in some of the 
experiments reported later is the passive 
avoidance of a place in which aversive con- 
ditioning trials have previously occurred. 
When given a choice, animals will tend to 
avoid any such stimulus or place previously
paired with an aversive stimulus. To maximize
the incentive to approach a fearful
place, animals are often food deprived prior 
to the test, and their latency to enter and 
eat in the fearful place is measured. The 
fourth and final index of fear used by some 
investigators in this area is the conditioned 
heart rate response. There appears to be
some variability between species in the nature
of this response, since rats tend to show
heart rate decreases, whereas dogs and mon- 
keys tend to show heart rate increases 
(Brady & Harris, 1977). 

Early observations that there is often a 
lack of concomitance among these different
results of an aversive conditioning procedure
date back at least as far as Gantt’s (1937,
1953) results, which indicated that different
components of a response conditioned with 
shock as the US develop at different rates 
and persist for different periods of time, and 
the result is a disharmony or cleavage in be- 
havioral, somatic, and psychophysiological re- 
sponse systems. Gantt used the term schizo- 
kinesis to describe this phenomenon that he 
studied most extensively in dogs. More re- 
cently, similar observations have been made 
in several other species. For example, De- 
Toledo and Black (1966) and Brady, Kelly, 
and Plumlee (1969) have reported that in 
an aversive conditioning procedure, rats and 
monkeys show more rapid acquisition of 
suppression of operant appetitive behavior 
(CER) than of a heart rate conditioned re- 
sponse. Together, such results indicate that 
the search for any one uniquely valid index 
of fear or aversive conditioning in animals 
will be futile, just as recent evidence from 
humans has suggested it should be (e.g., 
Lang, 1968). 

This brief review of the measurement of 
fear in animals certainly indicates that the 
task of those interested in the relationship 
between fear and avoidance responding is not 
an easy one. Given the wide range of mea- 
sures that may be sensitive to the effects of 
an aversive conditioning procedure, theorists 
interested in this relationship should ideally 
have at hand data that use multiple-response
measurements. Certainly, results that
emerge using one index of fear cannot be 
assumed to apply if a different index of fear 
is being used. Unfortunately, however, most 
investigators in this area have used only one 
index response at a time in their attempts 
to understand the role of fear in avoidance 
behavior. Nevertheless, numerous interesting 
results have emerged over the past 20 years 
that do help us understand the extent to 
which fear¹ does or does not play a role
in avoidance-learning contexts. More generally,
we will see that even when the role
of fear in mediating avoidance responding
is in doubt, the question of what happens
to fear over the course of avoidance acquisition,
maintenance, and extinction is still an
interesting and important one. All the conditions
necessary for the acquisition and extinction
of fear are automatically present in
the avoidance-learning context, and yet the
CS-US contingency in this aversive conditioning
context is unusual; that is, it is a
partial reinforcement schedule with nonreinforced
CS events whose abbreviated and
varying durations are determined by the
avoidance latency of the subject. As is demonstrated,
the dynamics of fear conditioning
and extinction in this unusual context
are more complex than those traditionally
seen in more straightforward Pavlovian fear
conditioning paradigms.

¹ The term fear will be used in the present article
to refer to any of the four patterns of results of
aversive conditioning procedures that have been
described previously.


Dissociation Between Fear and Learned 
Avoidance Behavior 


Interest in the association between fear 
and avoidance responding probably stems 
from the early appeal of Mowrer's (1947) 
two-process theory that ascribes a major role 
to fear as a motivational state in the acquisition
of learned avoidance responses. According
to this theory, fear, which is conditioned
on early trials of a signalized avoidance pro- 
cedure, serves to motivate the learning of a 
response that serves to reduce the fear. Re- 
inforcement for avoidance responding comes 
from termination of the fear-evoking CS. In 
this theory fear is necessary to motivate the 
response; once fear has disappeared or ex- 
tinguished, the avoidance operant should 
also extinguish. And given a traditional ex- 
tinction procedure in which shock is no 
longer presented, fear extinction should be 
both necessary and sufficient for avoidance 
response extinction. Other two-process the- 
orists such as Schoenfeld (1950), Sidman 
(1953), and Dinsmoor (1954) have not in- 
voked the motivational concept of fear in 
their theories but rather have maintained 
that stimuli paired with shock early in avoid- 
ance become noxious or aversive; conse- 
quently, their removal is reinforcing and the 
avoidance operant is learned. Although these
theorists do not use the motivational concept
of fear, the aversive or noxious quality
of the stimuli whose removal provides reinforcement
for the avoidance response should
extinguish according to the laws of classical 
conditioning. Hence the same basic predic- 
tions follow from these theories as from 
Mowrer’s theory regarding the necessary and 
sufficient conditions for avoidance response 
extinction. 

The relationship between fear and avoidance
responding is considerably more complex
than these early two-process theories
predicted. First, there is now considerable
evidence that fear of the CS, as indexed by
the CER, becomes attenuated over the course 
of avoidance learning. Kamin, Brimer, and 
Black (1963) and Starr and Mineka (1977) 
found that rats trained to a criterion of 27 
consecutive avoidance responses (CARs) 
showed less suppression during the CS than 
those trained to a criterion of only 9 CARs. 
Mineka and Gino (Note 1) have further 
shown that this attenuation of the CER does 
not occur because the avoidance response itself
is about to extinguish. Animals trained
to a criterion of 27 CARs show approxi- 
mately the same resistance to extinction as 
do animals trained to only 9 CARs. So fear 
of the CS, as indexed by the CER, is 
clearly not monotonically associated with the 
strength of an avoidance response, although 
it should be noted that no one has yet dem- 
onstrated avoidance responding in the com- 
plete absence of fear of the CS. Using a 
different index of fear or stress, Brady 
(1965) also reported a dissociation between 
avoidance performance and a physiological 
correlate of the CER—increases in plasma 
17-hydroxycorticosteroid (17-OH-CS) levels. 
Brady's monkeys showed progressive eleva- 
tion of 17-OH-CS levels over the first 72 
hours of avoidance training, which was then 
followed by declining levels of 17-OH-CS 
over the succeeding weeks of avoidance
training (Brady & Harris, 1977). Similarly, 
Coover, Ursin, and Levine (1973) reported
that plasma-corticosterone levels in rats were
considerably elevated following early avoidance
training sessions but that after many
(17) training sessions, when avoidance performance
was asymptotic, plasma-corticosterone
levels showed only small increases over
basal levels.

In a somewhat different vein, Rachlin and
Herrnstein (1969) have reported a dissociation
between the suppressive effects of an
aversive shock and its capacity to sustain 
avoidance. In their second experiment, they 
found that pigeons who showed little sup- 
pression of keypecking with noncontingent 
shock (CER procedure) did show pro- 
nounced negative choice (avoidance) for that 
component of the schedule. At a minimum, 
these results suggest that the rate of responding
(essentially the CER index of
fear) during a stimulus signaling noncon- 
tingent shock is not well correlated with the 
capacity of that stimulus to sustain avoid- 
ance. Such results certainly present a di- 
lemma to theorists who maintain that there 
is any kind of simple relationship between 
the capacity of a stimulus to sustain avoidance
and its capacity to suppress operant
appetitive responding.

A second prediction of two-process theory 
regarding fear and the extinction of avoidance
responding has also not been substantiated.
In particular, fear extinction does not
appear to be necessary for the extinction of
the avoidance operant. Although Black
(1959) did find that the heart rate conditioned
response (CR) extinguished considerably
more rapidly than the avoidance response
in dogs, he found no significant correlation
between the speed of extinction of
the cardiac CR and the avoidance response. 
Using the CER rather than heart rate as an 
index of fear, Kamin et al. (1963), by con- 
trast, found evidence of avoidance response 
extinction in the absence of much fear extinction.
Rats that were extinguished to a
moderate criterion (5 consecutive failures to
respond) were still quite fearful of the CS
as indexed by the CER, and animals that
were extinguished to a more stringent criterion
(20 consecutive failures to respond)
still showed nonzero levels of fear. Other
investigators who used flooding or response
prevention techniques to hasten extinction
of an avoidance response have
reached similar conclusions regarding the 
lack of necessity of fear extinction in pro- 
ducing avoidance extinction. Coulter, Riccio, 
and Page (1969), for example, have shown 
that animals whose avoidance response has
been extinguished following flooding are more
fearful than animals extinguished with a 
conventional extinction procedure. In addi- 
tion, Mineka and Gino (1979a) have shown 
that an amount of response prevention suf- 
ficient to hasten avoidance response extinc- 
tion does not reduce fear of the CS. So 
with flooding, as with conventional extinction 
procedures, fear extinction is not necessary 
for extinction of the avoidance operant. 

This brief summary highlights the evidence 
showing dissociation between fear and 
learned avoidance behavior (see also Hodg- 
son & Rachman, 1974; Rachman, 1976; 
Rachman & Hodgson, 1974; Riccio & Silvestri,
1973). Considerably more discussion
and elaboration of this evidence, as well as
of the determinants of this dissociation, is
provided in the remainder of the article. We
now turn to a discussion of the role of fear 
in various theories of avoidance acquisition 
and maintenance. Attention is focused on 
how some of these theories have evolved to 
handle this evidence on dissociation between 
fear and avoidance. 


Fear in Avoidance Acquisition and 
Maintenance 


The Role of Fear in Various Theories of 
Avoidance Learning 


Of all the theories discussed here, Mowrer’s
(1947) two-process theory clearly
ascribes the most important role to fear at 
various stages of avoidance acquisition, 
maintenance, and extinction. This theory 
may, in fact, provide a plausible account for 
how an avoidance response is initially 
learned, but it is the phenomenon of the 
persistence of learned avoidance responses 
that has most intrigued and plagued learn- 
ing theorists for the past 25 years. And so, 
although two-process theory in its various 
forms has dominated both the theorizing and 
the experiments done in the field of avoid- 
ance learning, its most serious shortcomings 
have consistently been in its inability to ex- 
plain satisfactorily the high resistance to ex- 
tinction of well-learned avoidance responses 
(e.g., Solomon, Kamin, & Wynne, 1953).
The dilemma for two-process theory is that
after dozens or hundreds of consecutive
avoidance responses, the source of reinforce- 
ment for responding is no longer apparent 
because each successful avoidance trial con- 
stitutes a Pavlovian extinction trial. Hence 
the fear CR should gradually extinguish, thus 
removing CS termination as a possible source 
of reinforcement. After that, the avoidance 
operant should proceed to extinguish. But 
as we have already seen, even when the fear 
CR does become attenuated, the avoidance 
operant does not immediately begin to ex- 
tinguish (Kamin et al., 1963; Starr & Min- 
eka, 1977; Mineka & Gino, Note 1). 

Solomon and Wynne (1954) attempted to
rescue two-process theory from this dilemma
by adding to it the two principles of anxiety 
conservation and the partial irreversibility 
of the conditioned fear response learned 
with traumatic shock. These two principles 
have not proved very useful, however, in 
explaining the results of experiments demon- 
strating high resistance to extinction of 
avoidance responses learned with only mod- 
erate levels of shock (Brush, 1957) and con- 
versely the relatively rapid extinction of the 
CER, even with traumatic shock (Annau & 
Kamin, 1961). Rescorla and Solomon (1967) 
further revised two-process theory to account 
for the apparent lack of fear in the well-trained
animal (e.g., Kamin et al., 1963).
They postulated that fear is a central state 
and therefore that lack of concomitance be- 
tween peripheral measures of fear and avoid- 
ance responding does not constitute evidence 
against two-process theory. This central state 
of fear, however, is subject to the normal 
laws of Pavlovian conditioning, including 
extinction. Thus it is still unclear why this
central state of fear does not extinguish in 
the well-trained animal, and if it does, where 
the motivation and reinforcement for con- 
tinued responding come from. 

Konorski (1948) and Soltysik (1963) at- 
tempted to explain why the fear CR should 
not extinguish by postulating that it is pro- 
tected from extinction by the avoidance 
response that becomes a CS− for shock (inhibitory
CS predicting no shock). One problem
with this explanation is that it is based
on protection-from-extinction experiments
done with appetitive responses; no one has
demonstrated protection from extinction
when the CS− follows the CS+, as in the
avoidance case (see LoLordo & Rescorla,
1966; Seligman & Johnston, 1973). Furthermore,
the theory explains why fear should
persist and provide motivation for responding.
However, the evidence discussed previously
(e.g., Kamin et al., 1963; Starr &
Mineka, 1977) suggests that fear attenuates 
over the course of avoidance training. 

The more recent safety signal or positive
reinforcement revision of two-process theory
also assumes that the avoidance response becomes
a CS− for shock (Bolles, 1970; Weisman
& Litner, 1969, 1972). However, this
theory does not require that the CS− protect
the CS+ from extinction because the CS−
assumes the role of a positive reinforcer. The
animal continues to make avoidance responses
because the response itself (CS−)
becomes a positive reinforcer, and fear is
not necessary for continued motivation of
the avoidance response once the response
has become a good CS−. The chief problem
with this explanation is that no one has
demonstrated that a CS− continues to be a
positive reinforcer when it is no longer presented
in a fear-eliciting situation (Grossen,
1971; LoLordo, 1969; Seligman & Johnston,
1973). Thus the safety-signal account must
posit that the CS+ was protected from extinction
to account for the maintenance of
the avoidance response as a positive reinforcer
(CS−). Until independent evidence
exists that residual fear of the CS+ or of the
situational cues remains during asymptotic
avoidance when the CS− is a positive reinforcer,
the safety signal account of the extreme
persistence of avoidance responding
in extinction is incomplete. 

Other theorists (e.g., D’Amato, 1970; 
Herrnstein, 1969; Hineline, 1977) have 
further de-emphasized the role of fear in 
avoidance learning and have instead empha- 
sized the CS’s role as a discriminative cue, 
which “sets the occasion” for responding.
The response is presumed to be reinforced by 
a reduction in shock frequency rather than 
by reduction of fear. One of these theorists’ 
strongest lines of evidence against the role 
of the CS as the motivational mediator of an 
avoidance response comes from the results of
several experiments that attempted to determine
if an animal could learn to avoid a CS
or discriminative stimulus (SD) for an avoidance
response. For example, Sidman (1955)
pretrained cats and rats on an unsignaled 
shock-delay procedure and then introduced 
a 5-sec preshock cue that could be delayed 
or removed. He found that most of the re- 
sponses occurred during the cue and that the 
25%-30% of the responses that did occur
in the absence of the cue mostly occurred in
postshock bursts. These results suggest that
although the removal of a cue can maintain 
responding, its delay apparently cannot. 
From the traditional two-process standpoint, 
one might expect that the animals would 
learn to avoid a conditioned aversive stimu- 
lus if its removal acts as a reinforcer, and 
yet this does not appear to happen. These 
results and others summarized by Hineline
(1977) strongly suggest that the CS in a discriminative
avoidance procedure “cannot be seen as
simply providing a classically conditioned 
surrogate for the shock, for several effects 
are independent of its relation to shock" 
(p. 396). These theorists do not deny that 
Pavlovian conditioning of fear may go on in
avoidance training and may even affect the
rate of responding (Rachlin & Herrnstein, 
1969, p. 90), but they do strongly assert that 
such “classically conditioned responses are
not a requirement for the instrumental be- 
havior" (Herrnstein, 1969, p. 61). These 
theories are, therefore, relatively uninterested 
in what happens to fear over the course of 
avoidance learning, although they would 
probably maintain that the determinants of 
any fear that does exist should be the result 
of no more than just the Pavlovian contin- 
gencies inherent in the procedure. 
Seligman and Johnston (1973) have recently
proposed a cognitive theory of avoidance
learning that ascribes a role to fear
only in the initial stage of training. In the
early phase of learning, fear is conditioned
to the CS and may be involved in the elicitation
of responses. Gradually, however, two
expectancies are acquired that serve to main- 
tain the response: an expectancy that a re- 
sponse will be followed by no shock and an 
expectancy that no response will be followed 
by shock. If no shock is assumed to be preferred
to shock, these two expectancies are
sufficient to maintain responding even after 
fear has extinguished. This theory then pre- 
dicts that fear and avoidance behavior will 
not always be well correlated. It further 
assumes that the degree of residual fear at 
any phase will be a function of how many 
Pavlovian fear extinction trials have occurred 
as a result of successful avoidance responses. 


The Determinants of Fear Over the 
Course of Avoidance Learning 


These newer theories of avoidance learning
were developed partially as a result of the
inability of two-process theory to explain
satisfactorily the dissociation or desynchrony 
between fear and avoidance responding. Each 
of these theories postulates some new mecha- 
nism other than fear to explain the persist- 
ence of avoidance responses, for example, the 
discriminative role of the CS in setting the 
occasion for a response that results in a re- 
duction of shock frequency or the role of 
response-outcome expectancies in motivating 
responses that produce preferable outcomes 
(i.e., no shock). An explanation of why there 
is a marked dissociation between fear and 
avoidance behavior has received only cursory 
attention (e.g., Seligman & Johnston’s, 1973, 
assumption that fear extinguishes as a result 
of a simple Pavlovian fear extinction pro- 
cess). However, even if fear does not play a 
central mediating role in avoidance, the 
avoidance training procedure automatically 
programs the necessary and sufficient condi- 
tions for the Pavlovian conditioning of fear. 
Therefore, the question of what happens to 
that fear over the course of avoidance main- 
tenance and extinction remains one of enor- 
mous practical and theoretical importance at 
least for those interested in fear, even if not 
for those interested in avoidance. 

In a preliminary attempt to study the de- 
terminants of fear over the course of avoid- 
ance learning, Starr and Mineka (1977) 
tested the hypothesis that attenuation of fear 
over the course of avoidance learning results 
from Pavlovian fear extinction. In a replica- 
tion and extension of Kamin et al.’s (1963) 
widely cited study that used the CER as an 
index of fear, Starr and Mineka compared 
fear of the CS in rats trained to 3, 9, or 27 
CARs (avoidance-learning [AL] groups) 
with that of their strictly yoked partners 
(yoked avoidance-learning [YAL] groups), 
who received the same pattern of CS and US 
events but had no avoidance response avail- 
able. The usual attenuation of fear occurred 
in the well-trained animals. (The AL-27 
group showed less suppression than the less 
well-trained animals—the AL-3 and AL-9 
groups.) However, the yoked group (YAL- 
27), which had received the same Pavlovian 
sequence of CSs and USs as the AL-27 group, 
did not show attenuation of fear. These re- 
sults suggest that the response contingency 
per se contributes to the fear attenuation that 
occurs in well-trained animals. 

That a simple Pavlovian fear extinction 
account does not sufficiently explain attenu- 
ation of fear is further supported by other 
results of Starr and Mineka’s (1977) first 
experiment. Comparisons were made with 
a third set of groups that were yoked only 
for the excitatory Pavlovian trials that oc- 
curred during the course of avoidance train- 
ing. These yoked fear conditioning (YFC) 
groups received the same number and pat- 
tern of CS-US pairings as did their AL 
masters, but they received no nonreinforced 
CSs. None of the YFC groups differed from 
each other or from any of the YAL groups. 
The YFC-27 group did, however, show more 
fear than the AL-27 group. Because the 
YFC-27 and YAL-27 groups did not differ 
in fear, whereas the YFC-27 and AL-27 
groups did, simple Pavlovian fear extinction 
cannot account for the attenuation of fear 
observed in the AL-27 group. The lack of 
difference between the YAL-27 and YFC-27 
groups is particularly striking in light of the 
fact that the YAL-27 (and AL-27) group 
had an average of 6395 nonreinforced CS 
trials, whereas the YFC-27 group had only 
had reinforced CSs. 

To determine which aspect of the response 
contingency accounts for the decline in fear 
observed in the AL-27 group, Starr and 
Mineka (1977) performed a second experi- 
ment to assess the role of feedback from 
avoidance responses in producing this at- 
tenuation. One group of animals was trained 
to a criterion of 36 CARs in a paradigm in 
which each response was followed by an ex- 
teroceptive feedback signal (AL-36-FS). A 
second group of animals was strictly yoked 
to the AL group (YAL-36-FS), that is, they 
did not have an avoidance response available, 
but they did have a feedback stimulus mim- 
icking the response of the AL group. A third 
group was also yoked to the AL-36-FS group 
except that they received no feedback signal 
(YAL-36-NFS). Both groups (AL-36-FS 
and YAL-36-FS) that received feedback dis- 
played less fear as indexed by the CER than 
did the group that received no feedback 
(YAL-36-NFS), so the response contingency 
per se is not necessary for the attenuation 
of fear. The feedback from the avoidance 
response is sufficient to produce the atten- 
uation. 

The mechanism through which feedback 
produces this attenuation of the CER is not 
clear. One possibility is that less fear is 
conditioned to the CS when an FS is present 
(either in the form of an avoidance response, 
or an exteroceptive signal). This could occur 
if an FS were to functionally reduce the in- 
tensity of the US by partially inhibiting the 
fear reaction that would otherwise persist for 
several seconds following US termination. 
A second mechanism that might account for 
the role played by feedback in the attenua- 
tion of fear is that fear may extinguish 
faster when an FS is present. An FS that 
becomes a powerful conditioned inhibitor of 
fear may reduce the overall level of fear 
(e.g., Seligman, 1968). Extinction of fear of 
a CS presented against such a background 
might proceed at a faster rate. Although 
there is as yet no direct evidence bearing on 
this possibility, it is contrary to the pre- 
dictions of the Rescorla-Wagner (Rescorla 
& Wagner, 1972) model that assumes stimuli 
compete for inhibitory strength. If powerful 
inhibitory stimuli are already present, this 
should reduce the amount of inhibitory 
strength that can accrue to a CS on any 
extinction trial, so groups with feedback 
should show less extinction of fear of the CS 
than groups without feedback. At present, 
the viability of either of these mechanisms 
for explaining the role of feedback in pro- 
ducing an attenuation of fear remains to be 
determined. 
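
The Rescorla-Wagner prediction described above can be illustrated with a minimal numerical sketch. It is not taken from any of the studies reviewed here. It assumes, purely for illustration, that the feedback stimulus is present in compound with the CS on every nonreinforced trial, that all stimuli share a single learning-rate parameter, and that the starting associative strengths (1.0 for the CS, -0.5 for an inhibitory FS) are arbitrary. The function name and parameters are hypothetical.

```python
def rescorla_wagner_extinction(v_cs, v_fs, trials, alpha_beta=0.2, lam=0.0):
    """Return the associative strength of the CS after nonreinforced trials.

    v_cs: starting associative strength of the CS (positive = excitatory).
    v_fs: starting strength of a feedback stimulus assumed to accompany
          the CS on every trial (negative = inhibitory; 0.0 = no FS).
    alpha_beta: combined learning-rate parameter (illustrative value).
    lam: asymptote supported by the outcome; 0.0 because no US occurs.
    """
    fs_present = v_fs != 0.0
    for _ in range(trials):
        # All stimuli present on a trial share one prediction error.
        error = lam - (v_cs + v_fs)
        v_cs += alpha_beta * error
        if fs_present:
            v_fs += alpha_beta * error
    return v_cs


# Same CS, same number of extinction trials, with and without an inhibitory FS.
no_fs = rescorla_wagner_extinction(v_cs=1.0, v_fs=0.0, trials=10)
with_fs = rescorla_wagner_extinction(v_cs=1.0, v_fs=-0.5, trials=10)
print("CS strength after extinction, no FS:  ", round(no_fs, 3))
print("CS strength after extinction, with FS:", round(with_fs, 3))
```

Under these assumptions the inhibitory FS reduces the shared prediction error on each trial, so the CS retains more of its strength when the FS is present. This is the sense in which the model predicts less, not more, extinction of fear of the CS in groups with feedback.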
Other investigators have also observed 
attenuation of fear over the course of avoid- 
ance learning (e.g., Linden, 1969; Weisman 
& Litner, 1972), but this is not a universal 
result. Morris (1974) observed no attenua- 
tion of fear in rats trained to 27 CARs when 
he used a transfer of control test on a Sid- 
man avoidance baseline (Rescorla & Lo- 
Lordo, 1965); that is, the 27 group showed 
facilitation of avoidance during the CS equal 
to that of the 9 group. Morris used an FS 
following avoidance responses, and this may 
account for his failure to observe attenuation 
because attenuation may already have oc- 
curred in his AL-9 groups that received an 
exteroceptive FS, or as discussed earlier, 
less fear may be conditioned in the first 
place when an FS is present. Alternatively, 
different results may be obtained in animals 
tested for fear with the CER as opposed to 
the Sidman procedure. The weight of evi- 
dence indicates that fear does attenuate with 
extended avoidance training, although which 
mechanism produces this attenuation is not 
yet clear. It should again be noted, how- 
ever, that fear has never been demonstrated 
to extinguish completely (e.g., by a CER 
test or with the Sidman procedure) even 
with extensive training. 

That fear diminishes over the course of 
a run of consecutive avoidance responses 
would not be particularly surprising even 
to a two-process theorist, if the avoidance 
response concurrently became weaker. If 
this were the case, one would not be so in- 
clined to speak of a dissociation between 
fear and avoidance. Recent results of Mineka 
and Gino (Note 1), however, indicate that 
the avoidance response is not weaker after 
a run of 27 consecutive responses than after 
a run of only 9 responses. Animals trained 
to a criterion of 27 CARs, in a situation 
comparable to that of Starr and Mineka 
(1977), are equally resistant to extinction 
as animals trained to a criterion of 9 CARs. 
As indicated by Mackintosh (1974, p. 334), 
this result creates a serious theoretical prob- 
lem for theorists who argue that fear mo- 
tivates the responding of a well-trained ani- 
mal. By contrast, these results do not present 
a problem for other theorists who postulate 
other sources of motivation or reinforcement 
for avoidance. 


Fear and the Extinction of Avoidance 
Responses 


The Role of Fear Extinction in Traditional 
Avoidance Extinction 


The dissociation between fear and avoid- 
ance behavior discussed so far has been the 
diminution of fear as the avoidance response 
becomes better learned. A two-process the- 
orist might assume that fear would ulti- 
mately diminish to a sufficient extent to 
cause extinction of the avoidance response. 
As demonstrated earlier, the traditional two- 
process theory of avoidance in fact assumes 
that fear extinction precedes and determines 
avoidance response extinction, but this simple 
analysis is not correct: The direction of the 
dissociation between fear and avoidance be- 
havior is not always the same. Rather, as 
extinction of the avoidance response begins, 
fear of the CS remains fairly intense. Kamin 
et al. (1963) reported that animals at a 
moderate extinction criterion (five consecu- 
tive failures to respond) were nearly as fear- 
ful of the CS as indexed by the CER as ani- 
mals that had received no extinction trials. 
Clearly, animals who have reached a mod- 
erate avoidance extinction criterion have not 
done so because their fear has extinguished. 
Animals must begin to stop responding for 
some reason other than that their fear has 
extinguished. As extinction of the response 
proceeds, fear extinction is likely to follow. 
Unfortunately no single experiment has com- 
pared fear attenuation (and extinction) 
across the course of the acquisition of a 
well-trained avoidance response and its ex- 
tinction. It is possible that fear may tem- 
porarily increase when the animal begins to 
cease responding. This possibility is sug- 
gested by a comparison of the results of 
Kamin et al.’s two experiments, which un- 
fortunately are not strictly comparable be- 
cause of differential delay of the time when 
the CER test was made. Kamin et al.’s ani- 
mals who had made five consecutive non- 
responses (failures to respond in the CS-US 
interval) were more afraid of the CS than 
were the animals who had made 27 consecu- 
tive responses. 

This dissociation between fear and avoid- 
ance responding during extinction is nowhere 
more apparent than in the literature on the 
flooding of avoidance responses. Flooding 
or response prevention techniques are effec- 
tive in producing rapid extinction of well- 
learned avoidance responses, even though 
such responses are resistant to extinction 
with conventional extinction procedures. 
These techniques involve prolonged exposure 
to the CS, either with the response forcibly 
prevented or with CS termination noncon- 
tingent on the response. A variety of theo- 
ries has emerged to explain the efficacy of 
these techniques. As is demonstrated, these 
theories vary in their ability to accommodate 
the evidence on dissociation between fear and 
avoidance responding. No one of these theo- 
ries can adequately account for all of the 
relevant data. 


The Role of Fear Extinction in 
Theories of Flooding 


Two-Process Fear Extinction Theory 


Two-process theory in its various forms 
predicts that response prevention or flooding 
techniques should be effective in hastening 
extinction of avoidance responses. By the 
two-process account, flooding techniques that 
allow extended nonreinforced exposure to 
the CS should assure extinction of the fear 
CR, thus eliminating both the motivation 
and the reinforcement for continued re- 
sponding. 

The evidence relevant to this traditional 
two-process account is mixed. The discussion 
of this evidence is organized around three 
general, sometimes overlapping questions. 
First, to what extent is fear extinction neces- 
sary and/or sufficient for a flooding effect? 
Second, which results of response prevention 
experiments are difficult to accommodate 
within a two-process framework? And third, 
what evidence exists that some process other 
than fear extinction must also be operating 
during flooding? 

Is fear extinction necessary and/or suf- 
ficient? Perhaps the most obvious attempt 
to demonstrate that fear extinction during 
flooding is not necessary for avoidance ex- 
tinction is that of Marrazo, Riccio, and 
Riley (1974). They argue that fear extinc- 
tion cannot account for flooding results be- 
cause a group in their experiment that re- 
ceived reinforced CS presentations during 
flooding extinguished as rapidly as a group 
that received nonreinforced CS presentations. 
Fear could not have extinguished in the 
former group, so the effect of flooding was 
not solely attributable to Pavlovian fear ex- 
tinction. There are, however, two problems 
with this interpretation. First, Bersh and 
Miller (1975) showed that Marrazo et al.’s 
results were due to their use of long (5-sec) 
shocks during flooding. These long shocks 
seemed to result in jumping and rearing 
behavior being conditioned to the situation, 
which in turn facilitated extinction by serv- 
ing as incompatible responses. It is impor- 
tant to emphasize, however, that this is not 
the same process as that involved in regular 
flooding. When they gave short (j-sec) 
shocks during flooding, the avoidance re- 
sponses were much slower to extinguish than 
with regularly flooded animals. Second, there 
are some logical problems with Marrazo et 
al.’s conclusions. That their two groups (fear 
conditioning and fear extinction) showed 
equally rapid extinction of the avoidance 
response does not imply that they did so for 
the same reason. Their results show at most 
that fear extinction is not necessary for 
avoidance extinction, but it may well be suf- 
ficient and may even be a necessary part of 
the traditional flooding process. 

Presenting further problems to a two-pro- 
cess account, a number of investigators have 
noted that fear of the CS remains following 
extinction after flooding. Page (1955) and 
Coulter et al. (1969) showed that rats that 
reach an extinction criterion following re- 
sponse prevention in a one-way shuttlebox 
show greater fear of the safe side, as indexed 
by a passive avoidance test, than do rats 
reaching the same extinction criterion fol- 
lowing a regular extinction procedure. Lin- 
ton, Riccio, Rohrbaugh, and Page (1970) 
later demonstrated that a group receiving the 
response blocking procedure shows a small 
decrement in fear as compared to nonblocked, 
nonextinguished controls, but rats that had 
received blocking and extinction trials showed 
even less fear, and rats that had been ex- 
tinguished in the regular fashion showed the 
least fear. 

There are, however, several reasons why 
these results of Coulter et al. (1969) and 
Linton et al. (1970) alone may not destroy 
the fear extinction account of flooding. First, 
Mackintosh (1974) has suggested that per- 
haps only a certain threshold amount of fear 
must be elicited to motivate an active avoid- 
ance response. Flooding may reduce the level 
of fear below this critical threshold but not 
enough to abolish fear, as indexed on a pas- 
sive avoidance test. A recent experiment par- 
tially consistent with this idea has been 
reported by Monti and Smith (1976). They 
found that flooded rats demonstrated less 
fear of the CS, as indexed by a CER test, 
than did control rats who had spent a com- 
parable amount of time in the home cage. 
The flooded rats did not, however, show zero 
levels of fear, and the difference between 
the two groups was significant for only the 
first three trials of the CER test (see also 
Corriveau & Smith, 1978). More definitive 
support for Mackintosh’s idea would be pro- 
vided by a demonstration that this amount 
of flooding was sufficient to rapidly extin- 
guish the rats' avoidance response as com- 
pared to the effect of the home cage control 
treatment on the avoidance response. Such 
a demonstration is necessary because recent 
results of Mineka and Gino (1979a) indi- 
cate that an amount of flooding sufficient to 
reliably hasten the extinction of a well- 
learned shuttlebox avoidance response (20 
trials) is not sufficient to reduce fear of the 
CS. A greater amount of flooding (30 vs. 20 
trials) does reliably hasten avoidance ex- 
tinction and reduce fear of the CS (Mineka 
& Gino, 1979a). At present, then, there is 
no good evidence to support Mackintosh’s 
threshold idea: Fear extinction may occur 
during flooding, but it does not appear to 
be necessary for avoidance response ex- 
tinction. 

There is, however, a second reason why 
the Coulter et al. (1969) and Linton et al. 
(1970) results alone do not destroy the fear 
extinction account of flooding. Shipley, Mock, 
and Levis (1971) have criticized the Coulter 
et al. and Linton et al. experiments because 
total nonreinforced exposure to the CS was 
vastly different in blocked, blocked-then- 
extinguished, and normally extinguished 
groups. Their results suggest that when 
amount of nonreinforced CS exposure is held 
constant, residual fear, as indexed by a pas- 
sive avoidance test, does not differ across 
blocked and nonblocked regularly extin- 
guished groups. Furthermore, they found 
that total CS exposure time was similar in 
blocked and nonblocked groups when the ex- 
tinction criterion was finally met. They in- 
terpret these results as supporting the two- 
process explanation that response prevention 
produces its effect through fear extinction. 
Baum (1971) also showed that flooded and 
nonflooded groups that had reached the same 
extinction criterion recovered equally from 
extinction when a loud buzzer was presented. 
He took this to indicate that residual fear 
in all groups following extinction was roughly 
equivalent, thereby contradicting Coulter et 
al.’s conclusion. There are as yet no experi- 
ments using CER as an index of fear that 
compare residual fear in blocked-then-ex- 
tinguished and regularly extinguished groups. 
Such experiments are necessary before any 
definitive conclusions can be reached regard- 
ing amounts of residual fear following flood- 
ing versus conventional extinction procedures. 
Shipley et al.’s use of a passive avoidance 
test as an index of fear confounds behavioral 
passive avoidance with fear in a situation 
in which the two groups have had differen- 
tial opportunity for such behavior to have 
been reinforced, and Baum's recovery pro- 
cedure is hardly a validated index of fear. 
(See also Corriveau & Smith, 1978, for a 
more complete discussion of these issues.) 
Berman and Katzev (1972) have also 
shown the importance of nonreinforced CS 
exposure in an experiment that equated CS 
exposure across five groups. Four groups 
received one of four different flooding treat- 
ments, each following two sessions of two- 
way shuttlebox avoidance acquisition. The 
fifth CS-time-control group received as many 
trials of response-contingent CS termination 
as were necessary to equate total CS ex- 
posure during this treatment phase to that 
in the four flooded groups. This latter CS- 
time-control group extinguished faster than 
a nontreated control group and did not differ 
significantly from two of the four flooded 
groups. (The two groups whose responses 
were blocked during treatment extinguished 
faster than the two groups for whom re- 
sponses were allowed but for whom CS ter- 
mination was not response contingent.) Ber- 
man and Katzev point out that these results 
suggest that caution is necessary in interpret- 
ing the results of nearly all response pre- 
vention experiments in which total non- 
reinforced CS exposure is confounded with 
the response prevention procedure itself. 
The Monti and Smith (1976), Shipley et 
al. (1971), and Berman and Katzev (1972) 
results do counter some of the arguments 
against the fear extinction account of flood- 
ing made by Page (1955), Coulter et al. 
(1969), and Linton et al. (1970) who all 
found differences in residual fear in flooded 
and regularly extinguished animals. These 
results all suggest that total nonreinforced 
CS exposure—which presumably allows for 
fear extinction to occur—may be the crucial 
variable in producing rapid extinction of 
avoidance. It must be emphasized, however, 
that this conclusion is based on the as yet 
unsupported assumption that the amount of 
fear extinction of an avoidance CS is directly 
related to the amount of nonreinforced CS 
exposure. Shipley (1974) did report that this 
is the case for straightforward fear condition- 
ing and extinction. However, as yet no one 
has studied this issue directly in the avoid- 
ance/flooding situation, that is, whether fear 
extinction of an avoidance CS (as measured 
by a CER test) is a simple function of total 
amount of nonreinforced CS exposure. The 
results of Starr and Mineka (1977) indicate 
that the dynamics of fear conditioning in 
the avoidance situation are not identical to 
those in a more straightforward fear condi- 
tioning situation (cf. Starr & Mineka’s yoked 
groups). In addition, Monti and Smith found 
that response prevention was more effective 
in eliminating fear conditioned in a classical 
paradigm than in eliminating fear conditioned 
over the course of avoidance learning. And 
perhaps most important are the results of 
Mineka and Gino (1979a) which indicate 
that with a large amount of CS exposure 
(600 as opposed to 400 sec), the forced 
aspect of the flooding procedure is more ef- 
fective in reducing fear of the CS than is an 
equal amount of the self-exposure that occurs 
in traditional extinction. So caution is clearly 
necessary in extrapolating from results which 
indicate that nonreinforced CS exposure is di- 
rectly related to avoidance response extinction 
(Berman & Katzev, 1972; Shipley et al., 
1971) to the conclusion that fear extinction 
is mediating that extinction of the avoid- 
ance response. (See also the paradox dis- 
cussed later presented by the Berman & Kat- 
zev, 1972, and Shipley et al., 1971, results.) 
Such caution seems particularly important 
given the results of Mineka and Gino 
(1979a) which indicate that fear as indexed 
by CER suppression does not diminish, given 
an amount of flooding that reliably hastens 
extinction of the avoidance response. Over- 
all, the weight of the evidence seems to indi- 
cate that fear extinction is not a necessary 
part of flooding for avoidance response ex- 
tinction, although given the results discussed 
so far, it may well be sufficient. 

Other problems for a two-process fear ex- 
tinction account. A number of other experi- 
ments have been considered by some review- 
ers such as Baum (1970b) to raise problems 
for the two-process account of flooding. For 
example, Benline and Simmel (1967) have 
results which suggest that the effects of 
flooding procedures may produce only tem- 
porary decrements in avoidance responding, 
perhaps through the learning of competing 
responses rather than through the extinction 
of fear. In their experiment, rats that had 
received 40, 80, or 160 blocking trials over a 
5-day period showed response decrements on 
the first few sessions of extinction as com- 
pared to a nonblocked control group. By the 
4th and 5th day of extinction, however, the 
blocked groups were responding as fast and 
as often as the nontreated control group. 
These results are difficult to interpret, though, 
because the control group itself showed no 
signs of extinction over 5 days (a question- 
able result considering that there had been 
only 50 acquisition trials on the 1st day). 
This makes the meaning of any spontaneous 
recovery in the blocked groups unclear. 
Other experiments (e.g., Polin, 1959; Shear- 
man, 1970) have also given multiple ex- 
tinction sessions, and they did not see this 
pattern of spontaneous recovery in flooded 
groups. Actually, there is no a priori reason 
why spontaneous recovery of the avoidance 
response, even if it were convincingly demon- 
strated, should provide strong evidence 
against a two-process account of flooding. 
Spontaneous recovery of any conditioned 
response can be expected to occur following 
extinction (Kimble, 1961; Pavlov, 1927). If 
the conditioned fear response shows spon- 
taneous recovery, then the avoidance re- 
sponse might be expected to recur also. 

Potentially more damaging to a two- 
process account of flooding are the results 
reported by Werboff, Duane, and Cohen 
(1964). These investigators found a disso- 
ciation between autonomic (heart rate) in- 
dices of fear and avoidance responding; rats 
that had undergone a treatment similar to 
flooding showed greatly elevated heart rates, 
even though they were no longer responding. 
Although Rescorla and Solomon's (1967) 
version of two-process theory attempts to 
get around the problem of dissociation be- 
tween peripheral and other indices of fear 
by postulating that fear is a central state, 
their theory has not been taken to postulate 
that peripheral indices can exist in the ab- 
sence of the central state (although the re- 
verse can be true) as the Werboff et al. data 
indicate. At a minimum these data extend 
the observations of dissociation between fear 
and avoidance following flooding to include 
such peripheral (psychophysiological) indices 
of fear. 

Two other lines of research present prob- 
lems to a two-process account of flooding 
because the results cannot easily be ex- 
plained by a two-process account. Leder- 
hendler and Baum (1970) reported that 
mechanical disruption of their rats' behavior 
during flooding (when abortive avoidance 
responses and freezing were occurring) 
greatly enhanced the efficacy of the flooding 
treatment as compared to the efficacy of 
the flooding for normally flooded animals. 
They interpret their results as supporting re- 
laxation theory discussed later and point 
out that a two-process account has difficulty 
explaining the results. Furthermore, Baum 
(1970a) found that a loud buzzer during 
flooding enhances the efficacy of the flood- 
ing treatment. Since a loud buzzer should, if 
anything, increase the ambient level of fear, 
a two-process account of flooding has diffi- 
culty explaining why a loud buzzer enhances 
efficacy of the treatment. It may be that an 
increase in the ambient level of fear produced 
by a loud buzzer enhances the efficacy of 
flooding by habituating the animal to the 
state of fear itself rather than by extinction 
of the fear CR. (See Watson & Marks, 1971, 
for a similar argument for humans.) Baum 
(1970a) himself argues for a distraction in- 
terpretation of the results, which is orthogo- 
nal to a two-process account. 

Evidence for another process other than 
fear extinction. Overall, the above experi- 
ments indicate that a two-process account of 
flooding does not account convincingly for 
all of the relevant data and that fear ex- 
tinction is not a necessary part of flooding. 
But even if total nonreinforced CS exposure 
were the critical variable, as Shipley et al. 
(1971) and Berman and Katzev (1972) have 
suggested, a paradox more damaging to the 
two-process account than the experiments 
discussed above remains to be resolved: If 
flooded animals have had more CS exposure 
and therefore more extinction of the fear CR, 
then they should (and do) stop responding 
sooner than nonflooded animals. But if, as 
Berman and Katzev suggest, extinction of 
the avoidance response is solely a function 
of extinction of the fear CR, which in 
turn is solely a function of total nonrein- 
forced CS exposure, then groups that have 
met the same extinction criterion should 
have done so because their fear CR has ex- 
tinguished equally. This should occur re- 
gardless of whether the response has extin- 
guished following flooding or following con- 
ventional extinction procedures. Yet the Page 
(1955), Coulter et al. (1969), and Linton 
et al. (1970) studies have indicated that 
groups reaching the same extinction criterion 
do not demonstrate equal amounts of the 
fear CR. Their flooded groups may indeed 
have had less CS exposure and therefore 
more residual fear than their nonflooded group 
(cf. Shipley et al., 1971), but their avoid- 
ance response did extinguish. Again we see 
that although complete fear extinction may 
be sufficient for avoidance response extinc- 
tion, it does not appear to be necessary. 
Some other form of learning in addition to 
extinction of the fear CR must generally 
also occur during flooding and contribute 
to rapid extinction of the avoidance response. 

Even Berman and Katzev (1972) have 
some evidence indicating that Pavlovian fear 
extinction is probably not all that occurs 
during flooding. Their blocked-spaced trial 
group did extinguish significantly faster than 
their CS time control group, which indicates 
that some other kind of learning occurred in 
the former group. So if Pavlovian extinction 
of the fear CR cannot explain all of the rele- 
vant data on flooding, what other kind of 
learning can be taking place during flooding 
that could account for its efficacy in hasten- 
ing avoidance extinction? 

The safety signal revision of two-process 
theory must argue that the positive reinforc- 
ing properties of the response as a CS− ex- 
tinguish during flooding. The avoidance re- 
sponse cannot be made during flooding, so 
reinforcement no longer occurs in the avoid- 
ance apparatus. This theory predicts that a 
CS− established during avoidance learning 
would no longer serve as a positive rein- 
forcer if the effects were measured following 
flooding (as contrasted to the results of 
Weisman and Litner, 1969, when the effects 
of the CS− were measured before any flood- 
ing or extinction trials). No such evidence 
exists, so the adequacy of this account of 
flooding cannot be assessed. Some theorists 
(e.g., Seligman & Johnston, 1973) have even 
argued that safety signal theory cannot pre- 
dict that flooding should work because if the 
response does not occur, how could the CS− 
properties of its feedback ever extinguish? 
Furthermore, the complex changes in fear 
that occur during flooding are of little in- 
terest to safety signal theorists because fear 
is not involved in asymptotic avoidance ac- 
cording to this theory. 


Competing-Response Theory 


Page (1955), Coulter et al. (1969), and 
Linton et al. (1970) have argued that what 
is learned during flooding is a response com- 
peting with that learned during acquisition: 
"Since CS onset results in a fear response 
which is reduced by fear offset, then any 
response S makes when the CS is terminated 
will be adventitiously reinforced" (Coulter 
et al., 1969, p. 380). Although these inves- 
tigators stay within a two-process framework, 
the emphasis has changed from what happens 
to the Pavlovian fear CR to what happens 
to the instrumental response made to reduce 
that fear CR. This theory nicely explains 
the dissociation between the fear CR and 
avoidance responding. Furthermore, Shipley 
et al. (1971) have noted that blocked groups 
show lower activity levels than do non- 
blocked groups. Shearman (1970) has also 
noted that all of the 15 of his 40 flooded 
animals who made no responses over 9 days 
of extinction also made no intertrial interval 
responses, which thus indicates low activity 
levels. These low activity levels could indi- 
cate that freezing has become the “compet- 
ing response.” The Marrazo et al. (1974) 
and Bersh and Miller (1975) experiments 
discussed previously certainly indicate that 
competing responses learned during flooding 
can mediate the extinction of avoidance re- 
sponding. No experiment has conclusively 
demonstrated, however, that competing re- 
sponses either do mediate extinction or are 
necessary for such extinction. It should be 
noted that this conclusion is in the same vein 
as that reached regarding the necessity of 
Pavlovian fear extinction. As demonstrated 
earlier, even those experiments (e.g., Bersh 
& Paynter, 1972) purporting to demonstrate 
that fear extinction must contribute to avoid- 
ance extinction remain inconclusive because 
the Page (1955), Coulter et al. (1969), Lin- 
ton et al. (1970), and Mineka and Gino 
(1979a) data still stand: Without much 
fear extinction, the avoidance response can 
extinguish. Analogously here, Black's (1958, 
1959) results from flooding done under 
curare seem to show that a learned compet- 
ing response is not necessary for extinction 
to occur: Dogs given no opportunity to 
learn a competing motor response because 
they were paralyzed by curare when flooding
was carried out still showed rapid avoidance
response extinction. 


Relaxation Theory 


Baum (1970b) has criticized the compet- 
ing-response theory on several different 
counts. His own experiments indicate that 
“undifferentiated exploratory behavior and 
grooming" (1970b, p. 281) tend to replace 
the extinguished avoidance response rather
than any specific response such as crouching
or freezing. In addition, he notes that a com-
peting-response hypothesis has difficulty ex- 
plaining why higher shock intensity and over- 
training decrease the efficacy of a fixed 
amount of flooding. One could argue that a 
better learned avoidance response has more 
difficulty being overcome by a competing 
response, but one could also predict that 
adventitious reinforcement for a competing 
response in groups trained with higher shock 
intensity should be greater, thus increasing 
the efficacy of flooding. Baum’s own analysis 
of what happens during flooding is that the 
animal learns to relax in the presence of the 
CS, an idea stemming from Denny’s (1971) 
relaxation theory of avoidance learning. Evi- 
dence consistent with Baum’s relaxation 
theory has been provided recently by Hawk 
and Riccio (1977). These investigators rea- 
soned that if relaxation responses are respon- 
sible for avoidance extinction, then a tech- 
nique that hastens the emergence of relaxa- 
tion responses should enhance the efficacy of 
flooding. They presented an independently 
established CS− (presumably an elicitor of
relaxation responses) during flooding for one 
group, and they did find more rapid extinc- 
tion in that group. Unfortunately for Baum’s 
theory, a group that received a novel CS 
during flooding showed rates of extinction 
comparable to those of the CS− group.
Actually, this relaxation account is similar 
to the fear extinction account of flooding, 
except that Baum (1970b) requires that the 
animal’s normal, nonfearful behavioral reper-
toire (i.e., relaxation responses) return be-
fore one assumes that fear has extinguished.
Hence the problems with this account are 
the same as those discussed previously re-
garding the fear extinction account. Baum
(1970b) himself admits that his relaxation 
theory “still fails to explain evidence of fear 
(and no relaxation) following response pre- 
vention, even though the avoidance response 
has been extinguished” (p. 282). Further- 
more, Morokoff and Timberlake (1971) have 
results which indicate that relaxation re- 
sponses are at least not necessary for rapid 
extinction to occur. Their group that showed
the most rapid avoidance response extinction 
showed six times as many fear responses 
(evidence of nonrelaxation) as did a group 
that extinguished substantially more slowly. 

Baum concluded his review (1970b) of the 
three main accounts of flooding by stating 
that no one of these three accounts seems to 
explain all the relevant experiments and that 
all three accounts may be partially correct. 
His conclusion still stands. In addition, 
Baum points out that the process most in- 
volved in a given experiment (Pavlovian 
fear extinction, competing-response learning, 
or active relaxation) may depend on what 
particular parameters and procedures are 
used. More direct support of this point is
given later. 


Cognitive Theory 


Another account of how flooding produces 
its effect has recently emerged as part of 
Seligman and Johnston’s (1973) comprehen- 
sive cognitive theory of avoidance learning 
based on Irwin’s (1971) cognitive theory of 
motivation. As discussed earlier, in this the- 
ory conditioned fear plays a role only in 
the acquisition of the avoidance response; 
in the well-maintained response, fear no 
longer plays a motivating role but is replaced 
by expectancies that responding produces 
no shock and that not responding is followed 
by shock and by a preference for no shock 
as compared to shock. The extreme persist- 
ence of avoidance responding in extinction is 
easily explained by this theory because fear 
may extinguish without a change in the 
expectancies that responding is necessary to 
avoid shock. Typically, the animal does not 
wait around long enough in the presence of 
the CS to have its expectancies disconfirmed. 
According to this cognitive theory of avoid-
ance, flooding hastens extinction of the avoid-
ance response because the animal's expect-
ancy that not responding will lead to shock
is disconfirmed, and the more disconfirmation 
the animal receives, the faster it should stop 
responding. Seligman and Johnston claim 
that this theory can easily account for the 
previously discussed dissociation between 
extinction of fear and extinction of the avoid- 
ance response because disconfirmation of the 
expectancies governing responding could oc- 
cur faster than Pavlovian fear extinction. 
Thus, as in the competing-response account, 
the emphasis here is that the response may 
change (i.e., extinguish) in the absence of
any change in the level of fear. 

At first glance, the attractiveness of this 
cognitive account of flooding lies in its ability 
to explain virtually any outcome of a flood- 
ing experiment; because extinction of con- 
ditioned fear is neither necessary nor suf- 
ficient for extinction of the avoidance re-
sponse, the theory can predict either more
or less fear in blocked as compared to nor- 
mally extinguished groups. The fact that 
results indicate, if anything, more fear in 
blocked than in nonblocked groups presents 
no problem to the account, but neither would 
the opposite results. Failure of flooding to
produce rapid extinction can be attributed 
to insufficient disconfirmation of the govern-
ing expectancies. However, here also lies the
problem with this cognitive account of flood- 
ing. Because there is no independent way of 
measuring the governing expectancies, the 
cost of this flexibility is that it makes this 
theory impossible to test. This is orthogonal 
to any criticism of the cognitive theory of 
avoidance learning as a whole because most 
existing data on avoidance seem to be inter- 
pretable within the theory. (But see earlier 
discussion of Starr & Mineka, 1977, for one 
exception.) It certainly does detract, how- 
ever, from its usefulness as an alternative 
to other current accounts of how flooding 
works.

Although formally the cognitive theory of 
avoidance learning is different from the dis- 
crimination theory of D’Amato (1970) and 
Herrnstein (1969), the latter theory’s ac- 
count of flooding would probably be similar; 
flooding would probably be seen as working
by facilitating a detection of the change in
reinforcement contingencies between acquisi-
tion and extinction. Additionally, the re-
sponse might be thought to extinguish be-
cause it no longer brings reinforcement in
the presence of the S^D. No predictions about
fear or fear extinction during flooding would
be made. The problems with this account
are similar to those with the cognitive ac-
count: There is no independent way of as-
sessing whether an amount of flooding that
was insufficient to reduce resistance to ex-
tinction had failed because the response still
produced reinforcement in the presence of
the S^D.


Summary 


Four accounts of how flooding hastens 
extinction of avoidance responses have been 
discussed. None is completely adequate. It
is important to recognize that different learn-
ing processes may underlie the flooding ef-
fects with different kinds of avoidance re-
sponses. In particular, fear extinction may
be more central to the extinction of some 
avoidance responses than to others. The 
myriad of experiments investigating flooding 
have involved not only different species 
(dogs for Black, 1958; Carlson & Black, 
1959; Solomon et al., 1953; rats for all
other studies) and different responses (one-
way shuttlebox, jump-up box, two-way shut-
tlebox) but also responses trained to vastly
different criteria (varying from three con-
secutive avoidance responses with less than
10 training trials to several hundred training
trials over many days). In general, experi-
ments taken to support two-process fear ex-
tinction accounts have used fairly well-
learned two-way shuttlebox responses (e.g.,
Berman & Katzev, 1972; Monti & Smith,
1976; Polin, 1959; Shearman, 1970; but
Mineka & Gino, 1979a, is an exception).
Experiments taken to support competing-
response accounts have used
well-learned one-way shuttlebox responses
(e.g., Coulter et al., 1969; Linton et al.,
1970; Page, 1955). Baum (1970b) has used
moderately well-learned jump-up box re-
sponses to support his relaxation theory.
Thus direct comparisons among these ex-
periments must be made with extreme cau-
tion. Baum has suggested that which ac-
count of flooding is most applicable may de-
pend on the particular avoidance response 
being studied as well as the particular pa- 
rameters of acquisition, shock intensity, and 
so forth. With special regard to understand- 
ing the dissociation between fear and avoid- 
ance, it is important to determine the extent 
of such dissociation following flooding of 
different avoidance responses. 
The hypothesis that different learning pro-
cesses mediate flooding effects in different
avoidance situations may seem unnecessarily 
complicated. However, there are other indi- 
cations of the different natures of two-way 
shuttlebox learning and one-way or jump-up 
box learning (e.g., Bolles, 1970; Seligman, 
Maier, & Solomon, 1971; Stampfl & Levis,
1973; Turner & Solomon, 1962). Bolles
(1970), for example, has shown that the CS 
termination contingency (the source of fear 
reduction in traditional two-process theories 
and the source of informational feedback in 
more recent theories) is not important in one-
way avoidance responding, although it is im-
portant in two-way shuttlebox responding.
Levis and Stampfl (1972) and Stampfl and
Levis (1973) have shown that responding to
a serial CS is different in the one-way and
the two-way situations. In a one-way appa-
ratus, rats respond primarily to the first seg- 
ment of a CS, although fear of the final seg- 
ment is higher (Boyd & Levis, 1976). In the 
two-way shuttlebox, by contrast, rats respond 
primarily to the second segment of a serial 
CS. Seligman et al. (1971), in accounting for 
the failure to demonstrate learned helplessness
in the one-way situation, point to the fact 
that one-way learning can be place learning, 
whereas two-way shuttlebox learning must of 
necessity be response learning. With these 
facts in mind, the idea that different learning 
processes may mediate the flooding effect in
one-way and two-way situations seems rea-
sonable. In two-way response-learning situa-
tions, any learning process that affects the
motivation or reinforcement for responding
(e.g., Pavlovian fear extinction and removal
of the response/CS termination contingency)
may well be sufficient to hasten extinction of
the response. Whether fear extinction is neces-
sary is much less clear given the results of
Mineka and Gino (1979a). In the one-way 
situation, by contrast, reducing the motivation 
for responding or the response/CS termina- 
tion contingency during flooding may not 
even be sufficient to hasten extinction of the 
response. If, as Seligman et al. (1971) and 
Bolles (1970) suggest, “running [or jumping] 
from a dangerous place to a safe place [is] 
an innate response" (Seligman et al., 1971, p. 
370), then the animal may need either to 
learn a competing response or to actively 
relax in the formerly dangerous place or to 
have a cognitive change in its expectancy as 
to what is a safe place. So for one-way re- 
sponses, fear extinction may be neither neces- 
sary nor sufficient for avoidance response ex- 
tinction. 


Nonspecificity of Flooding Effects and 
Implications for Theories of Flooding 


Although there are major differences among 
the four major theories of flooding, they do 
have certain elements in common. The Pav- 
lovian fear extinction account and the learned 
competing-response account are both em- 
bedded in a two-process framework, although 
they differ on which of the two processes is 
affected during flooding. The relaxation ac- 
count is similar to the Pavlovian fear extinc- 
tion account, except that it requires that the 
animal's normal nonfearful behavioral reper- 
toire have returned before fear is acknowl- 
edged to have extinguished. The notion of 
learning to relax, reminiscent of Wolpe’s 
(1958) reciprocal inhibition theory in which 
relaxation is learned as a response to pre-
viously fear-evoking stimuli, also has features
in common with a learned competing response, 
albeit of a different sort than that specified 
by Page (1955) and Coulter et al. (1969). 
In addition, the cognitive account has simi- 
larities to the competing-response account in 
that although fear may remain, the response 
made in the presence of that fear changes, 
either because of a change in act-outcome ex- 
pectancies or because of adventitious rein- 
forcement for the new response. 


In addition to the above common points, all
four accounts share an implicit assumption
that the effects of flooding should be quite 
specific (Bersh & Keltz, 1971). This speci- 
ficity should manifest itself in two ways. First, 
because all four theories are associative in 
nature, they are silent on the issue of how 
nonassociative changes could be involved in 
flooding effects. Second, they all predict that 
the effects of a flooding treatment should be 
relatively specific to the avoidance response 
that has been flooded; that is, they do not
predict that a flooding treatment applied to
one avoidance response could have transsitua-
tional effects, such as hastening the extinction
of a second avoidance response learned to a
different CS. Both of these assumptions have 
been challenged recently. 

Problems for the first aspect of the speci- 
ficity assumption are best illustrated by a 
recent set of experiments reported by Craw- 
ford (1977). Crawford found that confinement 
in novel or fearful places produced nearly 
as large a facilitation of jump-up response 
extinction as did a response prevention pro-
cedure. She argues that these results are best
explained by a species-specific defense re- 
sponse (SSDR) account of flooding (Bersh 
& Keltz, 1971). This account bears some re- 
semblance to the competing-response account, 
except it emphasizes that the new response 
emerges spontaneously when the dominant 
SSDR is punished or suppressed, not as a re-
sult of adventitious reinforcement; that is, it
is essentially a nonassociative account of flood- 
ing (Bolles, 1972). By this account, during 
response prevention as well as during con- 
finement in a novel or fearful place, freezing 
becomes the dominant SSDR, and so when 
extinction of the avoidance response begins, 
freezing replaces jumping or fleeing as the 
dominant response. 

Two other lines of evidence also suggest 
that nonassociative changes, such as a change 
in the SSDR hierarchy, produce effects that 
look much like traditional flooding effects. 
This questions the extent to which flooding 
acts via associative changes. Baum and Le-
Clerc (1974) found that an irrelevant stress
procedure—requiring rats to swim for 5 min-
utes—produced as large an effect on avoid-
ance extinction as did a regular response
prevention procedure. In addition, Monti and
Smith (1976) reported that confinement in
a shuttlebox alone, with no CS presentations, 
produced as large an effect on fear (CER) ex- 
tinction as did a traditional flooding proce- 
dure. Together these results suggest that non- 
associative changes—perhaps a change in 
SSDR hierarchy—can produce effects that ap-
pear identical to flooding effects.

There is one serious problem, however, in 
extending this conclusion to the degree that
Crawford (1977) has done: Showing that a
change in SSDR hierarchy can produce flood- 
inglike effects does not mean that flooding 
normally acts via this process. For example, 
Baum and LeClerc (1974) went on to show 
that their irrelevant stress procedure probably 
did not produce its effect through the same 
process as did the response prevention proce- 
dure. When a 2-hour time delay was inter-
polated between the treatment procedures and
extinction, the irrelevant stress procedure no 
longer had an effect, although the response 
prevention procedure did. This would be ex- 
pected if irrelevant stress produces its effect 
through temporary nonassociative changes. 
Neither Crawford nor Monti and Smith 
(1976) ran such time delay groups to deter-
mine whether their nonspecific procedures
were as effective after a delay as were their
traditional flooding procedures. So, although 
there is some indication that nonassociative 
changes can produce floodinglike effects, it is 
not yet clear to what extent similar nonasso-
ciative changes mediate the effects of con-
ventional flooding procedures. 

The second aspect of the specificity assump-
tion mentioned above—that flooding one
avoidance response should not hasten the ex- 
tinction of a second response learned to a 
different CS—has also been challenged by a 
recent set of experiments reported by Mineka 
(1976). In her first experiment, Mineka ex- 
amined the comparative effectiveness of rele-
vant and irrelevant flooding in hastening the
extinction of a two-way shuttlebox response.
All rats were trained to perform two different
avoidance responses—a one-way jump-up re-
sponse with a tone CS and a two-way shuttle-
box response with a light CS. Flooding the
jump-up response hastened extinction of the
shuttlebox response to nearly as large an ex-
tent as did flooding the shuttlebox response
itself. These results demonstrate that flooding
one avoidance response can hasten the extinc- 
tion of a different response learned to a differ- 
ent CS. In other words, the effects of a flood- 
ing experience can be more general than 
previously thought. 

Mineka (1976) then asked whether gen- 
eralization of fear extinction across CS mo- 
dalities (Pavlov, 1927) could have mediated 
this irrelevant flooding effect. By this account, 
extinction of fear to the tone CS and/or 
jump-up box grid that occurs during jump-up 
box flooding generalizes to produce extinction 
of fear of the shuttlebox light CS and/or 
shuttlebox grid. This in turn could reduce the 
motivation for responding in the shuttlebox 
sufficiently to account for the more rapid rate
of extinction in the irrelevant flood group as
compared to the control groups. By this hy-
pothesis, the effect of the irrelevant flooding
treatment should be somewhat smaller than 
that of the relevant flooding treatment, and 
this effect was found. Extinction across CS 
modalities is not expected to be as complete 
as extinction of the CR that has actually 
undergone the extinction procedure (Konor- 
ski, 1948). Mineka reasoned that if this hy- 
pothesis were correct, then any fear extinction 
procedure for one CS should also hasten the 
extinction of the shuttlebox avoidance re-
sponse; that is, flooding of an irrelevant avoid-
ance CS would not be necessary to see the
effect. 

In a second experiment, Mineka (1976) 
found evidence to support this hypothesis.
Two groups of rats were trained to perform
a shuttlebox avoidance response and were 
given Pavlovian fear conditioning trials with 
a different CS in a jump-up box. One group 
also received fear extinction trials in the 
jump-up box, whereas the second group was 
returned to their home cages for a compara-
ble amount of time. Both groups were then
tested for extinction of the shuttlebox avoid-
ance response. The fear conditioning and ex-
tinction group showed substantially faster ex- 
tinction of the shuttlebox avoidance response 
than did the fear conditioning alone control 
group. So it seems that an irrelevant Pavlovian
fear extinction process can mediate the ex-
tinction of the shuttlebox avoidance response.
This suggests that the Pavlovian fear extinc- 
tion that occurred during irrelevant flooding 
in the previous experiment may well be the 
factor that mediated shuttlebox extinction 
there also. Mineka (1976) concluded that 
these experiments suggest that generalization 
of extinction across CS modalities can mediate 
extinction of a shuttlebox avoidance response. 

However, this conclusion that generaliza- 
tion of fear extinction across CS modalities 
mediated the irrelevant flooding effect is now 
questionable; more recent evidence from 
Mineka's laboratory described earlier (Mi- 
neka & Gino, 1979a) suggests that fear 
extinction does not mediate the relevant flood- 
ing effect. This raises the question of what 
is mediating the irrelevant flooding effect. 
Mineka (1974, 1976) rejected on a variety 
of empirical and theoretical grounds the com- 
peting-response, relaxation, and cognitive ac- 
counts of flooding as explanations for this 
effect. Crawford's (1977) SSDR hypothesis,
however, provides a possible explanation of 
this irrelevant flooding effect that is worthy 
of investigation. This hypothesis would state 
that the irrelevant flooding procedure acts in 
the same way as does relevant flooding or con- 
finement in fearful places—by producing a 
change in the dominant SSDR from fleeing 
to freezing. 

There is one major weakness in the SSDR 
hypothesis that must be investigated before 
its feasibility can be determined. Mineka 
(1976) found in a third experiment that ir- 
relevant flooding did not hasten jump-up box 
extinction; that is, there is an asymmetry to
her irrelevant flooding effect. Yet the SSDR 
hypothesis originated from results with jump- 
up extinction, so irrelevant flooding should 
hasten jump-up extinction. There are two pos- 
sible resolutions of this apparent paradox. 
First, the irrelevant flooding experiments used 
a better trained jump-up response than did the 
brief confinement experiments, and the con- 
finement effect may only occur with less well- 
learned responses. Second, the brief confine- 
ment experiments used a different extinction 
procedure than did the irrelevant flooding ex- 
periment. In the former experiments the ani-
mals started extinction on the grid floor, and
if they had any tendency to freeze, there was 
nothing to break up that tendency; by con- 
tinuing to freeze they could rapidly meet the 
extinction criterion. Mineka, by contrast, used 
Baum's (1970a, 1970b) procedure, in which 
the animals are dumped off the safety ledge 
at the start of the first extinction trial. This 
dumping procedure may break up any ten- 
dency for freezing (perhaps it is even a mild 
punishment for freezing). Hence any change
in SSDR hierarchy that may have occurred 
during flooding or confinement in novel places 
might be reversed. The feasibility of these 
explanations can be determined by experi- 
ments designed to test whether the brief con- 
finement effect occurs with the Baum/Mineka 
(ledge) extinction procedure and whether 
Mineka's irrelevant flooding procedure has- 
tens jump-up extinction if the grids extinction 
procedure is used. (See Mineka & Gino,
1979b, for results which indicate that the ef- 
fect of confinement in novel or fearful places 
does not occur either when the ledge extinc- 
tion procedure is used or when the grids ex- 
tinction procedure is used with a better 
learned response such as that used by Mineka, 
1976.)

In sum, there are several aspects to the
nonspecificity of flooding, but the implications
of findings in this area for our understanding 
of what goes on during traditional flooding 
procedures are as yet unclear. First, confine- 
ment in novel, fearful, or stressful places can 
produce floodinglike effects, but the extent to 
which these nonassociative effects contribute
to traditional response prevention effects is 
unknown. It is unlikely, given Baum and Le- 
Clerc's (1974) time delay results as well as 
the results of Mineka and Gino (1979b), that 
purely nonassociative changes account for all 
of the effects of response prevention. Second, 
the effects of flooding one avoidance response 
can have transsituational effects and hasten 
the extinction of a second response that was 
learned to a different CS, but it is not yet 
known whether this irrelevant flooding effect 
is mediated by associative factors (e.g., gen- 
eralization of extinction across CS modalities) 
or by nonassociative factors such as those 
suggested by Crawford's (1977) hypothesis.


The Role of Fear in Alternative Extinction
Procedures for Avoidance 


In recent years the conventional extinction 
procedure used to study the persistence of
avoidance responses has received a substantial
amount of criticism. In studies of the extinc- 
tion of appetitive responses, the positive rein- 
forcer—food or water—has traditionally been 


that extinction of avoidance should be stud-


avoidance contingency. Davenport and Olson 
(1968) reported rapid extinction of a discrimi- 
native bar-press avoidance response when both 
these sources of reinforcement were removed. 
Reynierse and Rizley (1970) subsequently 
confirmed these results with a shuttlebox
avoidance response; removing the CS termina-
tion contingency or presenting CSs and USs
randomly were both more effective procedures 
for eliminating avoidance than was the con- 
ventional procedure. Bolles, Moot, and Gros-
sen (1971) further parceled out the relative
contributions of CS termination versus shock 
avoidance to the extinction of avoidance and 
concluded that the shock avoidance contin-
gency was crucial in creating an avoidance
response that is highly resistant to extinction.
Animals receiving shocks either randomly dur-
ing extinction or as punishment for avoidance
responses showed rapid extinction of the
avoidance response. CS termination proved to
be important only as long as the responses
continued to avoid shock. This last finding con-
flicts with the report of Katzev that CS ter-
mination is extremely important in maintain-
ing avoidance responding at a high rate when
an avoidance contingency is in effect. Katzev
reports low rates of responding with delayed
CS termination and shock avoidance; Bolles
et al., on the other hand, report relatively
high rates of responding with delayed CS ter-
mination and shock avoidance.²

It is now clear that the extreme persistence 
of avoidance responses is largely a result of 
using the conventional extinction procedures 
in which both CS termination and shock 
avoidance continue to be sources of reinforce- 
ment for responding in extinction. Because 
maintenance of the response/reinforcement 
contingency is necessary for a response that
is highly resistant to extinction, some theorists
(e.g., Mackintosh, 1974) have concluded that 
the longstanding question wrestled over by 
learning theorists of how to explain the per- 
sistence of avoidance responses in the face of 
apparent nonreinforcement (i.e., no shock) is 
really a pseudoproblem. These arguments are
well-founded insofar as they constitute a sound
analysis of what the reinforcing events and
response/reinforcement contingencies are in
avoidance learning. Such arguments fail to 
recognize, however, that the conventional ex- 
tinction procedure that produces such extreme 
persistence is still interesting in its own right. 

Aronfreed (1968) has cogently argued that 
the process of a child’s internalizing control
over his/her own behavior closely parallels the
learning and maintenance of avoidance re- 
sponses as they have conventionally been 
studied in animals. He argues that the extreme 
resistance to extinction seen in animals’ avoid- 
ance responses (e.g., Solomon et al., 1953) 
is analogous to the observation that behav- 
ioral changes that occur in the process of hu- 
man socialization are also highly resistant to 
changes in external contingencies. Children
and adults frequently persist in responses 
originally acquired under circumstances of 
aversive control even when the aversive out- 
comes would no longer occur. Aronfreed fur- 
ther contends that this is most likely to occur 
in cases in which a child never had a cogni- 
tive representation of the original contingen- 
cies and so is unlikely to become aware of a 
change in the contingencies. This is obviously 
parallel to the case in which an animal learns 
an avoidance response that persists when con- 
ventional extinction procedures are used be- 
cause it never stays around long enough to 
sample or become aware of the new contin-
gencies. The newer extinction procedures, dis-
cussed previously, which change the CS termi- 
nation and/or shock avoidance contingencies, 
forcibly expose the animal to the change in 
contingencies; hence extinction proceeds at a 
much faster rate. 

Therefore, we can conclude that the extreme 
persistence of avoidance responses may not 
be as surprising or puzzling as it was once 
conceived to be and yet still concede that 
the question of what learning processes under- 
lie the ultimate extinction of these responses 
is still an important and interesting one. If 
we are interested in the natural course of ex- 
tinction, given the original contingencies, the 
processes underlying flooding or response pre- 
vention techniques, which hasten extinction, 
also remain important topics of investigation. 
These techniques are particularly interesting 
in light of the parallel dissociations between 
fear and avoidance behavior, on the one hand, 
and between subjective, behavioral, and psy- 
chophysiological indices of fear in humans 
undergoing systematic desensitization or flood- 
ing therapy, on the other hand. 

One interesting issue that has not yet been 
investigated is the extent to which the dis- 
sociation between fear and avoidance that is 
observed following flooding and conventional 
extinction procedures also occurs with the 
newer procedures that remove the CS termi- 
nation and/or shock avoidance contingencies. 
Some evidence (e.g., Coulter et al., 1969; Lin-
ton et al., 1970) suggests that the learning
processes underlying flooding are different 
than those underlying the conventional ex- 
tinction procedures because different residual 
amounts of fear remain. Other evidence (e.g., 
Berman & Katzev, 1972), however, suggests 
that flooding and regular extinction act via the 
same underlying process—nonreinforced CS 
exposure, which produces fear extinction. 
Riccio and Silvestri (1973) have speculated 
that extinction procedures that remove the


It should be noted that punishment of avoid-
ance often results in increased rather than reduced
resistance to extinction, the so-called vicious circle
phenomenon. (For extensive reviews of the litera-
ture on the determinants of which result is likely
to occur in a particular situation, see, e.g., Brown,
1969, and Mackintosh, 1974.)

response/CS termination contingency may be
more effective in eliminating the “motiva- 
tional” (i.e., fear-eliciting) properties of the 
CS than are conventional extinction or re- 
sponse-blocking procedures. There is no com- 
pelling evidence to support or refute this 
speculation at present. It is unlikely that re- 
moving the shock avoidance contingency (e.g., 
Davenport & Olson, 1968) could produce ex- 
tinction by fear extinction. This procedure is 
more likely to produce its effect through learn- 
ing of a competing response or by a change 
in response/outcome expectancies. Recently, 
in fact, Overmier and Brackbill (1977) dem- 
onstrated that an effective avoidance extinc- 
tion procedure involving noncontingent US 
presentations leaves fear of the CS intact. 
They interpret their results in terms of the 
independence of CS-fear and fear-avoidance 
response links in the avoidance-learning situa- 
tion (see also Overmier & Bull, 1969). The 
results can also be interpreted in terms of 
the dissociation between fear and avoidance 
responding discussed earlier. Further explora- 
tion of these different avoidance procedures 
is particularly important because learning 
theorists and behavior therapy researchers 
are beginning to speculate, on the basis of 
animal experiments, about which contingencies 
it is most important to remove during de- 
sensitization and flooding therapy in humans 
(e.g., Riccio & Silvestri, 1973; Wilson, 1973). 

In sum, we can see that there are multiple 
ways of extinguishing an avoidance response. 
What factors these procedures have in com- 
mon is not yet clear, although it seems certain 
that multiple learning processes are involved. 
Experiments designed to determine the degree 
of residual fear following these different pro- 
cedures will be of particular interest. 


Conclusion 


Fear and fear extinction do not appear to 
play any simple role in avoidance acquisition, 
maintenance, and extinction. There is often 
a marked dissociation or desynchrony between 
fear and avoidance behavior, and the deter- 
minants of this dissociation are as yet poorly 
understood. Fear attenuates over the course 
of avoidance learning; yet this cannot be ex-
plained by simple Pavlovian fear extinction
(Starr & Mineka, 1977). The question of what
continues to motivate avoidance responding
as fear diminishes remains a hotly debated is-
sue (e.g., D’Amato, 1970; Herrnstein, 1969;
Mackintosh, 1974; Seligman & Johnston,
1973). These new theories of avoidance have
emerged in large part as a result of the in- 
adequacies of traditional two-process theory 
in explaining the dissociation between fear 
and avoidance behavior. Each theory postu- 
lates some new mechanism to explain well- 
maintained avoidance responding and extinc- 
tion of responding. In so doing, these theories 
have tended to ignore or treat in only a cur- 
sory fashion the interesting determinants of 
this dissociation. However, fear remains a
salient feature of avoidance learning, and the
question of what determines its course at
various stages of avoidance acquisition, main- 
tenance, and extinction remains one of great 
practical and theoretical importance, at least 
for those interested in general questions about 
the determinants of fear and fear extinction 
in more complex situations than those tradi- 
tionally studied by theorists of classical con-
ditioning.

The determinants of fear extinction and its 
role in mediating avoidance extinction and the 
effects of response prevention techniques are 
also complex. Fear extinction is certainly not 
a necessary precursor of avoidance response
extinction. With some avoidance responses 
(e.g., two-way shuttlebox), fear extinction 
may be sufficient to cause avoidance extinc-
tion, whereas with others, fear extinction may
not be sufficient (e.g., one-way responses). 
Further work is needed to clarify the situa- 
tions in which fear extinction is more or less 
central to avoidance response extinction. 

The dissociation between fear and avoid-
ance responding often observed following
flooding is of particular interest because it 
may be functionally analogous to the disso- 
ciation frequently seen among different ele- 
ments of the phobic response following sys- 
tematic desensitization and flooding therapy 
(e.g., Lang, 1968, 1971; Rachman & Hodg-
son, 1974; Riccio & Silvestri, 1973). For
example, patients frequently report that they
can approach their phobic object (i.e., the
avoidance component is gone) but that they
still feel afraid of it (i.e., the subjective feel-
ing of fear and sometimes the psychophysio-
logical concomitants of this feeling are still
present). Although conditioned avoidance re- 
sponses are often considered to be a poor 
model for human phobias (e.g., Costello, 
1970; Seligman, 1971), many of the same 
variables found to enhance the effectiveness of 
flooding are also found to be important in 
flooding therapy with humans. (See Baum, 
1970b, and Marks, 1972, for reviews.) So 
further work on the determinants of fear ex- 
tinction during flooding is important for our 
understanding not only of the role of fear in 
the extinction of avoidance but also of a num- 
ber of analogous problems in human behavior 
therapy techniques. 


Reference Note 


1. Mineka, S., & Gino, A. Dissociation between CER
and extended avoidance performance. Manuscript 
submitted for publication, 1979.


Applied statistics textbooks generally recommend the use of the chi-square tests
of homogeneity and independence with 2 X 2 contingency tables only when the
expected frequency of each cell is five or more. Recent research has shown this
rule-of-thumb criterion to be unnecessarily restrictive, but has not explored the
accuracy of the chi-square tests when the total number of observations is less
than 20 or when the expected frequencies fall well below one, the primary
issues considered in this article. The chi-square tests of homogeneity and inde-
pendence were found to provide reasonably accurate estimates of Type I error
probability for N ≥ 8. Certain alternatives to the chi-square tests are considered.


There are three distinct models for deriving
probability statements for 2 X 2 contingency
tables (Kendall & Stuart, 1967, pp. 549-555).
In Model 1, a researcher specifies both sets of
marginal frequencies before the data are
collected; this is Fisher's exact test. In Model 2,
one set of marginal frequencies is fixed, the
other being free to vary randomly; this is
termed a test of homogeneity. In Model 3,
neither set of marginal frequencies is fixed;
this is typically described as a test of in-
dependence. The chi-square test has been shown
to provide approximate, but accurate, proba-
bility estimates in Model 2 (test of homogeneity)
and Model 3 (test of independence) for sample
sizes as small as N = 20 if the Yates correction
is not applied (Camilli & Hopkins, 1978;
Roscoe & Byars, 1971). The accuracy of chi-
square tests of homogeneity and independence
with sample sizes smaller than 20 does not
appear to have been explored. Although Model
1 is rarely consonant with empirical data, the
chi-square test in which the continuity
correction has been made estimates proba-
bilities associated with Model 1. In Model 2
or Model 3 chi-square applications, the con-
tinuity correction should not be applied, since
it decreases the accuracy of the related proba-
bility statements (Camilli & Hopkins, 1978).


Method 


This study examined the effects of the four in-
dependent variables listed below on the actual propor-
tion of Type I error in relation to the nominal alpha
values of .10, .05, and .01: (a) total number of observa-
tions, N, in the 2 X 2 contingency table (N = 4, 8, 12,
16, and 20); (b) row marginals fixed (Model 2) versus
random (Model 3); (c) relative frequencies of the N
observations in the two row categories when row
marginal frequencies (n₁. and n₂.) were fixed (n₁./n₂.
= 1.0 and .33); when row frequencies were random,
the row marginal proportions varied randomly about
the parameters π₁. and π₂., with π₁. = 1 − π₂. = .500,
.707, and .794; and (d) proportion of N observations
in the two random column categories π.₁ and π.₂
(π.₁ = 1 − π.₂ = .500, .707, and .794). (The propor-
tions employed by Roscoe & Byars, 1971, and Camilli &
Hopkins, 1978, were used in this study to facilitate
comparisons of findings.)
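
The simulation design just described can be illustrated with a short Python sketch. This is not the program used in the study, which relied on the IMSL subroutine GGUB. The function names `chi_square_2x2` and `simulate_model2` are hypothetical, Python's standard `random` module stands in for GGUB, and the critical values are the usual tabled chi-square values for one degree of freedom. The sketch covers only Model 2 (fixed row totals, random column totals) and assumes that rejections are divided by the total number of replications, with uncomputable tables counted separately.

```python
import random

# Tabled critical values of chi-square with 1 degree of freedom.
CRITICAL = {0.10: 2.706, 0.05: 3.841, 0.01: 6.635}


def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square without the Yates correction; None if a margin is zero."""
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    if 0 in (r1, r2, c1, c2):
        return None  # an expected frequency is zero, so the statistic is undefined
    n = r1 + r2
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)


def simulate_model2(n_row1, n_row2, pi_col1, reps=10_000, seed=0):
    """Estimate Type I error rates with fixed row totals and random columns."""
    rng = random.Random(seed)
    rejections = {alpha: 0 for alpha in CRITICAL}
    uncomputable = 0
    for _ in range(reps):
        # Under the null hypothesis both rows share one column probability.
        n11 = sum(rng.random() < pi_col1 for _ in range(n_row1))
        n21 = sum(rng.random() < pi_col1 for _ in range(n_row2))
        stat = chi_square_2x2(n11, n_row1 - n11, n21, n_row2 - n21)
        if stat is None:
            uncomputable += 1
            continue
        for alpha, crit in CRITICAL.items():
            if stat > crit:
                rejections[alpha] += 1
    # Rates are taken over all replications, computable or not.
    rates = {alpha: count / reps for alpha, count in rejections.items()}
    return rates, uncomputable
```

Calling `simulate_model2(4, 4, 0.5)` corresponds to the N = 8 condition with equal row frequencies. Because so few distinct chi-square values can occur in such a small table, the estimated rates at the .10 and .05 levels coincide.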


Results 
Model 2: Chi-Square Test of Homogeneity 


The actual proportions of the 10,000 repli- 
cations in which the observed chi-square value 
exceeded the critical chi-square value at 
nominal alpha levels of .10, .05, and .01 are 
given in the upper portion of Table 1 for 
selected Model 2 chi-square applications. For 
example, the second row of Table 1 indicates
that when the 2 X 2 contingency table con-
tained N = 8 observations divided equally
between the two rows (n₁. = n₂. = 4) and
when the proportion of N observations falling
in each column varied randomly about
.5 = π.₁ = π.₂, the observed proportions of
Type I errors (α̂) were .0715, .0715, and .0058
at the nominal alpha values of .10, .05, and .01,
respectively.


Table 1

Actual Proportions of Type I Errors in 10,000 Replications Given by Chi-Square Tests of
Homogeneity (Model 2) and Independence (Model 3) in Selected 2 X 2 Contingency Tablesᵃ

Random column            Row frequency/          Proportion of Type I
proportions              proportion              errors (α̂)ᵇ                Uncomputableᶜ
π.₁     π.₂       N      n₁.       n₂.           .10       .05       .01      χ²

Model 2
.5      .5         4      2         2            .1224ᵈ    .1224ᵈ    .0000    1294
                   8      4         4            .0715     .0715     .0058      74
                  12      6         6            .1426     .0576     .0066       3
                  16      8         8            .0746     .0703     .0048       1
                  20     10        10            .1160     .0400     .0132       0
.794    .206       4      1         3            .1172     .1172     .0000    3920
                   8      2         6            .1069     .0274     .0110    1631
                  12      3         9            .1076     .0222     .0178     611
                  16      4        12            .1052     .0318     .0228     253
                  20      5        15            .1088     .0358     .0188     100

Model 3
.5      .5         4     π₁.=.5    π₂.=.5        .1077     .1077     .0000    2359
                   8                             .1084     .0656     .0082     173
                  12                             .1527     .0572     .0066      12
                  16                             .1114     .0708     .0083       1
                  20                             .1203     .0507     .0117       0
.5      .5         4     .794      .206          .0759     .0759     .0000    4722
                   8                             .0684     .0319     .0062    1602
                  12                             .1006     .0390     .0047     613
                  16                             .1063     .0411     .0057     231
                  20                             .1048     .0416     .0061      94
.794    .206       4     .794      .206          .0570     .0570     .0000    6342
                   8                             .0824     .0303     .0161    2913
                  12                             .0896     .0403     .0117    1221
                  16                             .0967     .0435     .0183     482
                  20                             .0928     .0483     .0143     205

ᵃ Pseudorandom numbers were generated by the International Mathematical and Statistical Libraries
subroutine GGUB (IMSL, 1977). In addition to the 25 simulations reported in Table 1, 25 additional
simulations were performed with row and column parameters falling between the extremes represented in
Table 1.
ᵇ σp = .0030, .0022, and .0010 for α = .10, .05, and .01, respectively.
ᶜ In certain instances of the 10,000 replications in which the expected cell frequencies were very small, the
random number generating process yielded a column with a zero frequency, hence the chi-square was not
computable. One can subtract this figure from 10,000 to recalculate the proportion of Type I errors in the
sample space in which the chi-square was computable.
ᵈ Because of the discreteness in the empirical sampling distributions, the proportions of Type I errors
are identical at α = .10 and α = .05 (and in all other simulations in which N = 4).


Model 3: Chi-Square Test for Independence


Selected results from applications of
Pearson’s chi-square to null situations in which
both row and column proportions differed
randomly from the specified parameters πᵢ.
and π.ⱼ are given in the lower portion of
Table 1. In all 50 simulations (25 of which are
reported in Table 1), the expected frequencies
of at least two of the four cells were five or
less; the expected frequency of one or more
cells was extremely small (one or less) in 24
of the simulations.
For both models, when N ≥ 8, the nominal
alpha values were reasonably accurate at
α = .05 (the proportion of Type I errors
varied between .0222 and .0715) and α = .01
(the proportion of Type I errors varied be-
tween .0047 and .0228). For N = 4, the
accuracy at α = .10 was adequate, but for
α = .05 and α = .01, accuracy was very poor.
The issue of power is critically important. 
A later section demonstrates that when the 
two factors are very highly correlated, the 
statistical power of the chi-square test with 
very small Ns is very low. But alternatives 
can provide a more powerful test under
certain conditions.


Alternatives to 2 X 2 Chi-Square Tests


To illustrate the first alternative, consider 
a 2 X 2 contingency table in which N = 10
and n₁₁ = 4, n₁₂ = 1, n₂₁ = 1, and n₂₂ = 4,
hence n₁. = n₂. = n.₁ = n.₂ = 5. The proba-
bilities for this particular configuration of
frequencies for Models 2 and 3 are related to
the probability of Model 1 by binomial
multipliers (Kendall & Stuart, 1967, pp.
551-555):


$$p_1 = \frac{n_{1.}!\, n_{2.}!\, n_{.1}!\, n_{.2}!}{N!\, n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!}, \tag{1}$$

$$p_2 = p_1 \binom{N}{n_{.1}} \pi_{.1}^{\,n_{.1}} (1 - \pi_{.1})^{n_{.2}}, \tag{2}$$

$$p_3 = p_2 \binom{N}{n_{1.}} \pi_{1.}^{\,n_{1.}} (1 - \pi_{1.})^{n_{2.}}. \tag{3}$$


Using Equation 1, p₁ (Fisher's exact proba-
bility) is found to be .0992. Using Equations
2 and 3 and supposing that π.₁ = π₁. = .5,
the probabilities of the observed 2 X 2 con-
figuration can be determined:


$$p_2 = p_1(.2461) = .0992(.2461) = .0244$$
and
$$p_3 = p_2(.2461) = .0060.$$
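
The arithmetic of Equations 1 through 3 can be checked with a few lines of Python. This is a minimal sketch for the example table only (cell frequencies 4, 1, 1, 4, with both marginal parameters assumed to equal .5). The helper names `fisher_point_probability` and `binomial_multiplier` are hypothetical and are introduced here purely for illustration.

```python
from math import comb, factorial


def fisher_point_probability(n11, n12, n21, n22):
    """Equation 1: probability of one table when both sets of margins are fixed."""
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    n = r1 + r2
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    denominator = (factorial(n) * factorial(n11) * factorial(n12)
                   * factorial(n21) * factorial(n22))
    return numerator / denominator


def binomial_multiplier(n, k, pi):
    """Binomial probability that a marginal total of k arises in n observations."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)


p1 = fisher_point_probability(4, 1, 1, 4)
p2 = p1 * binomial_multiplier(10, 5, 0.5)  # Model 2: column totals random
p3 = p2 * binomial_multiplier(10, 5, 0.5)  # Model 3: row totals random as well
print(round(p1, 4), round(p2, 4), round(p3, 4))
```

The rounded values agree with the probabilities .0992, .0244, and .0060 given in the text.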


But these probabilities are for observing this
particular frequency configuration in the
sample space. The random marginal fre-
quencies allow many more possible frequency
configurations under Model 2 than under 
Model 1 and many more still under Model 3. 
Indeed, unless N is very small (10 or less), the
process of enumerating the various frequency 
configurations, and hence deriving a proba- 
bility distribution, is not practicable without 
the assistance of computer programs. Further-
more, this “exact” distribution will only be as
useful as π₁. and π.₁ are accurate.

A modification of Fisher’s exact test, 
proposed by Tocher (1950), can be used to 
compensate for the discreteness of sampling 
distributions if one desires to make decisions 
at precise conventional alpha values such as 
α = .05. In Tocher’s procedure, one selects
alpha (α) and then calculates the cumulative
Model 1 distribution for the given configura-
tion: L₁ is defined as the probability of all
more extreme configurations (in our example,
n₁₁ = n₂₂ = 5 and n₁₂ = n₂₁ = 0, and L₁
= .0040 using Equation 1); L₂ is the proba-
bility of the obtained configuration (.09921)
plus L₁, hence L₂ = .1032. If L₂ ≤ α, H₀ is
rejected; if L₁ > α, H₀ is tenable. But in
situations like ours in which L₂ > α > L₁, the
ratio R is determined: R = (α − L₁)/(L₂ − L₁).
In our example,


R = (.05 − .004)/(.1032 − .004) = .464.


A random number (X) is then selected from a
uniform distribution within the interval (0, 1).
If X ≤ R, H₀ is rejected; otherwise H₀ remains
tenable; or given our data set, the probability
that H₀ will be rejected is .464. The exact level
of significance equals XL₂ + (1 − X)L₁.
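
The decision rule just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names `hypergeom_prob` and `tocher_decision` are hypothetical, the test is taken to be one-tailed in the direction of larger n₁₁ (as in the worked example), and the Model 1 probabilities are written in the equivalent hypergeometric form of Equation 1.

```python
import random
from math import comb


def hypergeom_prob(n11, r1, r2, c1):
    """Model 1 probability of a table with fixed margins (equivalent to Equation 1)."""
    return comb(r1, n11) * comb(r2, c1 - n11) / comb(r1 + r2, c1)


def tocher_decision(n11, r1, r2, c1, alpha=0.05, rng=random):
    """Return (L1, L2, R, reject) for an upper-tail randomized exact test."""
    largest = min(r1, c1)
    # L1: probability of all configurations more extreme than the observed one.
    l1 = sum(hypergeom_prob(k, r1, r2, c1) for k in range(n11 + 1, largest + 1))
    # L2: L1 plus the probability of the observed configuration.
    l2 = l1 + hypergeom_prob(n11, r1, r2, c1)
    if l2 <= alpha:
        return l1, l2, 1.0, True    # reject outright
    if l1 > alpha:
        return l1, l2, 0.0, False   # null hypothesis tenable
    # Otherwise reject with probability R by drawing a uniform random number.
    ratio = (alpha - l1) / (l2 - l1)
    return l1, l2, ratio, rng.random() <= ratio
```

For the example table, `tocher_decision(4, 5, 5, 5)` reproduces L₁, L₂, and R as given in the text; the final reject-or-retain element depends on the random draw, which is the feature of the procedure that some researchers find objectionable.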

Tocher's test (also known as the random- 
ized exact test) has been shown to be the 
uniformly most powerful unbiased test for all 
three models (Kendall & Stuart, 1967, p. 554; 
Tocher, 1950). The degree of difference in 
power does not seem to have been explored for 
very small Ns. 


Power Comparisons 


The relative power of the chi-square and 
Tocher’s exact test was compared for 12 
Model 2 and Model 3 situations that were a 
subset of the original 50 simulations, except 
that the cell proportions (πᵢⱼ) varied from .4
to .1. At α = .05 and at N = 8, the power
efficiency of the chi-square ranged from .18 to
.64, with the lesser power differences occurring
with larger π₁₁/π₁. − π₂₁/π₂. values. The
differences in power were much less for
12 ≤ N ≤ 20, with the power efficiency of the
chi-square ranging from .64 to 1.05. The gain
from Tocher’s test is negligible if N > 20
(Starmer, Grizzle, & Sen, 1974). The absolute
power of the chi-square or the Tocher tests
exceeded .80 only when effects were very large
(e.g., π₁₁/π₁. = .8 and π₂₁/π₂. = .2) and
N ≥ 16 with α = .05. Researchers working
with precious observations that may limit N
should probably relax alpha to .10 and make
directional tests to increase power.
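
For very small tables, the power of the uncorrected chi-square test under Model 2 can be obtained by exact enumeration instead of simulation. The Python sketch below is an illustration only and is not the procedure used for the comparisons reported above. The function name `chi_square_power` is hypothetical, the critical value 3.841 is the tabled value for α = .05 with one degree of freedom, and tables with a zero column total are assumed to count as non-rejections.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def chi_square_power(n_row1, n_row2, p_row1, p_row2, critical=3.841):
    """Exact rejection probability with fixed row totals (Model 2)."""
    n = n_row1 + n_row2
    power = 0.0
    for a in range(n_row1 + 1):        # first-column count in row 1
        for b in range(n_row2 + 1):    # first-column count in row 2
            c1 = a + b
            c2 = n - c1
            if c1 == 0 or c2 == 0:
                continue  # statistic not computable; treated as non-rejection
            stat = (n * (a * (n_row2 - b) - (n_row1 - a) * b) ** 2
                    / (n_row1 * n_row2 * c1 * c2))
            if stat > critical:
                power += binom_pmf(a, n_row1, p_row1) * binom_pmf(b, n_row2, p_row2)
    return power
```

Setting both row probabilities to the same value gives the exact Type I error rate; with unequal values the function returns the power against that alternative, which can then be compared across sample sizes.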

Many behavioral researchers will find 
Tocher’s test philosophically objectionable in 
spite of its greater power, since the gain in 
power is affected not by data but by the “luck 
of the draw." 


Conclusions 


In 2 X 2 contingency tables, the chi-square 
test provides a quick and reasonably accurate 
Type I probability statement for tests of 
homogeneity or tests of independence if 
N ≥ 8. If N < 5, α = .10 should be used,
since α = .05 and α = .01 become very
inaccurate.

Fisher's exact test can also be employed
but is very conservative for Model 2 and
Model 3 applications (Camilli & Hopkins,
1978). The most powerful, but philosophically
controversial, alternative is Tocher's exact
test, which has greater power than the chi-
square test, especially when N < 20. To
increase power if N cannot be increased,
researchers should consider relaxing alpha to
.10 or more and making one-tailed tests.


Psychological Control of Essential Hypertension:
Review of the Literature and Methodological Critique


Peter Seer 
Department of Psychiatry, University of Auckland, Auckland, New Zealand 


Recent studies (1971-1978) that investigated psychological approaches to the 
treatment of essential hypertension are reviewed. Twenty studies that use tech- 
niques of biofeedback, relaxation, and meditation training are summarized in 
table form. They are subjected to a detailed methodological critique, and sug- 
gestions for methodological improvements and directions for future research 
are proposed. Most experiments demonstrated blood pressure reductions too 
small to be of clinical significance. A combination of biofeedback and relaxa- 
tion/meditation with other behavioral techniques appears most promising, and 
suggestions for a more comprehensive approach to assessment and training are 
made. Although studies comparing biofeedback and relaxation/meditation were 
inconclusive, relaxation/meditation is suggested to hold more promise because 
it requires no sophisticated technology and has been reported to simultaneously
reduce other stress-related complaints.


In recent years we have witnessed an in-
creasing interest in the psychological control 
of essential hypertension and of disease in 
general. This trend appears to be related to 
a shift in disease patterns. On the one hand, 
we can observe a drastic reduction in con-
tagious diseases and, on the other hand, a
dramatic increase in degenerative disorders
such as coronary heart disease and essential
hypertension (Stoyva, 1976). There is an ex-
tensive body of research indicating that es- 
sential hypertension can be related to dys- 
functional habits of living and of responding 
to an environment of ever increasing com- 
plexity (Gutmann & Benson, 1971; Henry & 
Cassel, 1969). Further related to this trend 
is the growing recognition that the pharma-
cological treatment of essential hypertension
has many undesirable side effects (Bulpitt & 
Dollery, 1973) and is far less effective than
drug advertisements lead one to believe
(Kannel & Dawber, 1973; LoGerfo, 1975).
The development and scientific study of non- 
drug alternatives in the control of essential 
hypertension is therefore highly desirable. 
Psychological approaches such as biofeedback, 
relaxation, and meditation constitute such an 
alternative. The purpose of this article is to 
present a review of recent research in this 
field (1971-1978), to critically evaluate its 
methodology, to make special recommenda- 
tions for minimal methodological require- 
ments, and to make suggestions for future 
research. 


Definition, Incidence, and Classification of 
Essential Hypertension 


Hypertension has become an epidemic of 
major proportions in western society. Esti- 
mates of its prevalence vary from 10% to as 
high as 30% of the total adult population, 
depending on the definition of what consti- 
tutes high blood pressure (Stamler, Stamler, 
Riedlinger, Algera, & Roberts, 1976). Al- 
though there exists considerable disagreement 
over the definition of hypertension, it is safe 
to say that for persons of up to 50 years of 
age, a blood pressure of 145/95 mmHg would 
be classified as mild hypertension. Hyperten-
sion in its early stages is asymptomatic, that
is, it is not accompanied by any overt warn- 
ing signs. Consequently, as many as 50% of 
all cases of hypertension go undetected 
(Onesti, Kim, & Moyer, 1973). More than 
90% of all cases of hypertension are of un- 
known etiology; they fall into the category 
of primary or essential hypertension. The re- 
mainder, labeled secondary hypertension, is 
due to identifiable renal, endocrine, neuro- 
genic, and other disorders; this latter category 
of secondary hypertension will not be dis- 
cussed in this article. 


Hypertension as a Major Risk Factor 


It has been firmly established that moderate 
to severe hypertension increases morbidity as 
well as mortality from many diseases, es- 
pecially cardiovascular disorders. The most 
comprehensive and conclusive epidemiological 
study in this field is the Framingham study 
(Kannel, 1976), which has been in continuous 
operation since 1948. This study has found 
hypertension to be one of the most robust 
predictors of such life-threatening disorders 
as myocardial infarction, congestive heart 
failure, stroke, and damage to kidneys, eyes,
and other organs. Risk was found to increase
in proportion to increases in blood pressure,
and hypertensives were found to be three 
times more likely to develop cardiovascular 
disease than normotensives. Both systolic and 
diastolic blood pressure were found to be of 
equal importance. 

If we consider that, for example, in the 
United States more than 50% of all deaths
are linked to cardiovascular and cerebrovas-
cular disorders, and if we consider the enor-
mous suffering of those affected and the stag-
gering costs involved, it becomes obvious that
prevention and treatment of hypertension and
other known risk factors is of paramount 
importance. 


Consequences of Reducing Elevated 
Blood Pressure 


The importance of controlling moderate to 
severe hypertension for the reduction of mor-
bidity and the prolongation of life has been
most convincingly demonstrated in two major
double-blind studies by the Veterans Adminis-
tration Cooperative Study Group on Anti-
hypertensive Agents (1967, 1970). Results 
clearly indicated that in comparison to ex- 
perimental subjects who received active phar- 
macological treatment, a significantly higher 
percentage of placebo group subjects devel- 
oped congestive heart failure, stroke, renal 
damage, retinopathy, and accelerated hyper- 
tension. However, pharmacological treatment 
did not significantly reduce the incidence of 
myocardial infarction. 
Medical opinion differs considerably with 
regard to the value of the treatment of mild 
hypertension, in the absence of clear evidence 
of its general preventative worth. However, 
the control of even small blood pressure ele- 
vations has been shown to reduce cardio- 
vascular disease in subjects who simultane- 
ously display other cardiovascular risk fac- 
tors such as excessive weight, smoking, and 
elevated cholesterol levels (Kannel, 1976).


Psychophysiological View of 
Essential Hypertension 


Despite some 40 years of intensive re-
search into the mechanisms and causes of
essential hypertension, its etiology is still un- 
known. A number of physical correlates such 
as hereditary predisposition, salt intake, over- 
weight, and abnormalities in the renin-angio- 
tensin system (Kannel & Dawber, 1973) 
have been isolated, but researchers disagree 
widely as to their respective importance. 

Temporary rises in blood pressure in re- 
sponse to events that are subjectively per- 
ceived as exciting, demanding, or distressing 
have been observed. Essential hypertension
has been found in people whose adaptive
capabilities had been overtaxed in situations
such as natural disaster or war, hazardous
work environments and excessive work pres-
sures, job loss and unemployment, migration
and urbanization, and others. Further, re-
search has shown that sustained blood pres-
sure elevations can be produced experimen-
tally in animals by prolonged elicitation of
the emergency reaction, using such proce-
dures as electrical stimulation of the hypo-
thalamus, exposure to various aversive stim-
uli, classical and operant conditioning, and
disruption of normal social interrelationships.
These blood pressure elevations were found
to persist even after the termination of the 
aversive event. The role of psychosocial fac- 
tors in essential hypertension has been re- 
viewed by Gutmann and Benson (1971) and 
Henry and Cassel (1969). 

The apparent role of psychosocial factors 
has led to the development of psychological 
approaches to the treatment of essential hy- 
pertension. These approaches are usually 
based on a psychophysiological model in 
which the repeated and prolonged elicitation 
of the “emergency reaction" (Cannon, 1932) 
with its characteristic blood pressure lability 
eventually leads to stabile hypertension in 
predisposed individuals (Gutmann & Benson, 
1971; Henry & Cassel, 1969; Stoyva, 1976).

Most studies attempting to control essential
hypertension with the psychological tech-
niques of relaxation/meditation and biofeed-
back aim at reducing the sympathetic ner-
vous system activity that mediates the emer-
gency reaction. It is important to stress that
the exact mechanism linking transient rises
in blood pressure with sustained elevations is
still unknown and that other models can also
be used to understand the disorder.


Description of Psychological Techniques
Used in the Control of Essential 
Hypertension 


The most widely researched psychological
approaches to the control of essential hyper-
tension fall into two major groups, namely,
relaxation/meditation and biofeedback.


Relaxation/Meditation


Progressive relaxation. Among the ap- 
proaches explicitly aimed at producing physi- 
cal relaxation, one of the most widely used 
is Jacobson’s (1970) "progressive relaxa- 
tion." Here, the person is instructed to tense 
and relax various groups of striate muscles 
throughout the body, starting, for example, 
with the hands and arms and then progress- 
ing to facial muscles and muscles in the 
trunk and legs. It is worth mentioning that 
Jacobson (1939) was the first to observe
decreases in blood pressure concomitant with
muscular relaxation. Progressive relaxation 
is also an integral part of “metronome-con- 
ditioned relaxation," a technique developed 
by Brady (1973). Here, the person is first 
taught to tense and relax major muscle 
groups. This is followed by more general in- 
structions to “re-lax and let go" paced by a 
metronome set at 60 beats/min. 

Meditation techniques. Meditation tech- 
niques, as a rule, were conceived within a 
particular philosophical and religious con- 
text. Only recently have they been used out- 
side this context or been combined with 
various behavior change methods (Shapiro & 
Zifferblatt, 1976). Meditation techniques are 
difficult to define as a whole; only the “con- 
centrative” meditation techniques will be con- 
sidered here. (For a detailed discussion of 
meditation from a psychological point of 
view, see Naranjo & Ornstein, 1971.) Their 
common feature seems to lie in learning to 
direct one’s attention toward a “mental de- 
vice" (Benson, 1975) or a focus with which 
the student becomes passively absorbed. A 
variety of mental devices such as mantras, 
chants, prayers, visual symbols, or one's 
breathing or heartbeat are used in the vari- 
ous meditation techniques. Their function is 
to reduce or eliminate conceptual thinking 
(“the mental chatter”) and to facilitate the
development of an encompassing focus on
the present moment and concomitant feelings
of calm and relaxation. A common phenome-
non in both relaxation and meditation tech-
niques is that the practitioner frequently 
finds his or her attention shifting away from 
the mental device toward unrelated thoughts, 
ideas, images, preoccupations, worries, or 
sensations. These task-irrelevant thought 
processes are usually not dealt with satisfac- 
torily in the relaxation techniques, whereas 
the meditation approaches commonly con- 
tain explicit instructions for dealing with 
them. 

One of the more common meditation tech- 
niques is attention to the ingoing and out- 
going flow of one’s breath without controlling 
it. This technique of breath meditation has 
been popularized and simplified for use with 
hypertensive patients by Benson (1975) and
has been applied in several hypertension
studies. It requires the person to think of 
the number 1 (or count up to higher num- 
bers) with each exhalation and to return to 
breath counting whenever distractions occur. 

The techniques described so far all in- 
volve attending to physical sensations or 
processes (e.g., breathing). In transcendental
meditation (TM; Mahesh Yogi, 1968) a 
mantra or meaningless sound is introduced 
as an attentional focus, and the person is 
taught to repeat the sound mentally in an 
effortless way. 

Physiological effects and active components 
of relaxation/meditation techniques. A con- 
siderable amount of research data on the 
physiological effects of relaxation/meditation 
has been accumulated in recent years. Re- 
sults clearly indicate that physiological effects 
are not consistent across different forms of 
relaxation and meditation and may even
differ within a given technique depending on 
such variables as mode of instruction, con- 
text within which the technique is taught, 
number and length of training sessions, and 
type of subject population (Woolfolk, 1975). 
Regarding this latter variable, Davidson and 
Schwartz (1976) have convincingly argued 
that different clinical problems manifest 
themselves in different modes and physiologi- 
cal systems. Therefore, they may require 
different relaxation/meditation techniques
that specifically respond to the involved cog- 
nitive, somatic, or attentional processes. 

However, there is considerable evidence that common to these techniques
is the elicitation of a general physiological pattern that Stoyva and
Budzynski (1974) termed “cultivated low arousal.” It is characterized
by decreases in pulse and respiratory rate, decreases in muscle tonus
and oxygen consumption, and increases in skin resistance. This
“relaxation response” (Benson, 1975) is seen as incompatible with and
counteracting the emergency reaction. Its frequent elicitation is
assumed to lead eventually to a reduction in sympathetic activity and
consequently to a lowering of blood pressure. Benson (1975) defined
four components of the common active ingredients of
relaxation/meditation: (a) a quiet environment, (b) a decreased muscle
tonus, (c) a passive attitude, and (d) the restriction of one's
attention to a mental device. This latter component is generally seen
as most important in explaining the mechanisms that underlie
relaxation/meditation (Naranjo & Ornstein, 1971). There is a great need
for more research, as the only well-controlled study (Smith, 1976)
found that neither a mental device nor a passive attitude was a
necessary component of meditation.

Common to all relaxation/meditation techniques is that they require the
student to intersperse his or her ongoing activities once or twice
daily with a 15-20-minute period of just sitting quietly. This
component alone may sufficiently explain the observed changes. Several
studies comparing the physiological short-term effects of TM with a
control condition of just sitting found that they both resulted in
comparable physiological changes (Travis, Kondo, & Knott, 1976;
Treichel, Clinch, & Cran, 1973; Walrath & Hamilton, 1975).


Biofeedback 


In contrast to the relaxation/meditation techniques, biofeedback
requires a highly sophisticated technology that has been developed only
in recent years. In blood pressure biofeedback training, pressure is
recorded on a continuous or noncontinuous basis, and the information is
fed back to the subject in the form of a light and/or sound signal or
by letting subjects directly observe their blood pressure record. This
allows the subject to become aware of fluctuations in blood pressure
and to learn to exercise voluntary control.

Constant cuff technique. Shapiro and his associates (e.g., Shapiro,
Tursky, Gershon, & Stern, 1969) have made the most important
contribution in this field with their “constant cuff” method that
allows continuous, beat-by-beat feedback of blood pressure. It involves
mounting a crystal microphone inside a standard pressure cuff and
placing it over the brachial artery. The cuff is inflated and set at a
constant pressure close to the person's average systolic or diastolic
blood pressure. Whenever the person's blood pressure rises above this
set level, a Korotkoff sound can be detected by the microphone. With
each beat of the heart, the person receives information (yes-no
feedback) as to whether his or her blood pressure is above or below the
set average blood pressure (binary feedback). After a trial of 50
consecutive heartbeats, the cuff is deflated for 30 sec and then set at
a new level depending on the average blood pressure in the previous
trial. One session consists of 25 [...] Shapiro et al. (1969) and
Shapiro, Schwartz, and Tursky (1972) were the first to show that small
but reliable changes in systolic and diastolic blood pressure could be
achieved with this method in normotensive volunteer subjects. (For a
review of blood pressure biofeedback studies with normotensives, see
Blanchard & Young, 1973.) Benson, Shapiro, Tursky, and Schwartz (1971)
were the first to demonstrate the potential usefulness of blood
pressure biofeedback training in the treatment of essential
hypertension. Since then several other blood pressure biofeedback
studies have been conducted, which are presented in the following
section.
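
The trial logic of the constant cuff procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real apparatus detects the presence or absence of a Korotkoff sound and never sees a pressure value, whereas the sketch simulates each beat with a known pressure in mmHg. The rule that the next cuff level equals the mean pressure of the previous trial is also an assumption; the text says only that the new level depends on that average. All function names are hypothetical.

```python
from statistics import mean

BEATS_PER_TRIAL = 50  # from the text: one trial is 50 consecutive heartbeats
REST_SECONDS = 30     # from the text: cuff deflated for 30 sec between trials


def binary_feedback(beat_pressures, cuff_level):
    """One yes/no signal per beat: True when pressure exceeds the cuff level.

    In the apparatus this corresponds to a Korotkoff sound being detected.
    """
    return [pressure > cuff_level for pressure in beat_pressures]


def next_cuff_level(beat_pressures):
    """ASSUMPTION: the next level is the mean pressure of the previous trial."""
    return mean(beat_pressures)


def run_session(trials, initial_level):
    """Run successive trials and return (cuff levels used, fraction of beats above).

    trials: list of per-trial lists of simulated beat pressures (mmHg).
    """
    level = initial_level
    levels, fractions_above = [], []
    for beats in trials:
        signals = binary_feedback(beats, level)
        levels.append(level)
        fractions_above.append(sum(signals) / len(signals))
        level = next_cuff_level(beats)  # reset the cuff after the rest period
    return levels, fractions_above
```

A subject whose pressure falls from one trial to the next produces fewer "above" signals on the second trial, which is the information the binary feedback conveys.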

The constant cuff method has recently been further improved (Elder,
Longacre, Welsh, & McAfee, 1977) by adding a tracking device that
monitors blood pressure beat by beat and automatically adjusts cuff
pressure every three to four heartbeats. Two pressure cuffs, one on
each arm, that are alternately inflated for 100 sec are used. Feedback
is given via an audible tone that changes in pitch with fluctuations in
blood pressure. So far this method has only been reported in
experiments with normotensives (Elder, Welsh, Longacre, & McAfee, 1977)
but is likely to be used with hypertensive subjects in the near future.

Pulse wave velocity technique. Pulse wave 
velocity as an indirect measure of mean 
arterial pressure has recently received con- 
siderable attention (e.g., Gribbin, Steptoe, & 
Sleight, 1976). It has also been used for 
feedback purposes with essential hyperten- 
sives (Walsh, Dale, & Anderson, 1977). In 
Walsh et al. pulse wave velocity was deter- 
mined by measuring the pulse transit time 
between the heart's right ventricular action
and the finger pulse. It was fed back to the 
subject both auditorily, in the form of a 
tone that became higher or lower as the pulse 
transit time increased or decreased, and visu- 
ally, on an oscilloscope (analogue feedback). 
Pulse transit time has been found to relate 
inversely to mean arterial pressure (Steptoe, 
Smulyan, & Gribbin, 1976). Despite its 
great advantage of avoiding the intrusive 
effects of constant cuff pressure and repeated 
cuff inflations, this technique has so far not 
been applied on a larger scale with essential 
hypertensives. 
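
The analogue feedback described above, a tone that rises and falls with pulse transit time, can be illustrated with a simple linear mapping. The sketch below is hypothetical: the base frequency, the gain in Hz per millisecond, and the audible limits are assumptions chosen for illustration and are not taken from Walsh et al. (1977).

```python
def tone_frequency(transit_time_ms, reference_ms,
                   base_hz=440.0, hz_per_ms=10.0,
                   min_hz=200.0, max_hz=1000.0):
    """Map pulse transit time to a feedback tone frequency (analogue feedback).

    The tone rises when transit time exceeds the reference and falls when it
    is shorter. Because transit time relates inversely to mean arterial
    pressure, a rising tone signals falling pressure.
    All numeric defaults are illustrative assumptions.
    """
    raw = base_hz + hz_per_ms * (transit_time_ms - reference_ms)
    # Keep the tone inside a comfortable audible range.
    return max(min_hz, min(max_hz, raw))
```

Unlike the binary signal of the constant cuff method, this mapping conveys the size of each change as well as its direction.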

Noncontinuous blood pressure feedback techniques. Two noncontinuous
feedback techniques have been studied in the control of essential
hypertension. Elder, Ruiz, Deabler, and Dillenkoffer (1973) applied a
blood pressure recorder that automatically measured diastolic blood
pressure every 2 min for a total of 20 successive determinations.
Subjects received a red-light signal (and verbal praise) contingent on
increasingly larger blood pressure reductions. Subjects in Shoemaker
and Tasto’s (1975) experiment were able to directly observe their blood
pressure recording at 90-sec intervals via a mirror placed above the
chart recorder. Two straight lines representing the person’s pretrial
average systolic and diastolic blood pressures were superimposed on the
recording, and subjects were instructed to lower their blood pressure
so that the recorded blood pressure would fall below the lines.

Noncontinuous blood pressure feedback has the disadvantage of not being
sufficiently sensitive to cope with the inherent variability of blood
pressure. Shannon, Goldman, and Lee (1978), comparing three types of
systolic blood pressure feedback with normotensive subjects, found
continuous (beat-by-beat) binary feedback clearly superior to a
relatively continuous proportional and a noncontinuous (75-sec
intervals) proportional feedback condition. This suggests that the
greater the time lag between beat-by-beat blood pressure changes and
feedback, the less effective is the technique.

Physiological effects and active components of blood pressure
biofeedback. Although a great number of experiments with normotensive
subjects have been conducted to tease out the essential ingredients in
blood pressure feedback, the exact mechanisms and processes involved
are still unknown. Whereas the relaxation/meditation approaches aim at
an indirect control of blood pressure via a generalized relaxation
response, blood pressure biofeedback aims at specifically or directly
altering blood pressure as such. But experimental data are
inconsistent. Some researchers have confirmed this specificity of
learned blood pressure control (e.g., Kristt & Engel, 1975), whereas
others have reported concomitant changes in other cardiovascular
parameters (e.g., Fey & Lindholm, 1975).

It is generally assumed that the beat-by-beat feedback of blood
pressure is the most important element in the technique. However, there
is experimental evidence that normotensive and hypertensive subjects
can change blood pressure by instruction alone, that is, by simply
being told to change blood pressure in the desired direction without
instructions on how to achieve this. Redmond, Gaylor, McDonald, and
Shapiro (1974), using essential hypertensives as subjects, found
instructions alone to be effective in reducing systolic and diastolic
blood pressure by 8-14 mmHg and 6-11 mmHg, respectively. Results with
normotensives have been inconsistent. In Steptoe’s (1976) experiment,
normotensive subjects who received instructions alone reduced blood
pressure as effectively as subjects who, in addition, received pulse
wave velocity feedback. The addition of exteroceptive feedback,
however, enhanced the increase condition. In a subsequent study,
Steptoe (1977) controlled for environmental stimulation by exposing
subjects in the instruction-only group to the same visual displays as
feedback subjects. Results favored the feedback condition, which
produced greater increases in pulse transit time and, by inference,
greater decreases in mean arterial blood pressure. Much more could be
said about this issue, but it would be beyond the scope of this
article. (For a detailed discussion, see Brener, 1974, and Shapiro,
1977.)
More research is necessary to determine the active ingredients in blood
pressure biofeedback training. The role of attentional, cognitive, and
imagery processes as well as the role of reinforcement and individual
differences is far from clear. To develop the most effective blood
pressure technique, studies are needed that compare the constant cuff
and the pulse velocity techniques, continuous and noncontinuous, binary
and proportional feedback, and systolic and diastolic blood pressure
feedback.

Electromyograph (EMG) and Galvanic Skin Response (GSR) feedback. EMG
and GSR feedback, alone or in combination with other techniques, have
also been studied in the control of essential hypertension. These
techniques do not aim at specifically controlling blood pressure but
rather aim at facilitating a general relaxation response. In two
unpublished studies (Love, Montgomery, & Moeller, Note 1; Montgomery,
Love, & Moeller, Note 2) subjects were trained in EMG feedback,
progressive relaxation, and autogenic training; Patel (1973, 1975a) and
Patel and North (1975) combined EMG and GSR feedback with various
relaxation/meditation techniques.


Review of the Literature on the Psychological
Control of Essential Hypertension


The 21 most recent and important single-group and between-groups
studies in the field are reviewed and summarized in table form. They
have been grouped into the following four categories: (a) blood
pressure biofeedback studies (Table 1); (b) relaxation/meditation
studies (Table 2); (c) mixed studies, that is, studies in which various
feedback and relaxation/meditation techniques have been combined (Table
3); and (d) comparative studies, that is, studies comparing blood
pressure biofeedback and relaxation/meditation training (Table 4).
These tabular presentations are followed by a detailed methodological
critique including suggestions for future research and a discussion of
results.


Methodological Critique

A careful inspection of the tables clearly reveals that most of the
reviewed studies have methodological faults, many of them serious ones.
A definitive evaluation of the benefits of psychological procedures in
the control of essential hypertension is therefore not possible. The
following methodological discussion not only points out the faults in
these studies but also suggests methodological requirements for future
research.

Subject selection. In many of the studies, subject-selection criteria
were not reported. For example, length of time a person had been
diagnosed as essential hypertensive, age limits, and the required blood
pressure level for inclusion into the experiment were often not
defined. Some researchers worked with young subjects in whom
hypertension is a rare phenomenon (e.g., the average age of subjects in
the Stone and DeLeo, 1976, study was 28 years) or with borderline
hypertensives (e.g., Shoemaker & Tasto, 1975; Surwit, Shapiro, & Good,
1978). Also, in several cases the diagnosis of essential hypertension
was not verified (e.g., Elder & Eustis, 1975). It is recommended that
subject samples be homogeneous with respect to (a) an established
diagnosis of essential hypertension, (b) a minimum hypertension
history, and (c) a minimum pretrial blood pressure.

Concurrent pharmacological treatment. Hypotensive medication and
tranquilizers were being taken by subjects in 14 out of the 21 studies.
In most studies, drug dosages were systematically stabilized or
decreased during the training period. In some investigations, however,
drugs or drug dosages were changed for medical reasons unrelated to the
research project during the training period (Taylor, Farquhar, Nelson,
& Agras, 1977) or during follow-up (Blackwell et al., 1976). This, of
course, makes a meaningful interpretation of data difficult. It is
possible that certain hypotensive drugs interact with various training
procedures, but this has not yet been researched. Deabler, Fidel,
Dillenkoffer, and Elder (1973) trained both a drug and a no-drug group
in progressive relaxation and hypnosis, and found no apparent Drug ×
Training interaction. Unfortunately, no statistical analyses were
performed, and the overall quality of the study was poor. One other
aspect worth mentioning here is that subjects on medication who
participate in biofeedback or relaxation/meditation training may change
their drug-taking behavior as a result of the training. So far, drug
compliance has not been investigated or reported in any of the
relaxation/meditation or biofeedback studies. To avoid serious
interpretative problems, it is recommended that investigators treat
subjects who receive medication as a distinct group from those who do
not.

Baseline assessment of blood pressure. 
As mentioned before, blood pressure responds 
to a great variety of environmental and in- 
ternal factors and exhibits considerable di- 
urnal and beat-to-beat variability. Dollery 
(1973) has shown that systolic blood pres- 
sure during a 24-hour period can be as low 
as 65 mmHg during sleep and as high as 
170 mmHg (and over) during maximal exer- 
tion. Shapiro and Surwit (1976) mention 
that a series of measurements taken every 
half minute can have a range of up to 30 
mmHg. It is well documented (Dunne, 1969; 
Pickering, 1968) that persons react to the 
taking of blood pressure with elevations in 
pressure. In the course of repeated assess- 
ments, as subjects adapt to the laboratory 
situation, blood pressure typically decreases. 

In the studies reviewed, several researchers reported no baseline
period other than the measurement of blood pressure immediately prior
to the first training session (Deabler et al., 1973; Elder & Eustis,
1975; Goldman, Kleinman, Snow, Bidus, & Korol, 1975; Kristt & Engel,
1975; Patel, 1973; Walsh et al., 1977; Hager & Surwit, Note 3), or they
reported baseline measures for less than 1 week (Elder et al., 1973;
Patel & North, 1975; Shoemaker & Tasto, 1975; Surwit et al., 1978). It
is highly likely that in some studies, training effects were confounded
with blood pressure variability and the effects of adaptation and
resting. In single-case experimental research, measurements of blood
pressure should be repeated until they stabilize. No clear guidelines
have yet been established for group studies. However, it is safe to say
that four assessment sessions over a 4-week period is a minimum
requirement.

Control groups. The arguments given previously not only suggest the use
of extended baseline assessment of blood pressure but also the
inclusion of a waiting list control group. Type, length, and number of
assessment sessions should be identical for all groups throughout the
baseline and training or control period. A further reason for the
inclusion of a control group is that the mere fact of being attended
to, the fact that something is being done about their problem, may
bring considerable relief to subjects, with subsequent falls in blood
pressure over time. Only 10 out of the 21 studies reported here
included a control condition, but in 3 of these (Goldman et al., 1975;
Patel, 1975a; Stone & DeLeo, 1976) subjects were not allocated in
random fashion. In addition, several of the control conditions were
inadequate. In some studies, especially those investigating
within-session changes, control subjects either did not sit for the
same length of time as subjects in the training conditions (Deabler et
al., 1973; Shoemaker & Tasto, 1975) or attended fewer sessions than the
training group (Goldman et al., 1975; Taylor et al., 1977).

As in all forms of psychological intervention, nonspecific treatment
factors undoubtedly play an important role in biofeedback and
relaxation/meditation training. Various experiments (e.g., Goldring,
Chasis, Schreiner, & Smith, 1956; Grenfell, Briggs, & Holland, 1964)
have suggested that placebo treatments can have dramatic effects on
blood pressure, although these results were confounded by the fact that
treatment coincided with the beginning of longer hospitalization
periods. In the present review only two studies included controls for
nonspecific treatment effects. Taylor et al. (1977) used nondirective
discussion groups in which subjects monitored and explored life's
tensions and discussed solutions. Frankel, Patel, Horowitz, Friedwald,
and Gaardner (1978) compared noncontingent diastolic blood pressure
feedback with contingent feedback and also used a waiting list control
group. In both studies no significant blood pressure reductions were
observed in the control groups. Unfortunately, it was not determined
whether the nonspecific treatment control conditions were as credible
as the actual training procedures. As Kazdin and Wilcoxon (1976)
pointed out, subjects who are exposed to control conditions that are
less credible than the treatment condition are less likely to expect
improvement. Expectancy of improvement is one of the most powerful
nonspecific treatment factors, and it can be hypothesized that the
apparent beneficial effect of a particular treatment may simply be due
to this effect (Borkovec & Nau, 1972). For future research it is
therefore important not only to include nonspecific control groups but
also to ascertain that there is equal expectancy of improvement across
conditions (Steinmark & Borkovec, 1974).


Table 1
Psychological Control of Essential Hypertension: Biofeedback Studies

Benson et al. (1971)
  n: 7
  Concurrent pharmacological treatment: 6 subjects on stable dosage of
    hypotensive medication
  Baseline: Average of 11 sessions over 2 weeks (sessions run until BP
    stabilized)
  Training procedure: Systolic BP feedback: constant cuff method,
    contingent slide projection, and monetary rewards
  Frequency and duration of training (and follow-up): Average of 22
    45-min. sessions on consecutive days (sessions run until no
    reduction for 5 sessions)
  BP changes pre/posttest and follow-up: Within-session decrease:
    -17 mmHg systolic; 5 out of 7 subjects responded
  Statistical analysis: Significant within-session reduction (ANOVA)
  Control: Extensive baseline; no control groups; no across-sessions
    measures; no follow-up

Elder et al. (1973)
  n: 18
  Concurrent pharmacological treatment: Several subjects on stable (?)
    dosage of various CNS depressants
  Baseline: 1 session of 20 BP determinations
  Training procedure:
    1. Diastolic BP feedback (n = 6): visual signal every 2 min.
       contingent on diastolic BP reduction
    2. Diastolic BP feedback, contingent verbal approval (n = 6)
    3. Control group (n = 6): asked to relax and lower BP, no feedback
  Frequency and duration of training (and follow-up): 7 40-min.
    sessions over 3 days, 1-week follow-up
  BP changes pre/posttest and follow-up:
    1. 7% reduction in diastolic BP by Sessions 7 and 8
    2. 20% reduction in diastolic BP in Sessions 3-8
    3. No significant changes in BP
  Statistical analysis: Group 2 significantly superior to Groups 1 and
    3 in Sessions 3-8; Group 1 superior to 3 in Sessions 8 and 9
    (ANOVA); reductions maintained at follow-up (n = 11), no
    statistical tests
  Control: Control group; subjects started on salt-free diet 3 days
    prior to training; no extended baseline and follow-up; selective
    reporting and lack of detail in results section

Elder & Eustis (1975)
  n: 22
  Concurrent pharmacological treatment: 20 subjects on stable (?)
    dosage of psychotropic or hypotensive medication
  Baseline: No baseline period, pretrial baseline only
  Training procedure: Diastolic BP feedback: visual feedback (green
    light for changes below, red light for no change or increase above
    basal pressure) at 1-min. intervals, verbal reinforcement
  Frequency and duration of training (and follow-up):
    Spaced sessions (n = 19): 8 sessions of 20 trials each over 7
      weeks, 1-month follow-up
    Massed sessions (n = 4): 10 sessions over 12 days, no follow-up
  BP changes pre/posttest and follow-up: % difference from basal
    pressure presented in graphical form only
    Within-session reduction: approximately 5% systolic + diastolic
      (somewhat larger for massed sessions)
    Follow-up: within-session reductions approximately 3% systolic and
      diastolic
  Statistical analysis: Significant difference between first and
    second half of training sessions (Mann-Whitney U)
  Control: No properly verified diagnosis of essential hypertension; no
    control groups; data based on % difference from basal pressure,
    which was assessed only once in first training session; data
    reported in confusing way

Kristt & Engel (1975)
  n: 5
  Concurrent pharmacological treatment: All subjects on stable dosage
    of hypotensive medication
  Baseline: No baseline period, pretrial baseline only
  Training procedure: Systolic BP feedback: constant cuff method,
    display of cumulative performance scores; Week 1: raise systolic
    BP; Week 2: lower systolic BP; Week 3: alternately lower, raise,
    lower BP within single session
  Frequency and duration of training (and follow-up): 42 sessions of 3
    blocks of 10 trials each over 3 weeks, 3-month follow-up
  BP changes pre/posttest and follow-up: Subjects reliably increased
    and decreased systolic BP, ability maintained at follow-up
    Pretest follow-up BP reductions as recorded by 4 subjects at home:
      -18.2 mmHg systolic
      -7.5 mmHg diastolic
  Statistical analysis: Trend analysis statistics significant for all
    3 conditions; no statistical tests reported
  Control: No control groups; no clinic assessment of pre-follow-up
    changes in BP reported

Goldman et al. (1975)
  n: 11
  Concurrent pharmacological treatment: None
  Baseline: No baseline period, pretrial baseline only
  Training procedure:
    1. Systolic BP feedback (n = 7): constant cuff method
    2. Control group (n = 4): asked to relax in their own best way
  Frequency and duration of training (and follow-up):
    1. 9 2-hour sessions of 30 trials each over 9 weeks
    2. 3 2-hour sessions over 3 weeks
  BP changes pre/posttest and follow-up:
    1. Within-session: -7 mmHg systolic
       Across-sessions: -6 mmHg systolic, -15 mmHg diastolic
    2. Within-session: -1 mmHg systolic
       Across-sessions: 4 mmHg systolic, -4 mmHg diastolic
  Statistical analysis:
    Within-session: reductions significant for Group 1 only (ANOVA)
    Across-sessions: changes significant for Group 1 on diastolic BP
      only (ANOVA)
  Control: Groups 1 and 2 had different pretraining BP; unsophisticated
    BP measurement; no random allocation; different number of sessions
    given to Groups 1 and 2; no follow-up

Kleinman et al. (1977)
  n: 8
  Concurrent pharmacological treatment: None
  Baseline: 3 2-hour clinic sessions over 3 weeks, BP recording by
    subjects 5 times daily over 2 weeks
  Training procedure: Systolic BP feedback: constant cuff method
  Frequency and duration of training (and follow-up): 9 2-hour sessions
    of 25-30 trials each over 9 weeks; follow-up BP recorded by
    subjects up to 4 months
  BP changes pre/posttest and follow-up:
    Within-session:
      a. Control session: -.5 mmHg systolic, -1.6 mmHg diastolic
      b. Feedback session: -4 mmHg systolic, -4 mmHg diastolic
    Across-sessions (as recorded by subjects): -8 mmHg systolic,
      -9 mmHg diastolic
    Maintenance of BP reduction at follow-up (n = 3)
  Statistical analysis:
    Within-session: reduction significant for Group b only on systolic
      BP (t test)
    Across-sessions: reductions significant for systolic and diastolic
      BP (t test)
    No statistical test on follow-up data
  Control: No control groups; no report of laboratory-recorded
    across-sessions changes

Note. BP = blood pressure; CNS = central nervous system; ANOVA =
analysis of variance. Reported blood pressure changes refer to
across-session changes unless otherwise specified.

Length of training and follow-up period. A further drawback of many of
the studies is that they use short training periods of sometimes not
more than 1 week (Deabler et al., 1973; Elder et al., 1973) or use only
a small number of training sessions. In studies reporting negative or
statistically significant but clinically irrelevant results, it is
therefore difficult to evaluate whether the technique was inefficient
or the training was not intensive enough. In addition, follow-up
assessment was frequently omitted or covered too short a period of
time. The long-term durability of training effects is clearly a crucial
issue that needs much more careful attention in future research. A
minimum training period of 3 months and a follow-up period of 1 year is
therefore suggested. Even longer follow-ups are necessary to determine
whether and how the psychological control of essential hypertension
affects morbidity and mortality.
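
The numerical minimums suggested in this critique (four baseline sessions over 4 weeks, 3 months of training, 1 year of follow-up) together with the call for control groups can be collected into a small checklist. The Python sketch below is illustrative only: the class, its field names, and the way each recommendation is encoded are assumptions, and treating random allocation as a separate requirement is an inference from the criticism of nonrandomized studies rather than an explicit rule in the text.

```python
from dataclasses import dataclass


@dataclass
class StudyDesign:
    """Hypothetical summary of one study's design (field names are illustrative)."""
    baseline_sessions: int
    baseline_weeks: float
    training_months: float
    followup_months: float
    has_control_group: bool
    random_allocation: bool


def unmet_recommendations(design):
    """Return the suggested minimum requirements that a design does not meet."""
    unmet = []
    if design.baseline_sessions < 4 or design.baseline_weeks < 4:
        unmet.append("baseline of at least four sessions over 4 weeks")
    if design.training_months < 3:
        unmet.append("training period of at least 3 months")
    if design.followup_months < 12:
        unmet.append("follow-up of at least 1 year")
    if not design.has_control_group:
        unmet.append("control group")
    elif not design.random_allocation:
        # Assumed requirement: only meaningful when a control group exists.
        unmet.append("random allocation to conditions")
    return unmet
```

For example, a design with a single pretrial reading, one week of training, and no follow-up would fail the first three checks regardless of its control arrangements.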

Home practice and life-style changes. 
Sixteen out of 21 researchers asked their 
subjects to practice their respective tech- 
niques daily at home. Often, however, de- 
tails of home practice procedures were not 
given, and only one researcher (Frankel et 
al, 1978) reported on actual compliance 
rates. Despite this lack of systematic data, 
there is evidence which suggests that regular 
daily home practice is crucial in achieving 
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and maintaining reductions in blood pres- treatment, in which medication is usually 
sure. The commitment involved is similar to taken on a regular daily basis and is a life- 
if not greater than that in pharmacological long commitment. It is recommended that home 


Table 2 
Psychological Control of Essential Hypertension: Relaxation/ Meditation Studies 


Concurrent 
pharmacological 
n treatment Baseline "Training procedure 
Deabler et al. (1973) 
21 9 subjects on stable No baseline period, 1. Progressive relaxation and hypnosis (n = 6) 
dosage of hypo- pretrial baseline only 2. Progressive relaxation, hypnosis, and 
tensive medica- drugs (п = 9) 
tion assigned to 3. Control group (n — 6) 
drug group 7 BP checks over 4-5 day period 
Benson et al. (1974a) 
22 None About 6 sessions over TM 
6 weeks 
Benson et al (1974b) 
14 All subjects on stable About 6 sessions over TM 
dosage of hypo- 6 weeks 
tensive medica- 
tion 
Blackwell et al. (1976) 
7 All subjects on stable Up to 10 sessions over TM 
dosage of hypo- period of up to 10 
tensive medica- weeks 
tion throughout 
training period ; 
dosage changes 
during follow-up 
Stone & DeLeo (1976) 
19 None 14 determinations over l. Breath-counting meditation (n — 14) 
10-14 days 2. Control group (n — 5) 


6 BP checks over 6-month period 


T 


— 


а 
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practice frequencies be recorded and reported 


| as data in all future research. 
It is worth adding that subjects who regu- 
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larly practice a relaxation/meditation tech- 
nique may as a consequence undertake 
changes in their life-style. To date, such 


p oo ——— ħõōõI 


Frequency and 


duration BP changes 
of training pre/posttest and 
(and follow-up) follow-up Statistical analysis 


Control 


Deabler et al. (1973) 


8-9 sessions over Within session reductions — Within-session reductions 


Selective reporting of 


4-5 days in last session: significant for Groups 1 results; no across-ses- 
1. —17 mmHg systolic and 2 (ANOVA); no sions comparisons; 
—19 mmHg diastolic statistical tests per- training period too 
2. —16 mmHg systolic fomed to compare short ; controls re- 
—14 mmHg diastolic groups ceived pretrial readings 
3. no significant changes only and did not sit for 
same length of time; no 
; follow-up 
t Benson et al. (19742) 


Reductions significant for 
systolic and diastolic 
BP (t test) 


—7 mmHg systolic 
—4 mmHg diastolic 


6 sessions of 1-1} 
hours each (and 
checking sessions) 
over 6 months 


Subjects were self-selected ; 
no control groups; no 
follow-up 


Benson et al. (1974b) 


Reductions significant for 
systolic and diastolic 
BP (t test) 


—11 mmHg systolic 


ў 6 sessions of 13-2 
—5 mmHg diastolic 


hours each (and 
checking ses- 
sions) over 5 
months 


Subjects were self-selected ; 
no control groups; lack 
of detail in results sec- 
tion; no follow-up 


Blackwell et al. (1976) 


No statistical tests per- 
formed for the whole 


group 


Clinic measures: 
—4 mmHg systolic 
—2 mmHg diastolic 
Follow-up: 
—3 mmHg systolic 
—4 mmHg diastolic 


6 sessions of 13-2 
hours each (and 
10 checking ses- 
sions) over 9-12 
è weeks, 6-month 
follow-up 


Recording of BP at home 
‘and clinic; no control 
groups; changes in drug 
treatment during 
follow-up 


Stone & DeLeo (1976) 


Reduction significant for 
Group 1 only on mean 
arterial pressure (t 
test) 


5 20-min. sessions 
over 6 months 


1. Supine: 
—9 mmHg systolic 
—8 mmHg diastolic 
Upright: 
—15 mmHg systolic 
—10 mmHg diastolic 
2. Supine: 
1 mmHg systolic 
2 mmHg diastolic 
Upright: 
—2 mmHg systolic 
+0 mmHg diastolic 


m^ 


Assessment of various bio- 
chemical variables; in- 
dependent observer ; no 
random allocation ; no 
follow-up 


(table continued on pages 1028-1029) 
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Concurrent 
pharmacological 


n treatment Baseline 


2 sessions over 2 
months 


31 All subjects on 
hypotensive 
medication, 
changes during 
training period 
for 14 subjects 


2. Nonspecific therapy (n — 10): 
Nondirective discussion groups (and 
self-monitoring) 

3. Progressive relaxation (n = 10), 
breathing, imagery exercises and 
self-monitoring 


No. of sessions 
not reported, 
3-month baseline 
period 


20 9 subjects on 
stable dosage of 
hypotensive medi- 
cation 


Pollack et.al. (1977)* 


TM 


Note. BP = blood pressure; TM = transcendental meditation; ANOVA = analysis of variance. 


potential changes (e.g., diet, exercise, smok- 
ing, alcohol consumption, use of drugs) 
have not been monitored but may well make 
some contribution to the reduction of blood 
pressure. 

Assessment and reporting of training ef- 
fects. The quality of studies was highly 
variable both in terms of their assessment 
procedures and the completeness of reports. 
In this context it is important to point out 
the difference between within-session and 
across-sessions measures. Within-session mea- 
sures are typically taken either while the 
person is practicing a self-regulatory tech- 
nique or at the end of the training session 
when blood pressure is likely to be lowest. 
To determine whether results achieved under 
training conditions have generalized to non- 
training conditions, blood pressure has to 
be measured and reported across sessions, 
that is, independent of or immediately prior 
to the self-regulatory practice. All blood 


Training procedure
Taylor et al. (1977)
1. Medication control group (n = 11)


pressure values presented in Tables 1-4 were
based on across-sessions measures unless
otherwise specified.

In several studies, baseline blood pressure
values were compared with those at the
end of the last training session (Elder &
Eustis, 1975; Elder et al., 1973; Goldman
et al., 1975; Patel, 1973; Walsh et al.,
1977). This, of course, creates biased results
that are likely to reflect both the effects of
adaptation and resting and those of specific
training.
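
The distinction between the three kinds of comparison can be made concrete with a small sketch. The readings below are invented for illustration and do not come from any of the reviewed studies; the function names are likewise hypothetical. The sketch assumes each session yields one reading taken before practice and one taken at the end of the session.

```python
from statistics import mean

# Hypothetical systolic readings in mmHg (illustrative only, not study data).
baseline = [150, 152, 148]  # pretraining baseline readings
sessions = [                # (reading before practice, reading at end of session)
    (150, 140),
    (148, 138),
    (146, 136),
]


def within_session_change(sessions):
    """Mean change from the start to the end of each training session."""
    return mean(end - start for start, end in sessions)


def across_sessions_change(baseline, sessions):
    """Change from mean baseline to the pre-practice reading of the last session."""
    return sessions[-1][0] - mean(baseline)


def end_of_last_session_change(baseline, sessions):
    """Change from mean baseline to the end-of-session reading of the last session.

    This is the comparison criticized in the text: it mixes the effects of
    adaptation and resting with any specific training effect.
    """
    return sessions[-1][1] - mean(baseline)


print("within-session:", within_session_change(sessions))
print("across-sessions:", across_sessions_change(baseline, sessions))
print("baseline vs. end of last session:", end_of_last_session_change(baseline, sessions))
```

With these invented numbers the end-of-last-session comparison suggests a much larger reduction than the across-sessions comparison, which is the kind of inflation the text warns about.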

In other studies, across-sessions measures
were simply not reported (Benson et al.,
1971; Deabler et al., 1973; Elder et al.,
1973; Elder & Eustis, 1975; Kleinman et
al., 1977), although these data must have
been taken at least in the biofeedback studies,
if only for calibration purposes. Finally,
some experimenters relied on blood pressure
values as recorded by the subjects themselves
without reporting any assessment of relia- 
Frequency and duration of training (and follow-up)
BP changes pre/posttest and follow-up
Statistical analysis
Control


Taylor et al. (1977) 


1. −1 mmHg systolic
+0 mmHg diastolic
Follow-up:
−7 mmHg systolic
−2 mmHg diastolic
2. −3 mmHg systolic
−2 mmHg diastolic
Follow-up:
−4 mmHg systolic
−4 mmHg diastolic
3. −14 mmHg systolic
−5 mmHg diastolic
Follow-up:
−12 mmHg systolic
−6 mmHg diastolic


For Groups 2 and 3, 5 30-min. sessions over 8 weeks, 6-month follow-up


Group 3 superior to Groups 1 and 2 on systolic BP only, significant difference between Groups 1 and 2


No significant difference between groups for systolic and diastolic BP at follow-up (sign test)


Independent observers; therapist unaware of results; no waiting list control group; medication changes; unsophisticated measurement of BP


Pollack et al. (1977)* 


After 3 months: 
—10 mmHg systolic 
—2 mmHg diastolic 
After 6 months: 
—6 mmHg systolic 
—2 mmHg diastolic 


6 sessions of 1½-2 hours each (and 10 checking sessions) over 6 months


Only reduction of systolic BP after 3 months significant, all other comparisons not significant


Measurement of plasma-renin activity; no control groups; incomplete reporting; no follow-up


* All values are approximate.


bility or comparative laboratory results
(Kleinman, Goldman, Snow, & Korol, 1977;
Kristt & Engel, 1975; Hager & Surwit, Note
3).

For future research the simultaneous use
and detailed reporting of within-session and
across-sessions measures are strongly recommended.
Temporary blood pressure reductions
that occur only during training are of
limited clinical importance. On the other
hand, exclusive reliance on across-sessions
measures prohibits the acquisition of information
on the short-term effectiveness of a
given technique. The concomitant use of
both measures is particularly relevant in
understanding failure to respond to training.
Regarding the quality of blood pressure recording,
researchers are urged to employ independent
observers and to use either automated
recording devices (Krausman, 1975;
Tursky, 1974) or the random-zero sphygmomanometer
(Wright & Dore, 1970).




Generalization of training effects. The 
generalization of training effects is an im- 
portant issue that has received much dis- 
cussion but has not yet been systematically 
studied. Although several experimenters used 
different settings for training and assessment 
(Benson, Rosner, Marzetta, & Klemchuck, 
1974a, 1974b; Blackwell et al., 1976; 
Frankel et al., 1978; Pollack, Weber, Case, 
& Laragh, 1977; Taylor et al., 1977), in 
the majority of studies reviewed here, train- 
ing effects were assessed in the same en- 
vironment in which training took place. It 
can therefore be argued that training effects 
were specific to the particular experimenter 
and to the laboratory setting in which sub- 
jects were trained and that in some cases the 
obtained changes would not have been main- 
tained in a nonlaboratory environment. So 
far this latter issue has not been scientifically 
investigated. 

Further, all training and assessments have 
Table 3


PETER SEER


Psychological Control of Essential Hypertension: Mixed Studies


Concurrent 
pharmacological 
n treatment 


Baseline 


Training procedure 


Patel (1973)
n: 20
Concurrent pharmacological treatment: All subjects on hypotensive medication, 12 subjects stopped or reduced medication during experiment
Baseline: No controlled baseline assessment
Training procedure:
1. Various relaxation and meditation techniques and GSR feedback (n = 20)

Patel (1975a)
n: 20
Concurrent pharmacological treatment: All subjects on hypotensive medication, 2 subjects stopped medication during experiment
Baseline: No controlled baseline assessment
Training procedure:
2. Control group (medication only) (n = 20): ½ hour resting instead of training

Patel & North (1975)
n: 34
Concurrent pharmacological treatment: All subjects on stable dosage of hypotensive medication
Baseline: 3 sessions on 3 separate days
Training procedure:
1. Various relaxation and meditation techniques, GSR and EMG feedback, and self-control procedures (n = 17)
2. Control group (medication only) (n = 17): ½ hour resting instead of training

Frankel et al. (1978)*
n: 22
Concurrent pharmacological treatment: 7 subjects on stable dosage of hypotensive medication
Baseline: 8 determinations over 6-8 weeks
Training procedure:
1. Diastolic BP and EMG feedback, autogenic training, and progressive relaxation (n = 7): Constant cuff method
2. Placebo treatment (n = 7): Noncontingent diastolic BP feedback only
3. Control group (n = 8): Weekly BP checks only


Note. BP = blood pressure; GSR = galvanic skin response; EMG = electromyograph. 


been conducted in a sitting or recumbent
position in an undemanding environment
under conditions of drastically reduced external
stimulation. Although blood pressure
taken under standard resting conditions
gives us useful information, this type of data
tells us nothing about whether training effects
have generalized to the person's everyday
environment. Blood pressure
changes, especially the speed and magnitude
of elevations and their rate of recovery, are
also of importance. It is hoped that this
crucial issue of generalization of training
effects will be approached soon. It requires
the use of portable monitoring equipment,
which is now available (Littler, Honour,
Pugsley, & Sleight, 1975; Littler, Honour,
Sleight, & Stott, 1972). One perhaps more
Table 3 (continued)


Frequency and duration of training (and follow-up)
BP changes pre/posttest and follow-up
Statistical analysis
Control


Patel (1973)
Frequency and duration of training: 36 ½-hour sessions over 3 months
BP changes:
1. Reduction from first to last training session (BP taken during training session): −20 mmHg systolic, −13 mmHg diastolic
Statistical analysis: Reduction significant for systolic + diastolic BP (t test)
Control: No independent observer

Patel (1975a) 


Frequency and duration of training: 9-month follow-up; Group 2 same as Group 1, 9-month follow-up
BP changes:
1. Follow-up: −15 mmHg systolic, −13 mmHg diastolic
2. −1 mmHg systolic, −2 mmHg diastolic; Follow-up: 1 mmHg systolic, −1 mmHg diastolic
Statistical analysis:
1. No statistical test on follow-up data
2. Reductions on systolic and diastolic BP not significant (t test)
Control: No random allocation, medication changes during follow-up
Patel & North (1975)
Frequency and duration of training: 12 ½-hour sessions over 6 weeks, 3-month follow-up for Group 1 only
BP changes:
1. −26 mmHg systolic, −15 mmHg diastolic; reduction maintained at follow-up
2. −9 mmHg systolic, −4 mmHg diastolic
Statistical analysis: Reductions significant for Groups 1 and 2 on systolic and diastolic BP; Group 1 superior to Group 2 (t test); no statistical tests performed on follow-up data
Control: Independent observer; control group, baseline period too short

Frankel et al. (1978)*
Frequency and duration of training: 20 sessions over 4 months
BP changes:
1. 3 mmHg systolic, 1 mmHg diastolic
2. −1 mmHg systolic, −2 mmHg diastolic
3. 5 mmHg systolic, 1 mmHg diastolic
Statistical analysis: No significant change in BP across sessions (t test); no significant differences between groups; no significant within-session reductions in Groups 1 or 2
Control: BP assessment by independent observer in different locality outside laboratory; control groups; no follow-up


* For all measures, subjects were supine. 


practical alternative is to expose subjects to 
standard stressors under laboratory condi- 
tions. To date only one investigator (Patel, 
1975b) has studied the effects of biofeed- 
back-aided relaxation/meditation training on 
blood pressure response and recovery. In a 
preliminary experiment, 32 essential hyper- 
tensives were randomly assigned to either a 
training or a waiting list control condition. 


Maximum blood pressure elevations in re- 
sponse to a "stressful" exercise and cold 
pressor test and the time taken for recovery 
were assessed before and after 6 weeks of 
training. Relaxation training resulted in sig- 
nificant reductions in systolic and diastolic 
blood pressure elevations and recovery time 
for both tests. Control subjects did not dis- 
play any significant reductions on either 

(text continued on p. 1034) 
Table 4


Psychological Control of Essential Hypertension: Comparative Studies


Concurrent 
pharmacological 
n treatment 


Baseline Training procedure 


Shoemaker & Tasto (1975)
n: 15
Concurrent pharmacological treatment: Not reported
Baseline: 3 sessions of 10 BP determinations each over 6 days
Training procedure:
1. Progressive relaxation (n = 5)
2. Systolic and diastolic BP feedback (n = 5): Direct feedback from BP chart recorder at 90-sec intervals
3. Control group (n = 5): 6 BP checks over 2 weeks

Surwit et al. (1978)
n: 24
Concurrent pharmacological treatment: 50% on stable (?) dosage of hypotensive and psychotropic medication
Baseline: 2 1-hour sessions over 1 week
Training procedure:
1. Combined BP and heart rate feedback (n = 8): Constant cuff method, feedback contingent on simultaneous reduction of BP and heart rate (20 1-min. trials)
2. Frontalis and extensor EMG feedback (integrated) (n = 8)
3. Breath-counting meditation (n = 8) and information of BP at end of session

Hager & Surwit (Note 3)
n: 30
Concurrent pharmacological treatment: Unspecified no. of subjects on stable dosage of hypotensive medication
Baseline: No baseline data reported
Training procedure:
1. Systolic BP feedback (n = 15): Visual feedback and performance score counter (portable home practice unit)
2. Breath-counting meditation (n = 15)

Walsh et al. (1977)
n: 24
Concurrent pharmacological treatment: 12 subjects on stable (?) dosage of hypotensive and psychotropic medication
Baseline: No baseline period, pretrial baseline only
Training procedure:
Phase 1 (n = 24)
1. Progressive relaxation
2. Pulse wave velocity feedback
Phase 2 (n = 16 of the 24 Phase 1 subjects)
3. Training 1 and 2 combined

Note. BP = blood pressure; EMG = electromyograph ; ANOVA = analysis of variance. 


Table 4 (continued)


Frequency and duration of training (and follow-up)
BP changes pre/posttest and follow-up
Statistical analysis
Control


Shoemaker & Tasto (1975)
Frequency and duration of training: 6 80-min. sessions over 2 weeks
BP changes:
1. −7 mmHg systolic, −8 mmHg diastolic
2. −1 mmHg systolic, −1 mmHg diastolic
3. 2 mmHg systolic, 1 mmHg diastolic
Statistical analysis: Significant within-session and across-sessions reductions in systolic and diastolic BP for Group 1 only (linear trend comparison)
Control: Low pretraining BP values; no extended baseline and follow-up; control subjects did not sit for same length of time as Groups 1 and 2

Surwit et al. (1978)
Frequency and duration of training: 8 sessions of 1-1½ hours each over 4 weeks, 6-week follow-up, 1-year follow-up
BP changes:
1. 5 mmHg systolic; Follow-up: 1 mmHg systolic
2. 6 mmHg systolic; Follow-up: −4 mmHg systolic
3. −6 mmHg systolic; Follow-up: −8 mmHg systolic
Statistical analysis: No significant reduction within or across sessions; no significant between-groups differences (ANOVA). 1-year follow-up: Average BP for all groups combined: 3 mmHg systolic, 3 mmHg diastolic (no statistical analysis performed)
Control: Low pretraining BP values; careful matching procedures; baseline period too short; no control group

Hager & Surwit (Note 3)
Frequency and duration of training: 20 min. twice daily home practice sessions over 4 weeks (self-record of BP before and after each session)
BP changes:
Within-session reduction for Groups 1 and 2 combined (n = 17): −4 mmHg systolic, −2 mmHg diastolic
Across-sessions reductions for Groups 1 and 2 combined (n = 17): −2 mmHg diastolic
Statistical analysis: No significant differences between Groups 1 and 2 on all comparisons (ANOVA). Within-session reductions significant on systolic and diastolic BP (sign test). Across-sessions reductions significant for diastolic BP only (ANOVA)
Control: No clinic data (i.e., exclusive reliance on BP as recorded by subjects); no control group; no baseline or follow-up, high dropout rate: 8 in Group 1, 5 in Group 2


Walsh et al. (1977) 


Frequency and duration of training: 5 1-hour sessions over 5 weeks, 5 sessions of 7 3-min. trials each, information on pre/post session BP. Training period not reported, 5 1½-hour sessions, 3-month and 1-year follow-up
BP changes:
Phase 1: No details given; graphical description only
Phase 2: Slight increases in BP (no details given)
Groups 1 and 2 (combined) reduced BP from 146/94 mmHg at beginning of Phase 1 to 134/87 mmHg at end of Phase 2
3-month follow-up: Subjects from Group 1 had lower systolic BP
1-year follow-up: No between-groups difference
Statistical analysis:
Within-session: Group 2 superior to Group 1 on diastolic BP (ANOVA)
Across-sessions: No significant between-groups difference (ANOVA)
No statistical analysis reported
Significant difference (t test)
No statistical analysis reported
Control: No control group; incomplete reporting; errors due to technical problem?
measure. With the exception of systolic blood
pressure rises in the exercise test, differences 
were significant for all between-groups com- 
parisons. Despite some methodological short- 
comings, these results are highly relevant and 
worth replicating. 


Discussion of Results 


In the following section, the effectiveness 
of the various techniques that have been 
applied to the control of essential hyperten- 
sion is critically evaluated. Each of the 
three approaches presented in Tables 1-3, 
namely, blood pressure biofeedback, relax- 
ation/meditation, and biofeedback combined 
with relaxation/meditation, is discussed sep- 
arately and highlighted by a brief description 
of the most noteworthy studies. Finally, blood 
pressure and relaxation/meditation training 
are compared (Table 4), and explanations for 
differences in outcome are proposed. 

Blood pressure biofeedback. Ten studies
investigating blood pressure biofeedback in
the treatment of essential hypertension are
presented in Tables 1 and 4. In general, results
support findings of studies using normotensives
as subjects. Under laboratory
conditions various blood pressure feedback
techniques produced significant across-sessions
reductions in systolic blood pressure
(ranging from 6 to 18 mmHg) and in diastolic
blood pressure (ranging from 8 to 15 mmHg).
Two studies stand out: Benson et al. (1971)
and Kristt and Engel (1975).

Benson et al. (1971) trained seven subjects
using the constant cuff technique. Baseline
assessment and training sessions were
individualized, and training continued until
each subject showed no further reductions
in systolic blood pressure in 5 consecutive
sessions. The average number of training

sessions was 22 (range = 8-34) over a 41- 
week period, and average within-session de- 
creases in systolic blood pressure were 17 
mmHg. Although this study did not include 
a control group and lacks across-sessions and 
follow-up data, it is one of the best con- 
trolled in the field. 

Kristt and Engel (1975) taught five subjects
to reliably increase and decrease systolic
blood pressure within sessions. Again
the constant cuff technique was used. Data
were presented in graphical form only but
suggested average increases and decreases
in systolic pressure of 10-15 mmHg. Training
was conducted in 42 sessions during a
3-week hospital stay. After discharge from the
hospital, subjects continued to practice lowering
systolic blood pressure at home on a
daily basis. They did this by inflating a standard
cuff to their average blood pressures
and making Korotkoff sounds disappear while
cuff pressure was maintained. Cuff pressure
was then adjusted to the new lower pressure,
and the procedure was repeated. After 3
months of home training, blood pressure, as
measured by subjects in their home, had decreased
18 mmHg systolic and 8 mmHg diastolic
from pretreatment baseline. For several
reasons these results have to be interpreted
cautiously: (a) Subjects had shown no across-sessions
reductions in the laboratory; (b)
blood pressure taken by subjects at home was
not checked against blood pressure taken by
an independent observer; (c) there was no
control group, and the number of subjects at
follow-up was too small (n = 4); and (d)
there is no way to determine whether the
simulated blood pressure control procedure
was instrumental in producing these reductions
or whether other factors, such as taking
time out to practice and relax, were the crucial
components.

Two studies (Shoemaker & Tasto, 1975;
Surwit et al., 1978) failed to produce any
decreases in blood pressure. In both cases the
feedback techniques that were used appear
inappropriate. Shoemaker and Tasto applied
a noncontinuous proportional feedback technique
in which once every 90 sec subjects were
shown their systolic and diastolic blood pressure
on a chart recorder and asked to reduce
both. This probably was too complex a task.
In the study by Surwit et al., which used the
constant cuff technique, subjects received binary
feedback for simultaneous reductions in
blood pressure and heart rate; that is, feedback
for a "correct" response was only given
when decreases in blood pressure coincided
with decreases in heart rate. As the authors
themselves pointed out, this procedure was
probably ineffective because subjects had normal
heart rates to start with.

With regard to which blood pressure feedback
technique is most effective, no conclusive
answer is possible. Comparative studies
with essential hypertensives have not yet been
conducted. But there is evidence that suggests
that the constant cuff technique is the
most promising one, whereas the pulse wave
velocity technique still needs further testing.
However, at present we have no convincing
indication that essential hypertensives can
achieve clinically relevant and persistent blood
pressure reductions through blood pressure
feedback training. Successful blood pressure
control may require more intensive individualized
training (e.g., Benson et al., 1971)
and continued home practice (Kristt & Engel,
1975). Although home practice appears to be
crucial for the maintenance of blood pressure
reductions in relaxation/meditation training,
the role of home practice and its relaxation
components in blood pressure feedback training
is far from clear (Tarler-Benlolo, 1978).

Relaxation/meditation. The overall methodological
quality of relaxation/meditation
studies is better than that of blood pressure
biofeedback studies. Subject samples are
larger, and across-sessions measures and adequate
follow-up or extended training periods
of approximately 6 months are reported in all
but one study (Deabler et al., 1973). Blood
pressure reductions across sessions were in the
range of 7-14 mmHg for systolic and of 4-10
mmHg for diastolic blood pressure. Two studies
are worth describing in some detail.

Stone and DeLeo (1976) compared 14 subjects
who were trained in breath meditation
with a small control group (n = 5). Training
consisted only of five sessions, but subjects
were asked to practice regularly at home.
After 6 months of practice, systolic and diastolic
blood pressure in the supine position
were reduced by 15 and 10 mmHg, respectively,
whereas blood pressure in the control
group had slightly increased (+1/+2 mmHg).
Plasma dopamine β-hydroxylase levels were
also found to be significantly reduced and to
correlate with falls in blood pressure. Dopamine
β-hydroxylase is an enzyme that converts
dopamine to norepinephrine and has
been suggested to be an indicator of sympathetic
nervous system activity. The fact that
the experimental and the control group were
of unequal size and that subjects were not
randomly allocated to groups necessitates
caution in interpretation. In addition, pretreatment
blood pressures were in the borderline
hypertensive range, and subjects
were much younger than in all other studies
reviewed here.

In an exceptionally well-controlled study, 
Taylor et al. (1977) compared three groups 
of subjects on medication (n = 31) over an
8-week period. The first group received re- 
laxation training, the second, nonspecific treat- 
ment, and the third, no treatment at all. The 
relaxation technique consisted of progressive 
muscle relaxation, the imagination of pleasant 
scenes, and self-monitoring. The relaxation 
group achieved the greatest decreases (—14 
mmHg systolic, —5 mmHg diastolic), but 
only in the case of systolic blood pressure 
were they statistically significant. At a 3- 
month follow-up assessment, relaxation sub- 
jects had generally maintained their reduc- 
tions (—12/—6 mmHg) but did not differ 
significantly from the other two groups, which 
had by then displayed further small decreases 
in blood pressure (—7/—2 mmHg for the 
medication-only and —4/—4 mmHg for the 
nonspecific treatment group). 

Several single-case studies, which so far 
have not been mentioned in this review, have 
also investigated the effects of progressive 
relaxation and the modification thereof on 
essential hypertension (Beiman, Graham, & 
Ciminero, 1978; Bloom & Cantrell, 1978; 
Brady, Luborsky, & Kron, 1974; Graham, 
Beiman, & Ciminero, 1977). The report by 
Brady et al. (1974) deserves special mention 
because it used A-B-A and A-B-A-B single- 
case designs that have otherwise not been used 
in clinical studies of essential hypertension. 
(For a discussion of single-case designs in 
clinical biofeedback research, see Barlow, 
Blanchard, Hayes, & Epstein, 1977.) After 
2-4 weeks of daily half-hour blood pressure 
assessment sessions (Baseline 1), the three 
subjects in the Brady et al. study received 
between 19 and 25 half-hour training sessions 
of metronome-conditioned relaxation training 
(Training 1) over 4 weeks. This was followed
by another 4 weeks of daily blood pressure 
checks without any further training practice 
(Baseline 2). One subject showed no change, 
whereas the other two showed significant de- 
creases in diastolic blood pressure (of 3 mmHg 
and 6 mmHg, respectively) during Training 
1, and significant increases in diastolic blood 
pressure (of 5 mmHg and 8 mmHg, respec- 
tively) in Baseline 2. One subject resumed 
relaxation after Baseline 2, which resulted in 
a drop in diastolic blood pressure of 13 mmHg. 
Systolic blood pressure was not reported. 
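
The logic of an A-B-A design can be sketched as a comparison of phase means: a treatment effect is suggested when the measure falls from Baseline 1 to Training 1 and rises again in Baseline 2. The diastolic readings below are invented for illustration and are not the Brady et al. data; the names `phases`, `phase_means`, and `phase_shifts` are hypothetical.

```python
from statistics import mean

# Hypothetical daily diastolic readings in mmHg for one subject (illustrative only).
phases = {
    "baseline_1": [96, 95, 97, 96],
    "training_1": [92, 91, 90, 91],
    "baseline_2": [95, 96, 97, 96],
}


def phase_means(phases):
    """Mean reading within each phase, in the order the phases were run."""
    return {name: mean(values) for name, values in phases.items()}


def phase_shifts(phases):
    """Change in mean reading between each pair of consecutive phases."""
    means = phase_means(phases)
    names = list(phases)
    return [
        (names[i], names[i + 1], means[names[i + 1]] - means[names[i]])
        for i in range(len(names) - 1)
    ]


for earlier, later, shift in phase_shifts(phases):
    print(f"{earlier} -> {later}: {shift:+} mmHg")
```

A drop on entering training followed by a comparable rise on withdrawal is the pattern that makes the reversal design informative; a formal analysis would also need to test whether such shifts exceed day-to-day variability.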

Different relaxation and meditation tech- 
niques have not yet been compared with each 
other; therefore, at this stage, little can be 
said about which technique is likely to be 
more effective. Of the three techniques that 
have been investigated, progressive relaxation 
and breath meditation appear to have produced
slightly larger decreases in blood pressure
than TM. Of the seven studies reviewed
in Table 2, none of the four TM studies in- 
cluded a control group. 

Mixed studies. Four studies that investi- 
gated a combination of various biofeedback 
and relaxation/meditation techniques in the 
psychological control of essential hypertension 
are reviewed in Table 3. The work by Patel 
and North (1975) has shown the most im- 
pressive results of all studies discussed in this 
review. Training consisted of a combination 
of educational programs, rhythmic slow 
breathing, muscular relaxation, meditation, 
GSR and EMG feedback, and various self-control
procedures for coping with everyday
difficult situations. Reductions in mean val- 
ues after a 6-week training and a 3-month 
follow-up period were 26 mmHg systolic and 
15 mmHg diastolic. The 17 control subjects 
who attended the same number of sessions 
but simply rested and relaxed on their own 
also achieved significant reductions of 9 
mmHg systolic and 4 mmHg diastolic. How- 
ever, the difference between the two groups 
was highly significant. The results are par- 
ticularly convincing in that the investigators 
used a half-crossover design in which control 

subjects later underwent the full training 
procedure and as a result reduced their blood 
pressure by 28 mmHg systolic and 16 mmHg
diastolic.

Frankel et al. (1978) used a similar training
package consisting of diastolic blood pressure
feedback (constant cuff technique), EMG
feedback, autogenic training, and progressive
relaxation and compared it with a group receiving
noncontingent diastolic blood pressure
feedback and with a waiting list control group.
After 4 months of training, no significant
blood pressure reductions were found in any
of the groups. The reasons for these diametrically
opposed outcomes are difficult to assess.
Because of the simultaneous application of
several training procedures, it is impossible
to isolate the active ingredients in these training
packages. However, training appeared to
be more intensive and comprehensive in the
Patel study, and baseline systolic blood pressure
for the whole sample was also considerably
higher (168/100 vs. 153/99).

Blood pressure biofeedback and relaxation/
meditation compared. The four studies comparing
blood pressure biofeedback and relaxation/meditation
techniques (Table 4) have
not helped much in assessing the possible
differential effectiveness of these techniques.
Two of the studies slightly favor progressive
relaxation over noncontinuous proportional
feedback (Shoemaker & Tasto, 1975) and pulse wave
velocity feedback (Walsh et al., 1977). The
other two studies both find breath meditation
as ineffective as a portable constant cuff
technique (Hager & Surwit, Note 3) and as
a constant cuff technique for simultaneous
reductions in systolic blood pressure and heart
rate (Surwit et al., 1978). Unfortunately, the
studies of Walsh et al. and Hager and
Surwit are of poor methodological quality,
and the studies by Shoemaker and Tasto and
Surwit et al. probably used, as previously
discussed, inappropriate blood pressure feedback
techniques. Finally, a recent experiment
with normotensives (Steptoe, 1978), which
compared pulse wave velocity feedback and
breath meditation, also produced equivocal
results.

In drawing general conclusions regarding
the comparative effectiveness of relaxation/
meditation and blood pressure feedback techniques,
we are therefore restricted to the noncomparative
studies previously reviewed. Although
experimental evidence is still somewhat
weak, it is safe to say that in contrast
to blood pressure biofeedback training, relaxation/meditation
training has produced
small but significant reductions in blood pressure
with essential hypertensives. These reductions
have been shown to persist for up
to 6 months and to generalize to environments
other than those in which training was conducted
(e.g., Frankel et al., 1978). To account
for the greater effectiveness of relaxation/meditation
approaches, the following explanations
are suggested.

1. Integrated physiological pattern versus
specificity of blood pressure control. More effective,
rapid, and persistent blood pressure
control is likely to be attained if the self-regulation
of blood pressure is accompanied
by compatible changes in other physiological
parameters such as heart rate, respiratory rate,
muscle relaxation, and so on. Relaxation/
meditation techniques are assumed to be more
effective because they elicit such an integrated
physiological pattern (Schwartz,
1976), whereas blood pressure feedback
training may only affect blood pressure without
concomitant changes in heart rate and
other physiological parameters.

2. Home practice. As a rule relaxation/
meditation techniques are taught in five to
eight sessions, but the major part of training
actually takes place in the subject's home
environment in which he or she is advised to
practice daily. These home practice sessions,
which are usually not part of blood pressure
biofeedback training, may be an important
factor in explaining the superiority of relaxation/meditation
approaches. It is interesting
to note that the only blood pressure feedback
study that showed blood pressure reductions
to persist at a 3-month follow-up was also one
of the few studies in which subjects practiced
reducing blood pressure at home on a daily
basis (Kristt & Engel, 1975).

3. Patient involvement. The role of patient
involvement in the psychological control of
essential hypertension has so far not been
systematically studied. But an unpublished
report by Sherman and Gaardner (Note 4)
suggests that this factor may affect treatment
outcome. They rated all available studies using
biofeedback and relaxation/meditation
techniques according to their treatment effectiveness
and the degree of patient involvement.
They found a significant correlation
indicating that the more patients were involved,
the higher was the treatment effectiveness.
Patient involvement was defined by
the number of training sessions, the intensity
of home practice and recording, awareness of
unwanted stress, expectancy of improvement
elicited by training instructions, and the number
of techniques used. Although Sherman and
Gaardner did not differentiate between blood
pressure feedback and relaxation/meditation
training, it appears that the latter would receive
higher ratings on patient involvement
than the former.

Several other differences between blood 
pressure feedback and relaxation/meditation 
training with regard to patient involvement 
are worth mentioning here. In relaxation/ 
meditation training subjects are taught indi- 
vidually or in groups by an instructor who is 
likely to respond to individual questions, who 
gives encouragement, and who may serve as 
a model. In blood pressure biofeedback train- 
ing, the patient-instructor contact is probably 
much shorter and less personal than in re- 
laxation/meditation training. It is mainly the 
"machine" that is in the role of the instructor.
Training is also likely to take place in a
more technical, controlled, and alien environ- 
ment. In addition, repeated cuff inflations and 
the maintenance of cuff pressure may be un- 
pleasant and distracting for some subjects. 

Finally, relaxation/meditation approaches 
have the advantage of also positively affecting 
stress-related complaints such as insomnia 
(e.g., Borkovec & Hennings, 1978) and other 
clinical problems (Shapiro & Giber, 1978) and 
of increasing self-reported measures of health, 
performance, and well-being (Peters, Benson, 
& Porter, 1977). Subjects experiencing such 
changes may well become highly motivated 
to continue home practice on a regular basis. 

If techniques of relaxation/meditation are 
combined with self-control procedures and 
with EMG and GSR feedback training, blood 
pressure effects are likely to be even larger 
(Patel & North, 1975). The combined use of
diastolic blood pressure feedback training and
other techniques for relaxation has produced 
negative results (Frankel et al., 1978). On the 
other hand, Fey and Lindholm (1978), who 
worked with normotensive subjects, found that 
blood pressure feedback (constant cuff tech- 
nique), when combined with progressive re- 
laxation training, produced larger within-ses- 
sion reductions in systolic blood pressure than 
progressive relaxation alone. However, in a 
follow-up session, when no feedback was pro- 
vided, the two groups no longer differed. 
Feedback training then appears to improve 
or facilitate within-session control of blood 
pressure but tends to lose its effect when it 
is withdrawn. 


Issues of Experimental Design 


All of the studies that have been reviewed 
in Tables 1-4 are outcome studies of either
between-groups or single-group designs, except
for one study of single-case experiments (Ben-
son et al., 1971). The inadequacies of the
single-group design are obvious and need no 
further mention here. Accordingly, the dis- 
cussion concentrates on the pros and cons of 
between-groups versus single-case experimen- 
tal designs. Two major advantages of the be- 
tween-groups comparison design stand out. 
(a) It allows the investigation of the com- 
parative effectiveness of two or more training 
procedures, and (b) results have a wider ap- 
plication. In contrast, generalizations based 
on single-case experiments have to be made 
cautiously. 

Large individual differences in response to 
relaxation/meditation and biofeedback train- 
ing have been reported by many researchers. 
Single-case experiments have the great ad- 
vantage of treating intrasubject and intersub- 
ject variability not as error but as informa- 
tion (Barlow et al., 1977). One consequence 
of intersubject variability is of particular con- 
cern in group designs, as Hersen and Barlow 
(1976) have pointed out: 


When broad divergence . . . occurs among clients 
in response to an intervention, statistical treatments 
will average out the clinical effects along with changes 
due to unwanted sources of variability. In fact, this 
type of intersubject variability is the rule rather 
than the exception. (pp. 37-38)

Group designs in general are based on the
assumption that all subjects in a given sam-
ple have homogeneous characteristics. In the
case of clinical samples, homogeneity of dis- 
order and etiology is also assumed. However, 
since it is generally accepted that essential 
hypertension is a multicausal disorder, it is 
likely that within each group there are sub-
samples that respond differently to training.
One way of overcoming this difficulty is to
apply more stringent sampling criteria. An-
other alternative is to match subjects on cer-
tain variables. But unfortunately, at present,
the relevant matching variables are simply
not known. Group designs also imply that a
standardized method of intervention is appro- 
priate for all subjects. But because of the 
heterogeneity of etiology in essential hyper- 
tension, such an assumption is clearly inap- 
propriate. Yet training procedures used in 
group designs must be designed in such a way 
that they are generally suitable for all sub- 
jects, and once the experiment is under way, 
they cannot be adjusted to the individual 
needs of the subject. It is then not surpris- 
ing that many group outcome studies yield 
statistically significant but clinically irrelevant 
results. In contrast, single-case experiments
allow for training procedures to be tailored 
to the specific needs of each person and permit 
the testing of specific components of a given 
training procedure. 
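
The averaging problem described above can be illustrated with a minimal sketch. The numbers are hypothetical and are not taken from any study reviewed here; the names `responders`, `nonresponders`, and `mean_change` are assumptions introduced only for this illustration. The sketch shows how a pooled group mean can suggest a moderate effect that describes neither subsample.

```python
from statistics import mean

# Hypothetical changes in systolic blood pressure (mmHg) after training.
# Negative values are reductions. These figures are invented for illustration.
responders = [-18, -22, -20, -19, -21]
nonresponders = [1, -1, 2, 0, -2]


def mean_change(changes):
    """Return the mean change for a list of per-subject changes."""
    return mean(changes)


# A group design reports only the pooled mean over all subjects.
pooled = mean_change(responders + nonresponders)

# Looking at the subsamples separately recovers the two distinct patterns.
responder_mean = mean_change(responders)
nonresponder_mean = mean_change(nonresponders)

print("pooled:", pooled)
print("responders:", responder_mean)
print("nonresponders:", nonresponder_mean)
```

In this invented sample the pooled mean falls halfway between a subsample that responds strongly and one that does not respond at all, which is the sense in which intersubject variability is averaged out in group comparisons.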

For future research the use of both types 
of designs is recommended. Single-case ex-
periments would prove particularly valuable
in deciding which training procedure is likely 
to achieve the best results with which type 
of subject. This could be followed up by multi- 
factorial between-groups studies in which one 
or more training procedures would be com-
pared with different subject samples (e.g.,
comparing the effects of progressive relaxation
and breath meditation on essential hyperten-
sives with high versus low pretest frontalis
EMG levels). Further, between-groups stud-
ies can be improved by not only gathering pre-
test, posttest, and follow-up data but also by
repeating measurements during the training
period. Such a repeated measures design
(Kiesler, 1971) would allow both outcome and
process information.


Suggestions for a Comprehensive Approach
to Assessment and Training


It will have become clear from this re- 
view that so far, psychological approaches 
have had only limited success in controlling 
essential hypertension. One reason for this 
may be that the commonly invoked psycho- 
physiological model of essential hypertension 
is too narrow and that assessment and treat- 
ment based on this model are too simplistic. 
Most studies attempting to control elevated 
blood pressure through psychological tech- 
niques have aimed at counteracting or reduc- 
ing the sympathetic nervous system activity 
that is assumed to play a central role in es- 
sential hypertension. The question of what 
actually causes sympathetic arousal in the 
first place is rarely asked. It is hypothesized 
here that to a considerable degree, the per-
son's sympathetic arousal reflects idiosyncratic
ways of perceiving, appraising, and interacting
with his or her environment. It could there- 
fore prove useful not only to measure blood 
pressure and other related physiological vari-
ables but also to measure cognitive appraisal
and coping patterns in response to environ-
mental demand. Assessment could take place
in the laboratory applying standard stressors
(Richter-Heinrich, Knust, Müller, Schmidt,
& Sprung, 1975) or in the person's real-life
environment (e.g., at the person's work-
place). The latter is a practicable proposi-
tion because the necessary technology is now
available (Littler et al., 1975). 

Such comprehensive assessment would
allow the determination of which situations
and events the person responds to with blood
pressure elevations and whether they are
mediated by dysfunctional patterns of think-
ing, emoting, and behaving. A more compre-
hensive approach to training would make use
of specific behavioral techniques, cognitive
restructuring (Goldfried, 1977), and stress
management techniques such as anxiety man-
agement (Bloom & Cantrell, 1978) and stress
inoculation training (Meichenbaum, 1977).
It is hoped that future research will examine
whether these techniques that aim at sys-
more passive techniques of blood pressure
control described in this review. 


Conclusion 


In conclusion, psychological approaches to 
the self-regulation of high blood pressure 
are a promising adjunct to pharmacological 
treatment, and under the supervision of a 
physician, may allow the gradual reduction of 
medication requirements as the patient be- 
comes more proficient in his or her respec- 
tive technique (Patel & North, 1975). How- 
ever, without more large-scale clinical trials 
with sound methodology, an unqualified ac- 
ceptance of these techniques as an alterna- 
tive to pharmacological treatment is not 
justified. To date, few studies have resulted 
in blood pressure reductions that are clini- 
cally relevant by virtue of either their mag- 
nitude or duration. It has been clearly shown 
that substantial blood pressure reductions 
can be achieved under laboratory conditions 
and that after periods of consistent relaxa- 
tion/meditation practice, these changes are 
maintained under resting conditions without 
prior practice. But it has not yet been dem- 
onstrated whether training reduces blood 
pressure in the person’s real-life environment. 
The active therapeutic ingredients of the 
various techniques have not yet been satis- 
factorily established, and only two studies 
have used nonspecific treatment control pro-
cedures.

It is unlikely that any one single tech- 
nique will be suitable for all subjects, but 
rather a variety of approaches is necessary 
to deal effectively with individual differences 
in physiological and psychological respond- 
ing. It is important that research should con- 
centrate on differential diagnosis, with the 
aim of establishing criteria for choosing the 
most appropriate intervention techniques. 
Such research would also help to stimulate 
the development of new techniques or to 
suggest new combinations of existing ones. 

A wide application of psychological ap- 
proaches to the prevention and control of 
essential hypertension is unlikely in the 
immediate future. On the one hand, much 
more needs to be known about which tech-
nique or combination of techniques works
best with whom. On the other hand, a wide 
application of these techniques requires a 
major shift in attitude of the general popu- 
lation and of the medical and health pro- 
fessions. The majority of people may prefer 
the easier course of medication rather than 
the more demanding process of training and 
daily practice. This attitude may not so much 
reflect a lack of willingness to take responsi- 
bility for one's own health as a lack of 
awareness of how personal habits affect it
and how they may be changed. 

It is hoped that psychologists will play 
an increasingly important role in heighten- 
ing general awareness and in researching and 
teaching the skills conducive to health and 
the prevention of disease. 


Reference Notes 


1. Love, W. A., Jr., Montgomery, D. D., & Moel-
ler, T. A. Working paper number 1. Ft. Lauder-
dale, Fla.: Nova University, Behavioral Sciences
Center, 1973.

2. Montgomery, D. D., Love, W. A., Jr., & Moeller,
T. A. Working paper number 2. Ft. Lauderdale,
Fla.: Nova University, Behavioral Sciences Cen-
ter, March 1974.

3. Hager, J. L., & Surwit, R. S. Hypertension self-
control with a portable feedback unit for re-
laxation. Paper presented at the meeting of the
Society for Psychophysiological Research, San
Diego, October 1976.

4. Sherman, R. A., & Gaardner, K. R. Patient in-
volvement and treatment effectiveness in be-
havioral treatments of hypertension. Paper pre-
sented at the meeting of the Biofeedback Society
of America, Orlando, Fla., March 1977.


References 


Barlow, D. H., Blanchard, E. B., Hayes, S. C., &
Epstein, L. H. Single-case designs and clinical 
biofeedback experimentation. Biofeedback and 
Self-Regulation, 1977, 2, 221-239. 

Beiman, I., Graham, L. E., & Ciminero, A. R.
Self-control progressive relaxation training as an 
alternative nonpharmacological treatment for es- 
sential hypertension: Therapeutic effects in the 
natural environment. Behaviour Research and 
Therapy, 1978, 16, 371-375. 

Benson, H. The relaxation response. New York: 
Morrow, 1975. 

Benson, H., Rosner, B. A., Marzetta, B. R., &
Klemchuck, H. M. Decreased blood pressure in
borderline hypertensive subjects who practice
meditation. Journal of Chronic Diseases, 1974,
27, 163-169. (a)

Benson, H., Rosner, B. A., Marzetta, B. R., &
Klemchuck, H. M. Decreased blood pressure in
pharmacologically treated hypertensive patients
who regularly elicited the relaxation response.
Lancet, 1974, 1, 289-291. (b)

Benson, H., Shapiro, D., Tursky, B., & Schwartz,
G. E. Decreased systolic blood pressure through
operant conditioning techniques in patients with
essential hypertension. Science, 1971, 173, 740-
742.

Blackwell, B., et al. Transcendental meditation in 
hypertension: Individual response patterns.
Lancet, 1976, 1, 223-226.

Blanchard, E. B., & Young, L. D. Self-control of 
cardiac functioning: A promise as yet unfulfilled. 
Psychological Bulletin, 1973, 79, 145-163. 

Bloom, L. J., & Cantrell, B. Anxiety management 
training for essential hypertension in pregnancy.
Behavior Therapy, 1978, 9, 377-382. 

Borkovec, T. D., & Hennings, B. L. The role of 
physiological attention focusing in the relaxation 
treatment of sleep disturbance, general tension, 
and specific stress reaction. Behaviour Research
and Therapy, 1978, 16, 7-19. 

Borkovec, T. D., & Nau, S. D. Credibility of ana-
logue therapy rationales. Journal of Behavior
Therapy and Experimental Psychiatry, 1972, 3, 
257-260. 

Brady, J. P. Metronome-conditioned relaxation: A
new behavioral procedure. British Journal of
Psychiatry, 1973, 122, 729-730.

Brady, J. P., Luborsky, L., & Kron, R. E. Blood
pressure reduction in patients with essential hy-
pertension through metronome-conditioned re-
laxation: A preliminary report. Behavior Ther-
apy, 1974, 5, 203-209.

Brener, J. A. A general model of voluntary control 
applied to the phenomena of learned cardiovas-
cular change. In P. A. Obrist, A. H. Black, J.
Brener, & L. V. DiCara (Eds.), Cardiovascular
psychophysiology. Chicago: Aldine, 1974.

Bulpitt, C. J., & Dollery, C. T. Side effects of hypo-
tensive agents evaluated by a self-administered
questionnaire. British Medical Journal, 1973, 3,
485-490.

Cannon, W. B. The wisdom of the body. New York:
Norton, 1932.

Davidson, R. J., & Schwartz, G. E. The psycho-
biology of relaxation and related states: A multi-
process theory. In D. I. Mostofsky (Ed.), Be-
havior control and modification of physiological
activity. Englewood Cliffs, N.J.: Prentice-Hall,
1976.

Deabler, H. L., Fidel, E., Dillenkoffer, R. L., &
Elder, S. T. The use of relaxation and hypnosis
in lowering high blood pressure. The American
Journal of Clinical Hypnosis, 1973, 16(2), 75-83.

Dollery, C. T. Normal and raised arterial pressure:
What is hypertension? In G. Onesti, K. E. Kim,
& J. H. Moyer (Eds.), Hypertension: Mecha-
nisms and management. New York: Grune &
Stratton, 1973.

Dunne, J. F. Variation of blood pressure in un- 
treated hypertensive outpatients. Lancet, 1969, 1, 
391-392. 

Elder, S. T., & Eustis, N. K. Instrumental blood 
pressure conditioning in outpatient hypertensives. 
Behaviour Research and Therapy, 1975, 13, 185-
188.

Elder, S. T., Longacre, A., Jr., Welsh, D. M., &
McAfee, R. D. Apparatus and procedure for 
training subjects to control their blood pressure. 
Psychophysiology, 1977, 14, 68-72. 

Elder, S. T., Ruiz, Z. R., Deabler, H. L., & Dillen-
koffer, R. L. Instrumental conditioning of dias-
tolic blood pressure in essential hypertensive pa-
tients. Journal of Applied Behavior Analysis,
1973, 6, 377-382.

Elder, S. T., Welsh, D. M., Longacre, A., Jr., &
McAfee, R. D. Acquisition, discriminative stimu-
lus control and retention of increases/decreases
in blood pressure of normotensive human sub-
jects. Journal of Applied Behavior Analysis, 1977,
10, 381-390.

Fey, S. G., & Lindholm, E. Systolic blood pressure
and heart rate changes during three sessions in-
volving biofeedback or no feedback. Psycho-
physiology, 1975, 12, 513-519.

Fey, S. G., & Lindholm, E. Biofeedback and pro-
gressive relaxation: Effects on systolic and dias-
tolic blood pressure and heart rate. Psychophysi-
ology, 1978, 15, 239-247.

Frankel, B. L., Patel, D. J., Horowitz, D., Fried-
wald, W. T., & Gaardner, K. R. Treatment of
hypertension with biofeedback and relaxation
techniques. Psychosomatic Medicine, 1978, 40,
276-293.

Goldfried, M. R. The use of relaxation and cogni-
tive relabeling as coping skills. In R. B. Stuart
(Ed.), Behavioral self-management. New York:
Brunner/Mazel, 1977.

Goldman, H., Kleinman, K., Snow, M., Bidus, D., 
& Korol, B. Relationship between essential hy- 
pertension and cognitive functioning: Effects of 
biofeedback. Psychophysiology, 1975, 12, 569- 
573. 

Goldring, W., Chasis, H., Schreiner, G. E., &
Smith, H. W. Reassurance in the management 
of benign hypertensive disease. Circulation, 1956, 
14, 260-264. 

Graham, L. E., Beiman, I., & Ciminero, A. R. The
generality of the therapeutic effects of progres-
sive relaxation training for essential hypertension.
Journal of Behavior Therapy and Experimental
Psychiatry, 1977, 8, 161-164.

Grenfell, R. F., Briggs, A. H., & Holland, W. C.
A double-blind evaluation of antihypertensive
drugs. Angiology, 1964, 15, 163-170.

Gribbin, B., Steptoe, A., & Sleight, P. Pulse wave
velocity as a measure of blood pressure changes.
Psychophysiology, 1976, 13, 86-91.

Gutmann, M. C., & Benson, H. Interaction of en-
vironmental factors and systemic arterial blood
pressure: A review. Medicine, 1971, 50, 543-553.

Henry, J. P., & Cassel, J. C. Psychosocial factors 
in essential hypertension: Recent epidemiologic
and animal experimental evidence. American 
Journal of Epidemiology, 1969, 90, 171-200. 

Hersen, M., & Barlow, D. H. Single-case experi- 
mental designs: Strategies for studying behavior 
change. New York: Pergamon Press, 1976. 

Jacobson, E. Variation of blood pressure with skele- 
tal muscle tension and relaxation. Annals of In- 
ternal Medicine, 1939, 12, 1194-1212. 

Jacobson, E. Modern treatment of tense patients.
Springfield, Ill.: Charles C Thomas, 1970.

Kannel, W. B. Recent highlights from the Fram- 
ingham study. Australian and New Zealand Jour- 
nal of Medicine, 1976, 6, 373-386. 

Kannel, W. B., & Dawber, T. R. Hypertensive car- 
diovascular disease: The Framingham study. In 
G. Onesti, K. E. Kim, & J. H. Moyer (Eds.), 
Hypertension: Mechanisms and management. New 
York: Grune & Stratton, 1973. 

Kazdin, A. E., & Wilcoxon, L. A. Systematic de- 
sensitization and nonspecific treatment effects: A 
methodological evaluation. Psychological Bulletin, 
1976, 83, 729-758. 

Kiesler, D. J. Experimental designs in psychother-
apy research. In A. E. Bergin & S. L. Garfield
(Eds.), Handbook of psychotherapy and behavior 
change: An experimental analysis. New York: 
Wiley, 1971. 

Kleinman, K. M., Goldman, H., Snow, M. Y., &
Korol, B. Relationship between essential hyper- 
tension and cognitive functioning II: Effects of 
biofeedback training generalize to nonlaboratory 
environment. Psychophysiology, 1977, 14, 192- 
197. 

Krausman, D. T. Methods and procedures for mon- 
itoring and recording blood pressure. American 
Psychologist, 1975, 30, 285-294. 

Kristt, D. A., & Engel, B. T. Learned control of
blood pressure in patients with high blood pres- 
sure. Circulation, 1975, 51, 370-378. 

Littler, W. A., Honour, A. J., Pugsley, D., & Sleight, 
P. Continuous recording of direct arterial pres- 
sure in unrestricted patients. Circulation, 1975, 
51, 1101-1106. 

Littler, W. A., Honour, A. J., Sleight, P., & Stott, F. 
D. Continuous recording of direct arterial pres- 
sure and electrocardiogram in unrestricted man. 
British Medical Journal, 1972, 3, 76-78. 

LoGerfo, J. P. Hypertension. Management in a 
prepaid health care project. Journal of the Amer- 
ican Medical Association, 1975, 223, 245-248. 

Mahesh Yogi, M. Transcendental meditation. New
York: Signet, 1968. 

Meichenbaum, D. H. Cognitive-behavior modifica- 
tion. New York: Plenum Press, 1977. 

Naranjo, C., & Ornstein, R. E. On the psychology
of meditation. New York: Viking Press, 1971. 

Onesti, G., Kim, K. E., & Moyer, J. H. Hyperten-
sion: Mechanisms and management. New York:
Grune & Stratton, 1973.

Patel, C. H. Yoga and biofeedback in the manage- 
ment of hypertension. Lancet, 1973, 2, 1053-1055. 

Patel, C. H. 12-month follow-up of yoga and bio- 
feedback in the management of hypertension. 
Lancet, 1975, 1, 62-65. (a) 

Patel, C. H. Yoga and biofeedback in the manage-
ment of "stress" in hypertensive patients. Clini-
cal Science and Molecular Medicine, 1975, 48,
171-174. (Supplement) (b)

Patel, C. H., & North, W. R. S. Randomised con-
trolled trial of yoga and biofeedback in manage-
ment of hypertension. Lancet, 1975, 2, 93-95.

Peters, R. K., Benson, H., & Porter, D. Daily re-
laxation response breaks in a working popula-
tion: I. Effects on self-reported measures of
health, performance and well-being. American 
Journal of Public Health, 1977, 67, 946-953. 

Pickering, G. W. High blood pressure. London: 
Churchill, 1968. 

Pollack, A. D., Weber, M. A., Case, D. B., &
Laragh, J. H. Limitations of transcendental medi- 
tation in the treatment of essential hypertension. 
Lancet, 1977, 1, 71-73. 

Redmond, D. P., Gaylor, M. S., McDonald, R. H.,
& Shapiro, A. P. Blood pressure and heart-rate
response to verbal instruction and relaxation in
hypertension. Psychosomatic Medicine, 1974, 36,
285-297. 

Richter-Heinrich, E., Knust, U., Müller, W.,
Schmidt, K. H., & Sprung, H. Psychophysiologi-
cal investigations in essential hypertensives. Jour-
nal of Psychosomatic Research, 1975, 19, 251-
258.

Schwartz, G. E. Self-regulation of response pat- 
terning: Implications for psychophysiological re- 
search and therapy. Biofeedback and Self-Regu- 
lation, 1976, 1, 7-30. 

Shannon, B. J., Goldman, M. S., & Lee, R. M.
Biofeedback training of blood pressure: A com- 
parison of three feedback techniques. Psycho- 
physiology, 1978, 15, 53-59. 

Shapiro, D. A monologue on biofeedback and psy-
chophysiology. Psychophysiology, 1977, 14, 213-
227.

Shapiro, D., Schwartz, G. E., & Tursky, B. Con-
trol of diastolic blood pressure in man by feed-
back and reinforcement. Psychophysiology, 1972,
9, 296-304.

Shapiro, D., & Surwit, R. S. Learned control of 
physiological function and disease. In H. Leiten- 
berg (Ed.), Handbook of behavior modification 
and behavior therapy. Englewood Cliffs, N.J.:
Prentice-Hall, 1976.

Shapiro, D., Tursky, B., Gershon, E., & Stern, M. 
Effects of feedback and reinforcement on the con-
trol of human systolic blood pressure. Science,
1969, 163, 588-590. 

Shapiro, D. H., Jr., & Giber, D. Meditation and
psychotherapeutic effects. Archives of General
Psychiatry, 1978, 35, 294-302.

Shapiro, D. H., Jr., & Zifferblatt, S. M. Zen medi-
tation and behavioral self-control: Similarities,
differences, and clinical applications. American
Psychologist, 1976, 31, 519-532.

Shoemaker, J. E., & Tasto, D. L. The effects of
muscle relaxation on blood pressure of essential
hypertensives. Behaviour Research and Therapy,
1975, 13, 29-43.

Smith, J. C. Psychotherapeutic effects of transcen-
dental meditation with control for expectation
of relief and daily sitting. Journal of Consulting
and Clinical Psychology, 1976, 44, 630-637.

Stamler, J., Stamler, R., Riedlinger, W. F., Algera,
G., & Roberts, R. H. Hypertension screening of
1 million Americans. Journal of the American
Medical Association, 1976, 235, 2299-2306.

Steinmark, S. W., & Borkovec, T. D. Active and 
placebo treatment effects on moderate insomnia 
under counterdemand and positive demand in- 
structions. Journal of Abnormal Psychology, 1974, 
83, 157-163. 

Steptoe, A. Blood pressure control: A comparison 
of feedback and instructions using pulse wave 
velocity measurements. Psychophysiology, 1976, 
13, 528-535. 

Steptoe, A. Voluntary blood pressure reductions, 
measured with pulse transit time: Training con- 
ditions and reactions to mental work. Psycho- 
physiology, 1977, 14, 492-498. 

Steptoe, A. The regulation of blood pressure re- 
actions to taxing conditions using pulse transit 
time feedback and relaxation. Psychophysiology,
1978, 15, 429-438. 

Steptoe, A., Smulyan, H., & Gribbin, B. Pulse wave 
velocity and blood pressure change: Calibration 
and application. Psychophysiology, 1976, 13, 488-
493.

Stone, R. A., & DeLeo, J. Psychotherapeutic con-
trol of hypertension. The New England Journal 
of Medicine, 1976, 294, 80-84. 

Stoyva, J. A psychophysiological model of stress 
disorders as a rationale for biofeedback training. 
In F. J. McGuigan (Ed.), Tension control: Pro-
ceedings of the second meeting of the American
Society for the Advancement of Tension Control. 
Blacksburg, Va.: University Publications, 1976. 

Stoyva, J., & Budzynski, T. Cultivated low arousal:
An antistress response? In L. V. DiCara (Ed.),
Limbic and autonomic nervous system research.
New York: Plenum Press, 1974.

Surwit, R. S., Shapiro, D., & Good, M. I. Com-
parison of cardiovascular biofeedback, neuromus-
cular feedback, and meditation in the treatment
of borderline hypertension. Journal of Consult-
ing and Clinical Psychology, 1978, 46, 252-263.

Tarler-Benlolo, L. The role of relaxation in bio-
feedback training: A critical review of the litera-
ture. Psychological Bulletin, 1978, 85, 127-155.

Taylor, C. B., Farquhar, J. W., Nelson, E., Ast
342.

Travis, T. A., Kondo, C. Y., & Knott, J. R. Heart
rate, muscle tension, and alpha production of
transcendental meditators and relaxation controls.
Biofeedback and Self-Regulation, 1976, 1, 387-
394.

Treichel, M., Clinch, N., & Cran, M. The metabolic 
effects of transcendental meditation. The Physi- 
ologist, 1973, 16, 472. 

Tursky, B. The indirect recording of human blood 
pressure. In P. A. Obrist, A. H. Black, J. Brener,
& L. V. DiCara (Eds.), Cardiovascular psycho- 
physiology. Chicago: Aldine, 1974. 

Veterans Administration Cooperative Study Group 
on Antihypertensive Agents. Effects of treatment 
on morbidity in hypertension: I. Results in pa- 
tients with diastolic blood pressure averaging 115 
through 129 mmHg. Journal of the American 
Medical Association, 1967, 202, 1028-1034. 

Veterans Administration Cooperative Study Group
on Antihypertensive Agents. Effects of treat-
ment on morbidity in hypertension: II. Results
in patients with diastolic blood pressure averag-
ing 90 through 114 mmHg. Journal of the Amer-
ican Medical Association, 1970, 213, 1143-1152.

Walrath, L. C., & Hamilton, D. W. Autonomic 
correlates of meditation and hypnosis. The Amer- 
ican Journal of Clinical Hypnosis, 1975, 17, 190- 
197. 

Walsh, P., Dale, A., & Anderson, D. E. Comparison 
of biofeedback pulse wave velocity and progres- 
sive relaxation in essential hypertensives. Per- 
ceptual and Motor Skills, 1977, 44, 839-843. 

Woolfolk, R. L. Psychophysiological correlates of 
meditation. Archives of General Psychiatry, 1975, 
32, 1326-1333. 

Wright, B. M., & Dore, C. F. A random-zero
sphygmomanometer. Lancet, 1970, 1, 337-338.


Received May 1, 1978


1979, Vol. 86, No. 5, 1044-1049


Cognitive Behavior Modification: 
Misconceptions and Premature Evacuation 


Michael J. Mahoney and Alan E. Kazdin 
The Pennsylvania State University 


Ledwidge's recent implication that cognitive behavior modification is "a step in 
the wrong direction" is examined and evaluated. Misconceptions about the 


of ” Finally, it is argued that Ledwidge's cautions about the continuing 


pursuit of cognitive-behavioral techniq 


trary to the commitment to empirical evaluation shared by both cognitive and 


It is perhaps not surprising that the emer- 
gence of cognition in behavioral quarters has 
stimulated such strong and vituperative re-
actions. Within the last 2 years alone, there
have been almost a dozen articles and papers 
attacking the "mentalistic" resurrection of 
cognitive processes in behavior modification 
(Goldiamond, 1976; Observer, 1977, 1978; 
Rachlin, 1977a, 1977b; Skinner, 1977; 
Wolpe, 1976a, 1976b, 1978). The sentiment
of most of these writings is aptly summarized 
in a recent Psychological Record editorial 
(Observer, 1978): 

Cognitivism constitutes a counter-revolution to the 
behavioristic revolution that promised to promote
psychology to a scientific status. . . . (p. 157)


Students of scientific psychology cannot but de- 
plore the regressive tendencies of cognitive psy- 
chology. (p. 159) 


In his recent evaluation of cognitive be-
havior modification, Ledwidge (1978) sounds
a similar alarm regarding the apparent trend
toward cognitive theorization in behavior 
modification: 


A wholesale conversion to cognitive methods on
insufficient evidence could rob behavior therapy 
of its distinctiveness and lead to the abandonment 
of the more traditional behavioral techniques, the
success of which have afforded (behavior) therapy
the reputation it enjoys today. (p. 354) 


After a sporadic review of the available 
literature, Ledwidge draws a mixture of con-
clusions that seem only tangentially related
to extant evidence. It is the purpose of this
brief article to point out some of the mis-
conceptions in Ledwidge's review and to iso-
late some of the pivotal issues that seem to
be developing along the interface of cogni-
tive psychology and behavior modification.


Dualism and Dichotomy 


Two of the most persistent myths that
surround the developing interface are the
notions that (a) cognitivism necessitates
mentalism and that (b) a therapist (or
) is either exclusively cognitive or be-
havioral (but never the twain shall meet).
The first assumption is apparent in Skinner's
(1974) recent responses to the growing in-
terest in cognitive psychology. He uses the
terms "cognitive psychologist" and "mental-
ist" interchangeably:

By attempting to move human behavior into a
world of nonphysical dimensions, mentalistic or
cognitive psychologists have cast the basic issues in
insoluble forms. They have also probably cost us
much useful evidence . . . . (p. 118)


This same sentiment is reiterated by Led-
widge (1978) when he states that "the pres-
ent controversy is, of course, just another
round in the centuries old mind-body de-
bate" (p. 360). What is interesting is that
the issue of a nonphysical mind
does not appear to differentiate groups that
have been traditionally labeled as cognitive
or behavioristic. In a recent survey in-
volving 42 of the most eminent living con-
tributors to behavior therapy and cognitive
behavior modification, no significant differ-
ences in belief in the existence of a "mind"


se two groups Bees the importance 
of experimental rigor in theory evaluation.
The second myth is more subtly defended
by Ledwidge's (1978) insistence on the dif-
ferentiation of cognitive and behavioral ther-
apies, a distinction that he defends on the
grounds that (a) cognitive-behavioral tech-
niques do not attempt to directly modify be-
haviors and that (b) should these new tech-
niques fail to improve on the therapeutic
1 И behavior therapy, they will “un- 
y detract from the good reputation 
modification enjoys today" (p. 
me, Ledwidge concedes that 
forms of therapy involve some form of 


‘that 
312). 
All forms 


Of therapy, including medical treatment, 

to the extent that the therapist must 
the client to cooperate in the suggested 
before therapy can begin. In this trivial
. are cognitive. (p. 357)

that compliance and trust are "trivial" issues,
but the basic point is that clients are pre- 
sumed to think. What is apparently over- 
looked is the equally salient observation that 
therapists behave. At some level of analysis, 
all forms of psychotherapy involve behaviors 
on the part of a therapist that are intended 
to produce changes in the ongoing experiences 
of a client. Thus, one might also argue that
all therapies are simultaneously cognitive and 
behavioral. 

A more compelling illustration of this point 
is offered by Bandura's (1977a) timely em- 
phasis on the distinction between procedures 
and processes. According to Bandura—and 
many other persons labeled “cognitive be- 
havior modifiers"—the processes that govern 
human adjustment (and maladjustment) are 
cognitive in nature. (I.e., they involve attentional
processes, aspects of information
storage and retrieval, etc.) However, in
almost comic irony, it now appears that behavioral
procedures may be among the most
powerful methods for activating those cognitive
processes. Thus, if any clear distinction
can be drawn, the major difference between
cognitive and less cognitive behavior
modifiers does not lie in their therapeutic
procedures so much as in their rationale and
selection of a given procedure in an individual
case. The more cognitively oriented therapist
is inclined to employ a behavioral procedure
appropriate to the "cognitive restructuring"
presumed to be required.


Misconceptions About Cognitive Therapy 


This confusion regarding process and pro- 
cedure is most apparent in discussions about 
the modification of verbal behavior. In an 
attempt to lump all nonbehavioral ap- 
proaches into a broad category of psycho- 
therapy, Ledwidge (1978) appears to lament 
the recognition of thoughts as therapeutic 
targets in behavior therapy: 


Slowly, but surely, over the years, behavior theory 
has become more cognitive. Cognitions, relabeled 
self-statements, are classed as behaviors. Whereas 
changing a person's mind was, and still is, consid- 
ered psychotherapy, changing a person's self-state- 
ments passes for behavior therapy. (p. 354)


It is, of course, a special form of therapy, and
Ledwidge prefers to call it cognitive therapy 
rather than cognitive behavior modification. 
The label is perhaps less important than the 
accurate assumption that "cognitive change 
(is) the active ingredient in treatment" (p. 
356). Much less accurate, however, is the 
assertion that 


whereas behavior therapists attempt to change be- 
havior directly by using mainly nonverbal means, 
cognitive therapists . . . rely chiefly on speech as 
the instrument of change. (p. 356) 


This artificial distinction is further elabo- 
rated by Ledwidge's later reference to cog- 
nitive therapies as “verbal therapies." The 
error of such a dichotomy is discernible on 
at least two counts. On one hand, the be- 
havior therapist is reliant on verbal com- 
munication during treatment—a point that 
is in direct variance with Ledwidge's “non- 
verbal" attribution. More to the point, how- 
ever, is the fact that persons labeled cogni- 
tive behavior modifiers are explicit in their 
emphasis on behavioral performance as a 
primary means of challenging maladaptive 
beliefs. Cognitive theorists ranging from 
George Kelly to Albert Bandura have been 
almost uniformly consistent in their reliance 
on active motoric performance in therapy 
(cf. Bandura, 1977a, 1977b; Beck, 1976; 
Kelly, 1955; Mahoney, 1977b; Meichen- 
baum, 1977). 

In several places, Ledwidge (1978) recog- 
nizes the problems of distinguishing cogni- 
tive behavior modification and behavior ther- 
apy and the "somewhat arbitrary decision 
of choosing where on a cognitive-behavioral 
dimension to place a cutoff between the two 
types of techniques" (p. 357). The depend- 
ence of many cognitive techniques on overt 
behavioral and nonverbal means of effecting 
therapeutic change would make this distinc- 
tion almost impossible. The ambiguity of 
the distinction between cognitive and be- 
havior therapy is not the fault of Ledwidge. 
The problem is in trying to make distinctions 
at the level of techniques. At this level, what 
different therapists actually do often over- 
laps considerably, a fact that traditionally 
has served as an impetus for eclecticism or
integration of seemingly incompatible theoretical
positions. We, too, would have difficulty
in making decisions using a single continuum
based on a cognitive-behavioral dimension
that would divide techniques, although
extremes might be identified with
agreement. At the conceptual level, distinguishing
different forms of therapy is a
more straightforward task, since the theoretical
allegiance of a particular technique is
readily identified.

Perhaps for clinical psychology as a whole,
the most important distinctions among therapies
are not made at the technique or conceptual
levels. A dimension slighted in Ledwidge's
(1978) review that is the most significant
in distinguishing therapies is the
commitment to empirical research as a
crucible for treatment evaluation. Cognitive
behavior modification is firmly committed to
the tenets and practices of contemporary behavioral
research. These include careful specification
of treatment ingredients, multiple
operationism of outcome, recognition of overt
behavior as a major measure of treatment
efficacy, and so on. In this regard, distinctions
between select techniques, whether they
are called behavioral or cognitive, tend to
diminish. In the final analysis, it will, or
should, be the theoretically sound and empirically
established techniques that are embraced
by the field. What these are called,
how these develop, and the purity of their
philosophical heritage will be interesting but
not of ultimate importance.


Empirical Status 


In his discussion, Ledwidge (1978) concludes
that the data supporting cognitive-behavioral
approaches are "meager" in comparison
with the "enormous body of research
validating the effectiveness of behavior therapy
procedures" (p. 370). One could, of
course, question whether behavior therapy
procedures have been so overwhelmingly
validated (e.g., Kazdin & Wilcoxon, 1976;
Kazdin & Wilson, 1978). Likewise, one could
question the classification of such procedures
as imagery and modeling as behaviorally
(rather than cognitively) oriented intervention
strategies. More pertinent, however,
may be Ledwidge's assertion that "the
more cognitive the technique, the less effective
it is" (p. 370). This is in ironic contrast
to his concluding remark that judgment
on the relative effectiveness of cognitive-behavioral
strategies must be deferred
until adequate research is available. Both
explicitly and implicitly, it is clear that Ledwidge
has already developed the hunch that
cognitive-behavioral techniques are "a step in
the wrong direction." This verdict is questionable
on at least two counts.

First, there are now over a dozen studies 
that suggest that cognitive parameters may 
enhance either the predictive validity or the 
therapeutic power of previous techniques 
(Bandura, 1977a; Mahoney & Arnkoff, 1978; 
Meichenbaum, 1977). The potential of cog- 
nitive perspectives is most apparent in the 
recent studies by Taylor and Marshall
(1977) and Rush, Beck, Kovacs, and Hollon
(1977). In the former it was shown that a 
cognitive-behavioral therapy was more effec- 
tive than isolated cognitive, behavioral, or 
no-treatment groups in the management of 
mild to moderately depressed subjects. The 
Rush et al. study found that cognitive ther- 
apy was more effective than chemotherapy 
(tricyclic drugs) in the treatment of severe 
depression—a finding that is noteworthy in 
its constituting the first (and only) psycho- 
logical treatment to surpass tricyclic drug 
efficacy in this realm. Likewise, it appears 
that cognitively dictated “mastery experi- 
ences” may enhance the effectiveness of be- 
havior rehearsal and participant modeling 
techniques (Bandura, 1977a, 1977b). 

A second reason for questioning Ledwidge’s 
(1978) verdict is its prematurity. Cognitive- 
behavioral approaches are relatively recent 
developments, and adequate opportunity to 
evaluate their mettle has yet to offer itself. 
Just as Ledwidge has expressed concern 
about the "wholesale conversion to cognitive
methods on insufficient evidence" (p. 354),
one might question the wisdom of premature
rejection of such methods on equally
insufficient grounds. There are, of course,
cognitive theorists who have been generous
in their evaluation and optimism regarding
the current status of cognitive-behavioral
approaches (e.g., Ellis, 1977). One might
find equally enthusiastic proponents of al- 
most any system of psychotherapy. On the 
other side of the coin, however, it is worth 
acknowledging that many defenders of the 
developing cognitive-behavioral interface
seem to be well aware of the challenges that 
continue to face this area of inquiry (e.g., 
Bandura, 1977b; Beck, 1976; Mahoney, 
1974, 1977a, 1977b; Meichenbaum, 1977). 
At least some of these sentiments were expressed
in the lead article of that new journal
"devoted entirely to cognitive therapy"
(Ledwidge, 1978, p. 359). After a conserva- 
tive statement on the status of cognitive- 
behavioral strategies, the need for continuing 
self-scrutiny was emphasized: 


In sum, our long and arduous journey has just 
begun. Let us not waste time congratulating our- 
selves on our wisdom when our ignorance is still 
so salient. We have cause for optimism, but hardly 
for jubilation. There are throngs of suffering hu- 
mans who fill our waiting rooms, and we have yet 
to demonstrate that our therapeutic promises will 
serve them better than have those of the past. 
Let us bear in mind that our ultimate commitment 
is to these persons, and not to the esoteric needs 
of our paradigm. We must be ready to change 
when so doing would serve them better, and we 
must be ever ready to follow new paths toward
clinical effectiveness. (Mahoney, 1977b, p. 15). 


Conclusion 


Among the many problems that arise in 
the therapy literature, two seem to be par- 
ticularly objectionable and dangerous. The 
first and certainly foremost is overzealously 
advocating treatment techniques based on 
anecdotal information and weak case mate- 
rial. Therapy techniques continue to be 
hailed as effective in professional and lay 
circles as if empirical evidence attesting to 
their efficacy were already available. Scien- 
tific criteria for endorsing existing treatments 
have yet to be uniformly embraced in the 
psychotherapy literature, not to mention 
clinical practice. 

The second problem is judging, in advance 
of empirical research, the kind of techniques 
that might be effective and ruling out certain 
avenues based on this judgment. Ledwidge's
(1978) article discusses the beginnings of
research on the many different forms of cog- 
nitive behavior modification but at the same 
time warns about the potentially undesirable 
consequences of embracing these techniques. 
Those who object to cognitive behavior mod- 
ification for reasons that they believe to be 
conceptual or metaphysical should rejoice in 
the fact that this area is strongly committed 
to empirical research. If in fact cognitive 
based techniques add so little to existing tech- 
niques, this will be demonstrated rather soon, 
since research in cognitive behavior modifica- 
tion is proliferating so actively. Alternatively, 
if this research proves to be heuristically 
and clinically productive, as current evi- 
dence suggests, critics will be required to 
examine their own criteria for decision 
making. 

The current state of cognitive behavior 
modification calls for accelerated research 
rather than second thoughts over further 
exploration. Objections based on the pos- 
sible conceptual and methodological impuri- 
ties that cognitive theory or therapy might 
introduce reflect a failure to appreciate the 
historical lineage and contemporary charac- 
teristics of behavior therapy. Behavior ther- 
apy is hardly free from sin in the sense of 
having many loose theoretical ends, tech- 
niques based on concepts that stretch (and 
use) the imagination, and assertions of effi- 
cacy that are poorly based (Kazdin, 1978). 
To imply that cognitive theory or techniques 
somehow tarnish all of this is difficult to 
maintain. 

Cognitive therapy has thrown itself into 
the evaluative arena of empirical research. 
This is a risk that many forms of traditional 
therapy have yet to take. Along with the 
risk of demise should be the potential bene- 
fits of successes and empirical insights. At 
the very least, critics as well as advocates 
might be well advised to suspend judgment 
until the data accrue.


Cognitive Behavior Modification or New Ways
to Change Minds: Reply to Mahoney and Kazdin


The "myths" that (a) cognitivism necessitates mentalism and that (b) therapists
can be classified on the basis of technique are defended. Two studies
cited by Mahoney and Kazdin as evidence of the potential of cognitive perspectives
are found, on close examination, to raise more questions than they
answer. Charges of prejudgment and premature evacuation of the field are 
disclaimed, but it is urged that the phrase behavior modification not be used 
to describe cognitive approaches, since failure to distinguish the two kinds of 
therapy invites a conceptual confusion of cognition with behavior that could 
have unfortunate theoretical and practical consequences. 


If it is possible to be damned with faint 
praise, then Mahoney and Kazdin’s (1979) 
critique surely canonizes my position (Led- 
widge, 1978) with faint criticism. Lacking 
adequate data, they fail to face squarely the 
central issues raised and instead attempt to 
shift the focus from treatment effectiveness 
to theoretical issues, including the old and 
bitter mind-body polemic. After dismissing 
as “sporadic” my review of the literature 
(which included a critical review of every ar- 
ticle on cognitive-behavior modification pub- 
lished in any of the four behavior therapy 
journals between 1963, when the first issue of 
Behaviour Research and Therapy appeared, 
and July 1976, when I began writing the arti- 
cle) and after characterizing my conclusions as 
“only tangentially related to extant evidence,” 
Mahoney and Kazdin then devote merely two 
paragraphs of their article to the empirical 
status of cognitive behavior modification and 
fail to cite any of the studies allegedly missed 
in my "sporadic review." The balance of their
article is devoted to questions that are more
semantic than real and that, by definition, 
cannot be answered empirically. It is to these 
pseudoissues that I now reluctantly turn. 


Myths 


I am taken to task by the authors for prop- 
agating two “persistent myths” that surround 
the “interface” of cognitive therapy and be- 
havior modification. In the first place, accord- 
ing to Mahoney and Kazdin, I, like Skinner
(1978), mistakenly assume that cognitivism
necessitates mentalism. 

What is a cognition, however, if not a men- 
tal event? A cognition is a behavior (albeit 
private), a hypothetical construct inferred
from behavior to account for behavior, or a
mental event in the bad sense of the word,
that is, a causal event that is not reducible
without remainder to either environmental,
behavioral, or physiological events. The radical
behaviorists, including Skinner, allow that
private events (e.g., minute muscle movements
detectable by an electromyograph) are
acceptable for study but only if the investigator
is able to reliably determine when the
phenomenon occurs (Biglan & Kass, 1978).
Verbal reports about cognitions qualify as behaviors
in this sense, but the cognitions themselves
do not. If cognitive behavior modifiers
were attempting merely to change the client's
verbal reports about cognitions, then the treatment
could be characterized as behavioral,
but it is clearly the thoughts, beliefs, and attitudes
of the clients and not just their speech
acts that are the focus of treatment. Although
radical behaviorists eschew the use of constructs
(and theory building in general),
methodological behaviorists are not averse to
using intervening variables (e.g., Hull's r_g)
to account for relationships among observable,
environmental, and organismic events. If
in fact cognitive behavior modifiers used cog- 
nition simply as a construct to account for 
enduring patterns of clients’ responses to con- 
sulting room procedures, then there would 
be no confusion as to the status of a cogni- 
tion. Unlike Hull, however, who never attempted
to manipulate r_g's, Mahoney, Meichenbaum,
Ellis, and others convert the intervening
variable to a subject matter.
(Mahoney and Kazdin, 1979, agree that
"cognitive change is the active ingredient in
treatment" [p. 1046].) If a cognition, then,
as the term is used by cognitive behavior 
modifiers, is neither a behavior nor an inter- 
vening variable, then mental event remains 
as the most apt descriptor. To the extent that
cognitive approaches appeal to causal events
that are not environmental, behavioral, or
specifically physiological, they are mentalistic.
Hence, although Mahoney and Kazdin appear
to repudiate psychophysical dualism, they are,
in a sense, tacit or "closet" dualists.

As evidence against my contention that
cognitivism necessitates mentalism, Mahoney
and Kazdin (1979) point out that in a soon-to-be-published
survey (Mahoney, in press)
involving 42 of the most eminent living contributors
to behavior therapy and cognitive
behavior modification, there were no significant
differences in belief in the existence of
a "mind." It does not surprise me that therapists
of different persuasions have been able
to accommodate in their lexicon this battered
term. The question is whether the behavior
therapists and the cognitive behavior modifiers
surveyed mean the same thing when
they say that mind exists. English and English
(1958) list five definitions of mind, one
of which might be acceptable to some behaviorists,
namely,


the organized totality or system of all mental 
processes or psychic activities, usually of an indi- 
vidual organism. The emphasis is upon the related- 
ness of the phenomena. Mind in this sense does not 
commit the user to a metaphysical position about 
the nature of these processes. Hence, it may be used 
by those who define psychology in terms of acts 
or behaviors. (p. 323; boldface in original) 


The second “myth” that I am found guilty 
of propagating is that therapists can be cate- 
gorized on the basis of the techniques they 
use. Mahoney and Kazdin (1979) point out 
that 


according to Bandura—and many other persons 
labeled “cognitive behavior modifiers"—the processes 
that govern human adjustment (and maladjustment) 
are cognitive in nature. (I.e., they involve attentional
processes, aspects of information storage and
retrieval, etc.) However, in almost comic irony, it 
now appears that behavioral procedures may be 
among the most powerful methods for activating 
those cognitive processes. Thus, if any clear distinc- 
tion can be drawn, the major difference between 
cognitive and less cognitive behavior modifiers does 
not lie in their therapeutic procedures so much as 
in their rationale and selection of a given procedure 
in an individual case. The more cognitively oriented 
therapist is inclined to employ a behavioral proce- 
dure appropriate to the “cognitive restructuring” 
presumed to be required. (p. 1045) 


The logic here is questionable. First we are 
told that cognitive behavior modifiers believe 
that the processes governing behavior are cog- 
nitive in nature. Mahoney (1977) asserts that 
the first premise of the cognitive-learning 
perspective is that “the human organism re- 
sponds primarily to cognitive representations 
of its environments rather than to those en- 
vironments per se" (p. 7). Then we are told 
that “in almost comic irony" (comical to be- 
havior modifiers, ironic to cognitive behavior 
modifiers) behavioral techniques turn out to 
be the most efficient methods of changing these 
hypothetical constructs (an implicit endorse- 
ment of behavior therapy). Faced with data 
which indicate that human organisms respond 
primarily to the environment rather than to 
cognitive representations of it, Mahoney and 
Kazdin (1979) resolve this embarrassing theo- 
retical paradox by asserting that although 
cognitive therapists and behavior therapists 
often use the same techniques, the best way
to distinguish them is on the basis of the
rationale each uses for selecting a technique. 

Surely, technique is the only logical basis 
for classifying therapists. Of what difference 
is it to the client whether the therapist's in- 
tent, when administering a procedure, was to 
reinforce discriminative operants, change ir- 
rational thought patterns, or strengthen the 
ego? Are we to believe that when participant 
modeling, administered by a cognitive thera- 
pist, results in a decrement in avoidance re- 
sponding, it does so for different reasons 
(cognitive changes) than when it is used 
successfully by a therapist who does not con- 
sider cognitive change the active ingredient 
in participant modeling? Surely all therapists, 
including cognitive behavior modifiers, should 
base their choice of technique on empirical 
evidence of its effects on behavior and not 
on any theory associated with the technique. 


Empirical Status 


Two studies published after I had com- 
pleted my review of the literature are cited 
as evidence of the potential of cognitive per- 
spectives. One of these, Rush, Beck, Kovacs, 
and Hollon (1977), found that cognitive ther- 
apy was more effective than imipramine in 
the treatment of depressed outpatients. The 
treatment procedure used by the cognitive 
therapists is not specified, but we are referred 
to 43 pages of a book by one of the authors 
(Beck, 1976, pp. 263-305) and are told that 


the cognitive therapist employs both verbal and be- 
havioral techniques to help the patient learn to (a) 
recognize the connections between cognition, affect, 
and behavior, (b) monitor his negative thoughts, 
(c) examine the evidence for and against his dis- 
torted cognitions, and (d) substitute more reality- 
oriented interpretations for his distorted negative 
cognitions. (pp. 18-19) 


One wonders exactly what went on in these 
cognitive therapy sessions. Cognitive behavior 
modification may be firmly committed to the 
tenets and practices of contemporary behav- 
ioral research, but cognitive therapy does 
not lend itself to empirical investigation as 
well as does behavior therapy because some 
parameters of cognitive therapy cannot be 
specified. How does one operationally define,
for example, a cognitive technique "to help
the patient recognize the connections between
cognition, affect, and behavior"? One wonders,
too, how much of the cognitive therapy in
the Rush et al. study consisted of verbal
techniques and how much consisted of behavioral
techniques. Would a straight behavior
modification approach (e.g., differential
reinforcement of depressive and nondepressive
behavior by the patient's spouse or another
significant person) have been more effective?
These questions seem particularly relevant in
the light of the findings of the second study
cited by Mahoney and Kazdin (1979). Taylor
and Marshall (1977) found that cognitive
therapy and "behavioral intervention" (5
—verbal nonassertiveness—was the only behavior
treated) were equally effective in reducing
depression but that a combination of
the two verbal approaches was more effective
than either technique used singly. A behav- 
ior therapist might reasonably ask whethet 
behavioral treatment of a broader spectrum 
of depressive behaviors (including speech) 
would prove superior to cognitive therapy or 
the combination treatment of Taylor and 
Marshall. 


Conclusion 


In their discussion, Mahoney and Kazdin
(1979) accuse me of "judging, in advance of
empirical research, the kind of techniques
that might be effective and ruling out certain
avenues based on this judgment" (p. 1047).
The charge of prejudgment is supported by
an out-of-context quote from my article
(Ledwidge, 1978), in which I am alleged to
have concluded that "the more cognitive the
technique, the less effective it is" (p. 370). In
context, the statement above turns out to
apply only to one of three types of data reviewed
(comparisons of two variants of the
same behavior therapy technique in which
one of the procedures relies less on cognitive
operations than does the other and/or includes
extra behavioral components as part of the
procedure). Seven such comparative studies
were presented (including two by Bandura
and his colleagues [Bandura, Blanchard, &
Ritter, 1969; Bandura & Menlove, 1968]),
six of which showed that the more cognitive
the technique, the less effective it was. Ban- 
dura (1977) has come to the same conclusion: 
“Regardless of the methods involved, results 
of comparative studies attest to the superior- 
ity of performance-based treatments" (p. 196). 

At no point in my article did I recommend, 
as Mahoney and Kazdin (1979) imply, that 
research on the effectiveness of cognitive ther- 
apies be discontinued; in fact, more research 
and deferred judgment were called for: 


Existing CBM procedures may in time be shown to 
be as effective as behavior therapy with clinical dis- 
orders, or new and more effective cognitive methods 
may yet be devised. As documented earlier, cognitive 
behavior modification is of recent origin, and only 
a handful of studies have appeared in the journals 
so far. Until comparisons of behavior therapy and 
CBM are carried out with clinical populations, how- 
ever, judgment on their relative effectiveness must 
be deferred (Ledwidge, 1978, p. 371) 


The emphasis I intended to impart was that 
this new hybrid therapy should not be called 
cognitive behavior modification because it is 
not behavior modification. Cognitive therapists 
are engaged in a radical departure from the 
methodology of behaviorism in treating cog- 
nition as subject matter rather than as an 
intervening variable. I hoped to point out 
how failure to distinguish the two kinds of
therapy invites a conceptual confusion of
cognition with behavior that could have unfortunate
theoretical as well as practical consequences.

If the requested name change sounds like
school chauvinism, it is because it is. The
hard-earned excellent reputation that behavior
modification enjoys today would be tarnished
if cognitive behavior modification
proves no more effective than more traditional
forms of psychotherapy.


Psychobiology of Active and Inactive Memory


Donald J. Lewis 
University of Southern California 


A brief history and review of the short-term memory and long-term memory 
distinction is presented, and this distinction is concluded to be no longer ade- 
quate for either human or animal memory data. Simple memories for events 
are apparently formed quickly and are permanent. In such cases, an initial 
physiologically unstable period is not required. Thus, most forgetting is the 
result of a retrieval failure rather than a storage failure. A distinction between 
active memory (AM) and inactive memory (IM) is made. AM is a subset of 
IM and contains either newly formed memories or established retrieved mem- 
ories or both. Some of the implications for psychobiology of the AM and IM 
distinction are discussed. It is suggested, for example, that while in AM, mem- 
ories are particularly open to disruption either by amnesic agents or through 
other forms of interference. The forgetting process for new and established 
memories is time dependent (but independent of memory age) and is based on 
interference. It is desirable to maintain the distinction between memory storage 
and memory retrieval even while recognizing that associative storage aids in 
retrieval. The search for the biological basis of rapidly forming memories, 
perhaps based on the restructuring of protein fragments, remains important 
but the physiological brain processes underlying memory interference and retrieval
require greater emphasis.


The distinction between short-term memory
(STM) and long-term memory (LTM) has
been a useful and productive one in both
animal physiological psychology and human
cognitive psychology in recent years. There
is not total agreement on the characteristics
of STM, but most agree that (a) STM is
either the route of entry to LTM for new
memories or at least the holding template
until LTM processes are complete; (b) the
contents of STM are temporary and fragile,
either decaying rapidly, open to disruption,
or transferring rapidly to LTM; and (c) the
capacity of STM is limited to a few items at
one time. LTM, on the other hand, stores
memories relatively permanently and has almost
unlimited capacity. There are many
variations on these characteristics, some of
which are discussed later. The purpose of this
article is to show that the distinction between
a temporary and fragile memory and one that
can become permanent is no longer adequate
for the preponderance of the data in either
the animal or the human fields. A more adequate
alternative is to consider all memories
as long-term or permanent, with a few of
them being active at any given time and the
remainder of them in an inactive state. A brief
history of the STM-LTM distinction will
demonstrate why this is a more adequate
alternative.


Precursors 


William James (1890) introduced the term 
“primary memory,” by which he meant the 
perceptions, images, sensory impressions, perceptual
processes, concepts, and mental entities
of any sort that occur as the result of
external stimulations during learning. The
items of primary memory are part of the
psychological present. James used the term
"specious present" to indicate that the concept
of the present was more apparent than
real because what seemed like the present
was overlaid by a great deal of the past and
the future. This was part of James's firm insistence
that consciousness was continuous
and that any attempt to break it into parts
was artificial. The "real" memory was his
secondary memory, into which items could be
stored and retrieved. For James, retrieved
memories, even at the time of retrieval, were
in secondary memory.

After James (1890), classical interest in 
primary memory focused on the number of 
items that could be simultaneously present. 
This line of research culminated in G. A. 
Miller's (1956) “magical number seven,” con-
solidating the agreement among many studies
that on the average, only seven items could
be held in immediate memory and proposing
the concept of “chunking,” by which the
amount of information could be greatly in-
creased while retaining the limitation of ap-
proximately seven items.
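
The idea of chunking can be illustrated with a minimal sketch. The digit string, the chunk size of three, and the helper name `chunk` are all assumptions made for illustration; they are not taken from Miller's procedure.

```python
def chunk(items, size):
    """Group a flat sequence into consecutive chunks of `size` items."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]

# Hypothetical stimulus: 21 single digits, well beyond a span of about seven.
digits = list("314159265358979323846")

# Recoding into three-digit chunks leaves seven units to hold,
# while the underlying information (all 21 digits) is unchanged.
chunks = chunk(digits, 3)

print(len(digits), "digits ->", len(chunks), "chunks")
```

The number of units to be held stays near seven, whereas the amount of information carried by each unit grows with the chunk size.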

Working in the field of verbal learning, 
Müller and Pilzecker (1900) attempted to 
explain the data on retroactive interference— 
that the learning of a second list of verbal 
materials causes the forgetting of a previously 
learned list. They argued for two physiologi-
cal processes: first, a perseverating neural pro-
cess that was maintained until, second, a more
permanent memory structure was formed.
During the perseverative phase, the neural
process was open to disruption, and thus the
second list of verbal materials disrupted the
neural perseveration initiated by the first list
before it could become a permanent physical
structure. Müller and Pilzecker’s “consolida-
tion theory” has had an immense impact on
neurological investigations of learning, even
though its influence on verbal learning, for
which it was originally proposed, has been
minimal. A major reason for the neglect of
perseveration theory by cognitive researchers
has been the recognition of proactive inter-
ference: forgetting can be caused by the learn-
ing of verbal materials occurring long before
the learning of materials whose forgetting en-
sues. Although retroactive interference could
be due to a totally different forgetting mecha-
nism than proactive interference, an unneces-
sary lack of parsimony would be required.
Further, the finding that a major determinant 
of both retroactive and proactive interference 
was the similarity of the learned items turned 
researchers away from neurological concepts, 
since verbal similarity seemed difficult to con- 
ceptualize neurologically. 

Ebbinghaus (1913) also proposed the pos- 
sibility of two different memories, although 
his distinction was based on the number of 
items to be learned. If 6 or 7 nonsense sylla- 
bles were to be committed to memory, only 
one trial was typically required, whereas 12 
syllables required almost 15 repetitions. This 
discontinuity implied two different processes 
to Ebbinghaus, although it was not a distinc- 
tion between an STM and an LTM, since 
both processes committed items to LTM. 

Following Ebbinghaus (1913), researchers 
showed little interest in an STM-LTM dis-
tinction until the late 1940s, at which time
both physiological psychologists and human 
cognitive psychologists became interested but 
with different orientations. The cognitivist 
interest grew out of research on information 
processing (Broadbent, 1958) and paid little 
attention to physiological work. Hebb (1949) 
renewed the interest of physiological research- 
ers, who were in turn little concerned with 
the informational approach. In both fields 
there was an initial marked enthusiasm for 
two-process theories, an enthusiasm that later 
became tempered by skepticism and, in many 
cases, outright rejection. 


Human Cognitive Conceptions 
Experimental 


Broadbent (1958) attached earphones to 
his human subjects and simultaneously pre- 
sented two sets of three or four digits, one 
set to each ear, and the subjects were asked 
to recall each set. Broadbent found that sub- 
jects usually attended to only one set of 
digits and could recall this set perfectly. But 
they could also recall some of the second set 
even though the rate of presentation was too 
fast to permit the switching of attention from 
one ear to the other. From this, Broadbent 
concluded that there must be a temporary 
storage system, an STM, in which the second
set of digits was held momentarily, until at-
tention could be turned to it. He conjectured
that this storage system could hold the infor- 
mation only for a few seconds and that items 
that were not attended to within this time 
decayed and disappeared. Although Broad- 
bent's STM mechanism was more complicated 
than has just been described, its essential fea- 
tures were (a) rapid decay rates, (b) con- 
tinuing processing necessary to maintain items 
in STM, and (c) a limited capacity of only 
two to four items. Broadbent believed that 
these features of STM necessitated its dis- 
tinction from LTM, which possessed none of 
these properties. Forgetting in LTM, he be- 
lieved, was due to the interference produced 
by other memories and not to decay; items 
could be held in LTM indefinitely and with- 
out effort unless there was specific interfer- 
ence; the capacity of LTM was virtually 
unlimited. 

Peterson and Peterson (1959) and Brown 
(1958) independently introduced both a new 
experimental procedure to study STM and 
additional data and theory to support the 
separation of STM from LTM. They pre- 
sented a verbal unit consisting typically of 
three consonants—a trigram. If recall occurred 
immediately after a presentation, performance 
was usually perfect. If immediately after pre- 
sentation of the trigram subjects were re- 
quired to count backward by threes from a 
given number, there appeared a sharply de- 
celerated “decay” function with retention al- 
most at zero after an 18-sec delay. Here was 
good evidence for rapid forgetting, if the 
items could not be held continuously in STM, 
and Peterson and Peterson and Landauer 
(1974) supported a time-dependent decay 
theory of forgetting from STM, a notion very 
similar to that of the Müller and Pilzecker 
(1900) theory. This theory, you recall, had 
fallen into disuse in part because of the evi-
dence that forgetting was produced by inter-
ference that was caused by the similarity be-
tween the target information and the interfer-
ing information. Further, this similarity could
be defined on the basis of the semantic prop-
erties of the material. “House” and “dwell-
ing,” for example, would interfere greatly with
each other, even though they had dissimilar
physical features.

Thus, evidence presented by Conrad (1963)
and Sperling (1963) that interference
occurred in STM was important. This inter-
ference, however, was acoustic in nature rather
than semantic; for example, subjects would
respond with an F when S was the correct
trigram letter. Therefore, STM was still con-
sidered distinct from LTM, but acoustic in- 
terference replaced rapid decay in part, at 
least, as a distinguishing feature of STM. Im- 
portant cognitive theories advanced at about 
this time incorporated conceptually different 
memory components, either as a physical store, 
a process, or both (e.g., Atkinson & Shiffrin, 
1968; Glanzer, 1972; Waugh & Norman, 
1965). Probably, by the end of the 1960s, a
majority of the cognitive learning researchers
accepted STM as real and distinct from LTM,
even though the notion of a decaying or a
consolidating engram (Müller & Pilzecker,
1900) never attracted a substantial number
of adherents. (Nevertheless, see Hintzman,
1974, who considers consolidation a possible
explanation for the “spacing effect,” an im-
provement in retention when the interval be-
tween repetitions of certain verbal materials 
varies from 0 up to 15 sec.) 


Amnesic Syndrome 


Humans who have suffered damage to the
hippocampus and/or areas around the walls
of the third ventricle frequently show a pro-
found learning and memory impairment of an
extraordinary nature. Upon hearing a series
of digits, they can repeat correctly as many
as can normals, and they can carry on a nor-
mal conversation, which means they can hold
in memory the early part of a message and
use it to unitize the thought. They also ap-
pear to remember events that occurred be-
fore their injury. If they have a normal mem-
ory for current events and also for events
prior to injury, why are they characterized
as amnesics? They appear to be unable to
transfer current memories to the long-term
store. A conversation they have today, for
example, will be forgotten tomorrow. STM
and LTM each seem clear and distinct, but
new information does not appear to enter
LTM. The most famous of these amnesics is
H.M. (Milner, 1966, 1970). H.M. was a
severe epileptic who underwent a bilateral
hippocampectomy, after which he seemed un-
able to add any new information for later
retrieval. No matter how many days in a row
he had a conversation with a person, he could
not remember the person's name or anything
about the previous conversation on the next
day. Other than this deficit, his behavior was
not notably abnormal. There are many others
who suffer from the amnesic syndrome as a 
result of brain damage, and, although their 
symptoms may vary slightly, they have in 
common an inability or difficulty in remem- 
bering events beyond a short period of time 
(Talland, 1968). This sort of dichotomized 
memory function seemed to demand a dis- 
tinction between an STM and an LTM, and 
Atkinson and Shiffrin (1968), in their theo-
retical treatment of memory, characterized the
amnesic syndrome as the single most convinc- 
ing piece of evidence for a distinction between 
STM and LTM. 


Arguments Against a Human 
STM-LTM Distinction 


Experimental Evidence


Although the arguments and data support-
ing an STM-LTM distinction at the human
level are strong, there has been considerable 
dissent from this position, a great deal of it 
centering around the Peterson-Brown (Brown, 
1958; Peterson & Peterson, 1959) experimen- 
tal procedure. 

Melton (1963) had argued that one of the
major distinguishing features of LTM was
its openness to interference and that to the
extent that a putative STM and the LTM
showed identical interference effects, there
would be no necessity to posit separate mem-
ories. Keppel and Underwood's (1962) dem-
onstration that a considerable amount of the
forgetting occurring in the Peterson-Brown
procedure was due to proactive interference
was thus a powerful support for a single-
memory conceptualization. Then, Waugh and
Norman (1965) (see also Glanzer, 1972;
Wickelgren & Norman, 1966) showed that the
number of interfering items was more im- 
portant in producing forgetting in the short- 
term paradigm than the simple passage of 
time. The similarity among items has also 
been demonstrated as a cause of STM for- 
getting. Deutsch (1970), for example, pro- 
duced greater forgetting of tone sequences 
when the interpolated material was also tonal 
than when words were used. Wickelgren, in 
a series of studies (e.g., 1965), showed both 
proactive and retroactive interference in STM 
when the learning and interfering materials 
were acoustically similar. 

The evidence thus shows that both pro- 
active and retroactive interference occur in 
STM and that this interference is the major 
cause of STM forgetting. Any simple decay 
(Reitman, 1974) that remains does not seem 
to be of much importance. Basically, these 
data seemed to support Melton's (1963) con- 
tention that if the putative STM obeys the 
same laws as LTM, there is no reason to as- 
sign it a separate existence. But interference 
may be acoustic in STM and semantic in LTM. 

Assuming a separate STM, how is the trans- 
fer of memories made to LTM? Items are 
both maintained in STM and transferred to 
LTM through rehearsal. But rehearsal need 
not refer only to the holding through repeti- 
tion of items in STM for a period of time. 
Craik and Watkins (1973) showed that sim- 
ply maintaining items in STM for 10-20 sec 
did not improve their strength in LTM. (See 
Craik, 1979, for more detailed discussion.) 
Some form of coding is necessary to retain an 
item permanently. Ample evidence now exists 
that semantic coding does take place in STM 
(e.g., Shulman, 1970, 1972). There may still 
be a difference in the degree to which seman- 
tic coding takes place in STM and LTM, 
but the difference is not dichotomous and 
therefore neither are the memories on this 
basis. These coding studies have yielded a 
useful distinction between maintenance re-
hearsal and coding rehearsal. The simple
holding or maintaining of an item in STM
for a period of time, as has been indicated,
aids little in its retrieval from LTM unless
some coding operation is performed on it.
It is the type of activity that occurs during
rehearsal that is crucial in STM itself, not the
passage of time.

That different types of coding do occur in 
STM has helped to blur an STM-LTM dis- 
tinction based on what has been called the 
"negative recency effect" (Madigan & Mc- 
Cabe, 1971). For immediate recall both the 
first and last items of several in a list are 
better retained than those in the middle. The 
usual explanation is that more attention and 
rehearsal can be devoted to the first items 
simply because they are first, whereas the 
last item is well retained because it is still 
in STM. Delayed recall, however, yields a 
negative recency because the last item re- 
ceived little rehearsal compared to the early 
items, and it is no longer in STM. But when 
appropriate coding is used (Bellezza & 
Walker, 1974), the usual positive recency is 
found also for delayed recall. Tulving (1968, 
1974) has taken the view that there are two 
retrieval processes involved in recall rather 
than two storage processes. Early aíter item 
presentation, temporal and phonemic retrieval 
cues are effective. Later on, semantic cues are 
better. Murdock (1974) gives an intuitive 
example of this distinction. You are directing 
someone to pick up your suitcase from an 
airport conveyor discharge belt. “It is the 
one that has just come down the chute" would 
be one type of instruction. If it has been on 
the belt for some time, a different type of 
code would have to be used such as, “It is the
blue one with the torn grip." 

Based on the experimental data produced 
by cognitive learning psychologists, it seems 
fair to conclude that there is as yet no con- 
vincing reason to require a separate STM. But 
what of the evidence derived from the amnesic 
syndrome? 


Amnesic Syndrome 


H.M. had a normal immediate memory 
span, you recall, but he could not recognize 
people he had met many times or even recall 
yesterday's conversation. It was believed that 
he was unable to transfer memories from STM 
to LTM, supporting the notion of the inde- 
pendent existence of the two. Further studies, 
however, showed that H.M. could retain cer-
tain visual and tactile maze tasks (Corkin,
1968). H.M. also had some knowledge that
he had a memory impairment (Milner, 1970),
and he had learned a factory job, although he
could not describe it. He knew that President
Kennedy was assassinated, but he did not
know who succeeded him. Perhaps one might
conclude that H.M. has an aphasic deficiency
in coding new information so that it can be
recognized later. If so, the deficit is one of
retrieval rather than of storage. Other evi-
dence from amnesics lends support to this
hypothesis in that with appropriate retrieval
probes, memories can be recovered (Weis-
krantz & Warrington, 1975). This point is
discussed in greater detail later.

The phenomenon of retrograde amnesia
(RA) is the memory loss caused by a brain
trauma, a loss that is most severe for events
just prior to trauma and is progressively less
further into the past. This gradient is one
of the bulwarks of consolidation theory
(Glickman, 1961; Mah & Albert, 1973; Mc-
Gaugh, 1966; Müller & Pilzecker, 1900),
which requires new memories to be more open
to disruption than older ones. As memories
age, they harden and become relatively perma-
nent. There are numerous observations, how-
ever, that are not consistent with the notion
of a consolidating memory trace. (a) Many
patients are amnesic for events that extend
back a matter of years. (b) Shrinkage of am-
nesia is a frequent occurrence. The shrinkage
indicates that during amnesia the memory
existed but simply was not accessible. Shrink-
age strongly suggests a failure to retrieve, not
a failure to consolidate or to progress from
STM to LTM. (c) RA frequently is spotty,
with islands of remembering that have no
temporal continuity, and RA can involve con-
densing two events into one (Talland, 1968).
The amnesia shrinks, but it is not a consis-
tently temporal process moving forward in
time. More islands appear, and only later
does temporal coherence return. (d) Amnesics
have difficulty remembering specific events
but can remember well-categorized informa-
tion or generalized information. They can,
for example, remember that flags are for wav-
ing in a parade, but they do not remember a
specific flag or a specific parade (Wood &
Kinsbourne, Note 1). (e) Amnesics frequently
can learn motor skills and perceptual tasks 
even though they forget when or how they 
learned them (Milner, 1970; Warrington & 
Weiskrantz, 1970). (f) Proactive and retro- 
active interference can explain much of the 
forgetting of amnesics (Weiskrantz & War- 
rington, 1975). 

In an excellent review of the amnesic syn- 
drome, Kinsbourne and Wood (1975) argue 
that the reason that there appears to be a
temporal gradient in many amnesics is that 
patients are asked different types of ques- 
tions about the recent past than about the 
remote past. The questions about the remote 
past tend to be more general and categorical 
(semantic), except for the case history type 
of information that has undoubtedly been fre- 
quently rehearsed. Specific temporally re- 
lated questions (episodic), on the other hand, 
are asked about recent events, and, since am-
nesics have trouble remembering specific
events about any period of their life, there is
the appearance of the forgetting of recent
events. Kinsbourne and Wood related the for-
getting that is characteristic of the amnesic
syndrome to the distinction made by Tulving
(1972) between episodic and semantic mem-
ory. Episodic memory is for specific events,
episodes, or contexts. Semantic memory is
about rules, categories, generalities, relations,
and associations and can be context free. The
forgetting of amnesics is characteristically
episodic, whereas their semantic memory re-
mains normal. The apparent time gradient in
amnesics is due more to the nature of the
interview, which focuses on episodic material
for recent events and semantic material for
remote events.

Kinsbourne and Wood (1975) reviewed the 
experimental work on amnesics that has been 
done using the Peterson-Brown paradigm. The 
results of their review suggest strongly
that amnesics performed more poorly on the
STM task than did normals, but the forgetting
curve was parallel for amnesics and for nor-
mals over an interval from 3 to 18 sec; there
was no interaction between the groups. The
implication is that the STM memory process is
the same for both groups; they forget in the
same fashion but differ simply in the amount
that is forgotten. Also, the amount of inter-
ference interacts with retention time in the
same way for both. Kinsbourne and Wood 
cite studies (H. Gardner, Boller, Moreines, & 
Butters, 1973; Wood & Kinsbourne, Note 1) 
in which subjects were told at retrieval time 
which category the words that they were re- 
quired to learn were taken from, and retrieval 
was aided. Similar information at the time of 
learning had no effect. They concluded that 
“the balance of the evidence favors a retrieval 
explanation of the amnesic defect in short- 
term memory as measured in the Peterson and 
Peterson paradigm” (Kinsbourne & Wood, 
1975, p. 277). 


Conclusion 


It seems clear that for those who wish still 
to hold to the distinction, STM and LTM 
share two important features: (a) Interfer- 
ence is a major determiner of forgetting in 
both, and (b) they both involve articulatory 
and semantic coding, although STM probably 
makes a greater use of the former. There re- 
main differences: (a) Maintenance rehearsal 
can occur in STM but not in LTM, and (b) 
STM is ordinarily viewed as a port of entry 
into LTM for new memories or a temporary 
holding template until a permanent memory 
is formed. On the basis of the human data 
and observations, however, there is little sup- 
port for the notion that STM is necessary 
for the firming of a physiological memory 
trace in the sense that Müller and Pilzecker
(1900) originally posited or that the passage 
of time alone in STM allows the trace to con- 
solidate, a conception that still prevails among 
animal physiological psychologists (Gold & 
McGaugh, 1975). Memories seem to be trans- 
ferred to LTM by “mental” operations, such 
as the various forms of coding that can occur 
with the speed of the neural impulse, and the 
source of the forgetting seems to be at re- 
trieval rather than storage. 

This conclusion is shared by Isaacson and 
Pribram (1975), who, in writing a summary 
for their book on the hippocampus, say, 


any simple, long-term memory consolidation hypothe- 
sis of hippocampal function based on the initial find- 
ings with human subjects has become untenable in
the light of subsequent analysis. Unfortunately, this
hypothesis is still held by the majority of people 
not actively involved in hippocampal research. (p. 43) 


This is a conclusion similar to that of Weis- 
krantz and Warrington (1975) who say, "the 
amnesic deficit appears to be with mechanisms 
beyond the initial input into storage" (p. 
413). 

The quantity of surgical, chemical, and 
electrical intervention that has gone into ani- 
mal research on this topic has not been pos- 
sible with humans. It is time now to turn to 
animal research. There is, of course, no guar- 
antee that human and animal memory pro- 
cesses are the same or even similar, but since 
Pavlov's studies on the dog, marked com-
monalities in learning have appeared, and it
is not unreasonable to assume that the same 
will be true for memory. 


Evidence From Animal Studies 
General


Research on animal memory has been un- 
dertaken largely by physiological psycholo- 
gists whose basic purpose has been to discover 
those physiological processes that accompany 
and probably are an essential part of learning. 
Since learning is a relatively permanent 
change in behavior, a basic assumption is that 
a physical structure must be involved. It is 
assumed that a time period is necessary for 
the structure to form, but it is also clear that 
there is memory almost instantaneously fol- 
lowing stimulation. Therefore, some process 
intervening between stimulation and final 
structure has been assumed to be necessary. 
Hebb (1949) assumed that reverberatory elec- 
trical process could maintain the memory un- 
til the structure was formed. But most would 
agree that the electroconvulsive shock (ECS) 
studies convincingly demonstrate that this 
cannot be true. First of all, ECS sets up an 
electrical storm in the brain that totally over- 
whelms the kind of patterned electrical activ- 
ity posited by Hebb. In addition, there is an 
almost total isoelectric period following ECS 
that does not permit patterned electrochemical 
neural transmission. 

Hebb's is a two-process sequential model,
with STM leading to and triggering the LTM.
Two-process parallel theories have also been
proposed in which both STM and LTM are
initiated simultaneously at the time of learn-
ing, but the STM drops out after a critical
period, leaving LTM to carry the memory.
Gold and McGaugh (1975) have proposed a 
single-trace but two-process memory system. 
The trace is formed immediately at acquisi- 
tion, but it fades and disappears unless some 
hormonal process follows that produces a 
memory fixation. Each of these alternatives 
is considered to be a form of consolidation 
theory in that each requires an initially fra- 
gile period for memory during which it is 
open to disruption or fading. 


Evidence for STM-LTM 


Duncan (1949) performed the classical 
study that showed that test performance was
a function of the interval between a learning
trial and the administration of ECS; the closer
in time the ECS is to the learning event, the
less the test performance. This function was
called “the gradient of retrograde amnesia”
and is one of the most well-established facts
in the memory literature. In an influential ar-
ticle, Glickman (1961) reviewed the ECS
literature and concluded that this gradient
reflected the period for the consolidation of
the learning trace. Although there was spo-
radic disagreement about this conclusion (see
Coons & Miller, 1961; Lewis & Maher, 1965,
1966), it was vigorously reaffirmed by Mc-
Gaugh (1966).

Methodologically, research moved from the 
use of multiple learning trials and multiple 
ECSs (Duncan, 1949) to single trials and
single ECSs. Single learning trials were used
because the age of the learning trace was
made more determinate, and single ECSs were
used because data showed that whereas sev-
eral ECSs had marked punishing effects, one
usually did not (Hudspeth, McGaugh, &
Thomson, 1964). Experiment after experiment
has been reported over the past decade in
which a “time-dependent” amnesia has been
found. The gradient varies with the intensity
of the reinforcer used in acquisition (Mah &
Albert, 1973), with the intensity and dura-
tion of the amnesic agent (Alpern & Mc-
Gaugh, 1968; Buckholtz & Bowman, 1972),
and with other experimental conditions (Cher-
kin, 1969). Amnesia gradients have been 
produced with convulsant drugs (Pearlman, 
Sharpless, & Jarvik, 1961), hypothermia anes- 
thetics (Alpern & Kimble, 1967), as well as 
electrical stimulation of the brain. The most 
widely accepted conclusion has been that the 
effect is on the storage of information and 
that at least indirectly, the gradient reflects 
the time for memories to pass from a fragile 
phase to a more permanent one (McGaugh & 
Dawson, 1971). Originally it was believed 
that ECS totally disrupted the consolidation 
process and destroyed the memory (Luttges 
& McGaugh, 1967), but now consolidation is 
believed to slow down and speed up depending 
on a number of experimental conditions (Mah 
& Albert, 1973), and the disruption may leave 
a partial memory that can later be reacti- 
vated and become fully consolidated and per- 
manent (Cherkin, 1972; Kesner, 1973). 

Additional support for the consolidation
theory came from studies in which a block 
in protein synthesis in the brain coincided 
with a memory block (J. B. Flexner, Flexner, 
& Stellar, 1963; L. B. Flexner, Flexner, & 
Roberts, 1967). Various antibiotics injected 
into the brain or administered subcutaneously 
were shown to disrupt the formation of new 
protein by as much as 95%, and a memory 
impairment for recently acquired information 
was produced as well. Since, it was conjec- 
tured, a new structure mediating LTM must 
involve protein formation, a plausible mecha- 
nism for permanent memory seemed at hand. 
Further, injections of the protein synthesis 
inhibitor cycloheximide (CYCLO) before learn-
ing did not prevent normal initial acquisition
nor did it interfere with memory until several
hours (3 to 6) had passed (Barondes &
Cohen, 1968a, 1968b), by which time severe
memory decrements had been observed. Thus,
CYCLO seemed to be acting specifically on
LTM while leaving STM intact. A similar
conclusion may be drawn from data showing
that memory is intact shortly after ECS 
(Geller & Jarvik, 1968) but disappears rapidly
thereafter. Originally it was believed that ECS
stopped the consolidation of STM, but these
later data are interpreted to mean that STM
is left intact, and the interference is with
LTM or with the transfer from STM to LTM. 

The locations of physical sites in the brain 
for these effects have been pinpointed. Am- 
nesia can be reduced if subseizure electrical 
current is administered to the hippocampus 
(Kesner & Connor, 1972, 1974), the caudate 
nucleus (Wilburn & Kesner, 1972), the sub-
stantia nigra (Routtenberg & Holtzman,
1973), the thalamus (Mahut, 1962), the mid- 
brain reticular formation (Glickman, 1958), 
and the amygdala (McDonough & Kesner, 
1971). (See McGaugh & Gold, 1976, for a 
review of these studies.) In each case the am- 
nesia was time dependent. Bloch (1976) pro- 
vides evidence for the notion that consolida- 
tion can be speeded up as well as slowed down. 
Time-dependent phenomena are again preva- 
lent and time dependency is widely deemed 
to uniquely reflect the progressive formation 
of a new permanent memory structure and the 
fading or decay of an STM. 


Evidence Against STM-LTM 


Some data causing difficulties for the tra-
ditional consolidation of an STM into LTM
are reviewed, and some of the newer theoreti- 
cal notions are introduced. The negative data 
concerning the dichotomy between STM and 
LTM are treated more extensively than the 
positive, since that dichotomy is less widely 
accepted. 


General 


The supreme difficulty with the early con- 
solidation theory (McGaugh, 1966) was that 
it failed to take sufficient notice of the learn- 
ing-performance distinction (see Lewis, 1969). 
Thus, the behavioral deficits produced by 
amnesic agents administered at the time of 
learning were believed to be a direct reflec- 
tion of the consolidation engram. Insufficient 
attention was paid to the many reasons why 
animals fail to perform other than that they 
no longer have any memory for what was 
learned. As a consequence, the RA gradient 
was believed to reflect the length of time dur- 
ing which STM remained open to disruption. 
This was generally believed to be a matter of 
seconds or a few minutes at most (Glickman,
1961), and since the engram was a chemical-
electrical-physical event encased in a relatively
constant physiological environment, its con-
solidation time was assumed to be relatively
invariant. This latter assumption turned out
to be far from the truth. Consolidation gradi-
ents had great variability from approximately
10 seconds (Chorover & Schiller, 1965) to
6 hours and more (Kopp, Bohdanecky, &
Jarvik, 1966). It became necessary to con-
sider the empirical RA gradient as reflecting
only the period of susceptibility to disruption
of the memory and not as the actual con-
solidation time. The reports that the consoli-
dation time could be reduced to less than .5
sec (Lewis, Miller, & Misanin, 1968, 1969)
if the animals were well familiarized with the
learning environment before the introduction
of footshock (FS) and ECS were often over-
looked. (See [...] & McGaugh, 1969b,
for a failure to replicate; and [...]
Misanin, 1973; Jensen & Riccio,
1970; Miller, 1970; [...] & Stikes, 1960,
among others, for positive instances.) Lewis
et al. (1969) argued that learning occurs
with the speed of the neural impulse and that
ECS as ordinarily administered must be af-
fecting a retrieval process, not a storage one.
If amnesia is due to a retrieval failure, then
RA gradients are not informative about con-
solidation time or memory [...].

Attributing the action of an amnesic agent
that is administered at the time of learning
to a retrieval process has seemed illogical to
[...] experimental agents acting at acquisition
must be on acquisition processes and that
[...] can be on retrieval processes. Many human
[...] have long
believed that interpolated learning has its
effect on retrieval mechanisms, even though
the interpolated learning closely follows the
original learning, and the test is much later.
Interpolated learning may not work in the
same fashion as an amnesic agent, but the
point is that the proximity of the forgetting
agent to original learning need not demand
that any memory deficit be due to storage
failure.


Reminder and Recovery


Another difficult problem for the storage
failure point of view rose when the “re-
minder” studies began to appear (Koppen-
aal, Jagoda, & Cruce, 1967; Lewis et al.,
1968). The basic assumption of these studies
was that ECS served to inhibit or block the
retrieval of memory rather than disrupt its
formation. This approach assumed that
memories endured through ECS, but access
to them was temporarily lost. If so, then a
reminder, a portion of the original learning
situation, should return them to expression.

Miller and Springer (1972) explored the
reminder effect in detail and found it to be
independent of the ECS-reminder interval or
the reminder-test interval. Also, it occurs in
[...] situations as well as those of
avoidance.

A significant extension of the reminder
[...] was reported by Quartermain, McEwen,
[...]
bring about a deep inhibition of protein syn-
thesis and an amnesia. Nevertheless, a re-
minder stimulus produced a memory return,
which suggested strongly that whatever the
effect the inhibitors of protein synthesis had
on memory, it did not block its formation.

It must be understood that the effect of a
drug on memory depends on many condi-
tions other than the drug itself, such as the
strength of the memory and the test condi-
tions. Nevertheless, the role protein synthesis
plays in memory [...]
memories can be fully recovered
even when protein synthesis is inhibited over
[...] memories can also be re-
[...] demonstration of this [...]
Braun, Meyer, and Meyer (1966) [...]
extirpation of the posterior [...]
of the brain but was relearned rapidly, [...]
to controls, on injections of am-
phetamine. Adams, Hoblit, and
[...] physostig-
mine. Roberts, Flexner, and Flexner (1970),
Serota, Roberts, and Flexner (1972), [...]
Botwinick and Quartermain (1974) also have
induced memory recovery pharmacologically
following a memory lapse due to protein syn- 
thesis inhibition. Perhaps the most important 
recoveries have been reported by Rigter and 
Van Riezen (1975). (See Barraco & Stettner, 
1976, and Meyer & Beattie, 1977, for more 
detailed reviews.) The recovery of memory 
following protein synthesis inhibition shows 
that there could not have been a permanent 
loss of LTM even though memory seems to 
disappear approximately 6 hours after acquisition.

Interestingly, protein synthesis inhibition 
compounds also reduce the amount of avail- 
able norepinephrine (NE) and other biogenic 
amines, and Serota et al. (1972) proposed 
that a NE deficiency lies behind the am- 
nesias. If so, Quartermain (1976) has shown 
that the effect is on memory retrieval and 
not on its formation in that he has found a 
spontaneous recovery of a T-maze memory 
for which amnesia had been produced by the 
inhibition of NE synthesis. Barraco and 
Stettner (1976) conclude, “we believe that 
the leading hypothesis at present should be 
that antibiotics block or impair retrieval 
processes providing access to memories for
specific aspects of training” (p. 271). They
add that puromycin may be an exception to 
this conclusion. 

The fact that memory either recovers spontaneously, recovers on reminder stimulation, or is recovered pharmacologically indicates that the amnesias are not due to the failure of memory formation, either short term or long term. Reminder stimulation has not been tried in all situations, and therefore this conclusion must remain tentative, but the body of evidence on amnesia strongly suggests that a retrieval mechanism is involved and that a multiple-stage memory formation theory is not necessary for currently existing facts.


Cue-Dependent Amnesia


Another block of evidence that runs counter to the two-process memory conception comes from studies on cue-dependent amnesia. These studies suggest a different variety of memory processing of a cognitive type (Lewis, 1976), which is discussed later.
The first cue-dependent amnesia experiment 
was that of Misanin, Miller, and Lewis 
(1968; but see Schneider & Sherman, 1968, 
for a similar experiment with a different 
point of view). First, the animals learned 
an avoidance response cued to a tone. 
Twenty-four hours following learning, after 
any consolidation process should have been 
completed, the animals were returned to the 
learning apparatus, the tone was presented, 
and it was immediately followed by ECS. 
Twenty-four hours after treatment, the ani- 
mals were returned for testing, and those 
that had received ECS in the presence of 
the learning cue were amnesic. The impli- 
cations of this study were considerable. It 
suggested that the age of the memory was 
not an essential determiner of whether or 
not amnesia would be produced by ECS 
and that amnesia was not dependent on a 
memory's early fragility. An immediate im- 
plication was that the RA gradient was not 
conclusive evidence of consolidation failure 
or even of a susceptibility to storage dis- 
ruption (Lewis, 1969). The Misanin et al. 
(1968) study also showed that ECS had to 
be given in the presence of a learning cue 
from the learning situation for memory to be 
disrupted, and for this reason the effect has 
been called cue-dependent amnesia. 

Misanin et al. (1968) speculated that the 
presence of the cue served to reinstate the 
memory and that the simultaneous presence 
of ECS with the memory served to inhibit 
or block the later expression of that memory. 
In other words, the memory must be evoked 
for it to be blocked by ECS and presumably 
by other amnesic agents. This reasoning led 
them to make a distinction between active 
and inactive memories. Learning always oc- 
curs in the presence of specific cues and a 
contextual environment, and to the extent 
that these cues and context occur again the 
memory will be reinstated. Thus, memories 
are active under at least two conditions: (a) 
during original learning and (b) during re- 
instatement. Under both conditions the mem- 
ories are open to disruption by ECS. Since 
memories that are not reinstated are least 
disturbed by ECS, they were believed to be 
in a different state, a state of inactivity or
passivity. Active memories could be dis- 
rupted; inactive memories could not. 
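
The rule that only active memories are open to disruption can be put as a toy state model. The sketch below is an illustration under stated assumptions, not a construct from the cited experiments: the class `Memory`, its fields, and its method names are hypothetical labels introduced here, and the model ignores gradients, reminder effects, and partial amnesia.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical toy model of one learned item tied to a cue."""
    cue: str
    active: bool = True        # (a) a memory is active during original learning
    expressible: bool = True   # whether the memory can later be expressed at test

    def rest(self) -> None:
        # With the cues and context absent, the memory becomes inactive.
        self.active = False

    def present_cue(self, cue: str) -> None:
        # (b) presenting the learning cue again reinstates (reactivates) the memory.
        if cue == self.cue:
            self.active = True

    def apply_ecs(self) -> None:
        # Assumption of this sketch: ECS blocks later expression
        # only if the memory is active when ECS is given.
        if self.active:
            self.expressible = False


# ECS alone, long after learning: the inactive memory is spared.
spared = Memory(cue="tone")
spared.rest()
spared.apply_ecs()

# Cue followed by ECS: the reinstated memory is disrupted, whatever its age.
disrupted = Memory(cue="tone")
disrupted.rest()
disrupted.present_cue("tone")
disrupted.apply_ecs()
```

In this sketch the age of the memory plays no part; only its state at the moment of ECS decides the outcome, which is the point of the cue-dependent amnesia design.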

This experiment has been essentially rep- 
licated by many researchers (although see 
Dawson & McGaugh, 1969a, for a failure 
to replicate). Davis and his colleagues were 
simultaneously reporting a similar finding. 
Davis and Klinger (1969) extended the in- 
terval over which potassium chloride, puro- 
mycin, and acetoxycycloheximide could pro- 
duce amnesia by leaving their subjects in the 
conditioning chamber. Confinement in the training apparatus extended the effective amnesia interval in experiments by Davis and Hirtzel (1970) and Potts (1971). Schneider and Sherman (1968) reported a similar finding. Meyer and his students (Howard, Glendenning, & Meyer, 1974; Howard & Meyer, 1971; Robbins & Meyer, 1970), using a multiple-choice maze, gave animals three different successive problems, each associated with its characteristic cues. They reported amnesia for problems whose cues were paired with ECS, regardless of the age of the memories, which were frequently 3 weeks old. A similar cue-dependent amnesia has been demonstrated many times (DeVietti & Holliday, 1972; Gordon & Spear, 1973a, 1973b; Lewis & Bregman, 1973;
Lewis, Bregman, & Mahan, 1972). Finally,
DeVietti and Kirkpatrick (1976) and Gordon (1977b) have shown a typical RA
gradient in a cue-dependent amnesia situation, which confirms the inference that
time-dependent vulnerability of memory is not
uniquely associated with storage. Of course,
the time dependency of reactivated memories 
may be due to a different mechanism than 
for a new memory, but it is unparsimonious 
to believe this until data supporting two 
mechanisms are presented. Gordon (1977a) 
has shown that the reactivated gradient is 
shorter than a new gradient, but he does not 
believe this finding is support for two dif- 
ferent mechanisms. 

The reactivation situation has also been 
used to show the kind of memory enhance- 
ment that has previously been demonstrated 
when certain drugs are administered soon 
after a learning experience. Gordon and 
Spear (1973b) showed that strychnine sulfate could enhance a well-aged memory if and only if the memory were reactivated immediately prior to drug administration, and Gordon (1977b) has shown that this effect is time dependent. Finally, Landfield, McGaugh, and Tusa (1972) found an electroencephalogram (EEG) θ wave that appeared immediately following learning, and they interpreted the θ as an indicator of a memory consolidation process. Nicholas, Galbraith, and Lewis (1976) confirmed their empirical finding but found the θ to occur also upon reactivation of the memory. They interpreted the θ as an indicator of memory activity.


Conclusion 


All of the data presented in this section argue strongly against the standard two-process sequential model in which STM precedes LTM and is a necessary holding template for the LTM. But there are other forms of a two-process model (see McGaugh & Dawson, 1971). For example, STM and LTM could both be initiated at the onset of learning and could parallel each other until STM drops out, while LTM continues. This is a parallel rather than a sequential two-process theory. The basic point being made here, however, is that memories can be recovered following amnesia, regardless of whether the amnesia is conceived of as a failure of STM, LTM, or the transfer between the two. There are, of course, other conceptions of memory formation than have been considered here, but they are variants on a general theme. The theme is that following an experience, new memories are fragile and will decay unless they are either held by rehearsal in STM or transformed into something more permanent. This transference into LTM takes place typically as a function of time or of time and something else, for example, a nonspecific physiological response, usually adrenergic (Barondes & Cohen, 1968a; Gold & McGaugh, 197[...]). Various arguments and data have been presented to show the inadequacy of these conceptions of an initially fragile STM or LTM. Both intuitive experience and experimental data indicate that memories are formed almost instantaneously following an experience and that these memories are relatively permanent, are surprisingly resistant to simple decay, and seem to yield only to forms of interference and competition that affect retrieval.

Several powerful guns have recently fired
at the consolidation-storage hypothesis. Weiskrantz
and Warrington (1975) say:


But the lack of evidence for a defect of “consolidation”
or long-term “storage” in animals is no
longer an embarrassment, because that hypothesis 
appears to be no longer able to account for the 
amnesic defect in man. (p. 425) 


Meyer and Beattie (1977) state: 


However, the argument that interventions which
have time-dependent actions upon newly-formed
habits produce their effects because they interfere
with labile traces has been tested, and found
wanting. The argument that interventions which
produce complete and lasting impairments of retention
must therefore have interfered with memory
formation is not now, nor has it been for almost
a decade, worthy of serious belief. (p. 154)


Apparently, memory can be formed within a time span of less than a second, and it is reasonable to believe that this may be typical of most memories. There is presently no reason to believe that a new memory is any more fragile than an established one. A physiological process must be found that accommodates such a brief memory formation time span, and we have already seen that reverberating neural firing is not the answer. The synthesis of new protein also probably does not provide a mechanism for the relatively immediate formation of memory. Squire (1975) proposes that under the simplest of circumstances, at least 1 minute is required for the synthesis of protein and its transport to a synaptic site. A longer interval would be required if the prior synthesis of mRNA were necessary. This does not deny a role for protein synthesis and the growth of synaptic knobs (Lynch, Deadwyler, & Cotman, 1973; Rutledge, 1976) as a support for frequently evoked memories, but it does question that these mechanisms are necessary for the formation of permanent memories.
The possibility that learning is due to
changes in synaptic neurotransmission re- 
mains strong, but a long time-dependent 
structural change is not necessarily required 
for such changes to take place. They can, 
in fact, be rapid, almost as rapid as the 
elaboration of chemicals at the synapse or 
the combination of protein fragments or the 
addition of a carbohydrate to a neural mem- 
brane protein, forming a glycoprotein (Bo- 
goch, 1968, 1973). Kety (1970) has proposed 
a model that suggests how biogenic amines 
could participate in the learning process. He 
proposes that novel or surprising stimuli 
cause the release of the appropriate biogenic 
amines throughout the central nervous sys- 
tem and that these amines serve to increase 
firing probability in neural systems that are 
active and perhaps decrease activity in 
others. A specific pattern of neural firing 
then comes under the command of the en- 
vironmental events existing at the time. The 
pattern of neural firing is the memory, and 
it is reactivated when the stimuli are pre- 
sented again. 


Active—Inactive Memory Distinction 


Active memory (AM) is considered as a 
changing subset of all permanent memories 
possessed by an organism. At any given time 
many of the permanent memories, which have 
the potential for being active, are in a rela- 
tively inactive state and have little effect on 
current behavior. A rat, for example, who 
has learned a complicated T maze has the 
potential to perform correctly in that maze 
even though, at a given moment, it is eating 
from a trough in a living cage in a room that 
is different from the one in which the learn- 
ing occurred. Also, a memory may be active 
without having an observed effect on current 
behavior. A rat may be in the start box of 
the T maze and remain stationary for any 
one of a number of reasons (e.g., low motivation,
sickness) other than the failure to
reactivate a memory. Although we speak of 
active memory (AM) and inactive memory 
(IM), different stores or locations are not 
implied; it is doubtful that AM occupies a 
specific site or sites in the brain. It seems 
better at present to conceive of AM as a
patterned state of neural firing that is not 
specifically localized, although different mem- 
ories that are active undoubtedly reflect dif- 
ferent densities of firings in different parts 
of the brain. It is likely that each evocation 
of an AM will be slightly different from each 
other evocation, although there must be con- 
siderable similarities, or the memories could 
never be identified as the same (John, 1972). 
In each human, AM is intuitively appar- 
ent; “immediate consciousness" is a part of 
it. In lower animals AM is not intuitively 
apparent except through anthropomorphic 
analogy. Positing an entity such as AM for 
rats is scientifically dangerous and certainly 
far outside the standard stimulus-response 
(S-R) tradition. Nevertheless, there are 
precedents for the application of cognitive 
concepts to rats and for their experimental 
manipulation. Tolman (e.g., 1932, 1949) was
a biologically oriented psychologist whose
use of behavioral-based but cognitive con- 
cepts such as "purpose," "cognitive maps," 
and “memorial lore" makes interesting read- 
ing today. Although Tolman did not use the 
concept of AM, his “vicarious-trial-and- 
error” (VTE) is instructively similar. For 
Tolman, VTE was most apparent at a choice 
point and was evidenced by the back-and-forth
movement of a rat's head before it
committed itself to one of the alternatives. 
In Tolman's view, the rat was weighing the 
alternatives, cognitively trying to arrive at 
the correct decision. He was able to quan- 
tify the VTE to some extent. The amount of 
VTE was a direct function of the difficulty 
of the problem. For easy discriminations
VTEing occurred during the early trials and
dropped out as learning proceeded. For more
difficult problems there was little early VTE- 
ing; the greatest amount of VTEing occurred 
just before and at the time of the solution 
to the problem and continued some time 
after. For the most difficult discrimination, 
VTEing was never greatly reduced after 
solution, even with a great amount of over- 
learning. Apparently, the animal had to con- 
tinue to weigh alternatives on each trial. It 
may well be that VTE can be used as one 
index of AM. 
Human Conceptions


The concept of STM as an entry point
for new memories and their transfer to LTM 
goes back at least to James (1890). How- 
ever, the concept of AM, which is simply an 
aspect of the total memory process into 
which memory can both flow from IM and 
serve as a point of entry for new memories, 
is, seemingly, new. The first mention of an 
active-inactive distinction may well have 
been made by Posner (1967), who proposed
further that interference and forgetting could 
take place in AM. He briefly summarizes an 
experiment that was negative to the concept 
of AM as an arena of forgetting, but he de- 
veloped the conception of AM in more detail 
later (Posner, 1973). Certainly, as is discussed later, AM as a process in which forgetting occurs needs much more experimental testing.

The distinction between AM and IM also developed through the theorizing of Atkinson and Shiffrin. In their earlier treatment (1968) they still held to the standard notion of STM that served only as a point of entry and processing of new learning. In Atkinson and Shiffrin (1971), however, they conceived of STM as a processing point for both new learning and for retrieved memory from LTM. For them, AM was the conscious
activity of memories, regardless of whether
they were old or new. It was the control 
center for rehearsal, coding, and imagining, 
for all cognitive activity. This is an impor- 
tant theoretical treatment of memory with 
a sophisticated discussion of AM. 

A similar treatment of AM as the site for the processing of retrieved memories as well as for new learning is that of Craik and Lockhart (1972). They suggested that different levels of processing occurred in AM, with the deeper semantic levels laying down more effective and retrievable traces. Baddeley and Hitch (1974) also have conceived of AM (working memory) for both storage and further processing of retrieved memories. They show that several cognitive processes (e.g., free recall) are made more difficult with additional simultaneous memory activity. This is because of the limited capacity of AM, which as the “focus of attention,” can perform only a circumscribed set of processes at one time. Bjork (1975) has presented
a similar conception of AM, as have
Shiffrin (1976) and Bower (1975), among 
others. Thus, a concept of AM as a central 
processor of both new learning and of re- 
trieved memories now exists. It is similar to 
conscious awareness, attention, and even 
thinking as far as humans are concerned.


Animal Conceptions


For lower animals, the first distinction 
between AM and IM apparently was made 
by Misanin et al. (1968), who showed that 
rats could be made amnesic for old memories 
if, they hypothesized, the memories were 
active at the time of ECS, whereas IMs were 
relatively immune to the amnesic treatment.
This distinction has been explored
further by Lewis (1969, 1976), Spear
(1976), and by Lett (1978). These theo- 
retical and experimental developments are 
considered in the next sections. 


Entry Into Active Memory 


Two kinds of events can be distinguished for AM by their route of entry into AM. First, memories can be created anew from external stimulation and, second, memories may be reactivated later when these stimuli are again present. Because entry into AM as a new memory or reentry as an old one usually entails some processing of the sort discussed later, it is impossible to separate entirely the new entry function from the other, and some overlapping of these functions cannot be avoided.


New Memories 


Each stimulus activating a receptor is converted into neural impulses and, under normal conditions, occasions brain activity that consists of a representation of the stimulus complex, and that representation is one example of an AM. These representations simultaneously contain information about sensory mode or modes, context, temporal arrangements, color, shape, and other properties and relationships among stimuli. In total, these make up the attributes of memory (Underwood, 1969).

Memory is considered to be active for 
animals whenever the organism is alert and 
patterned stimuli impinge on the sensorium. 
Under these conditions it is assumed that the 
organism is either learning or retrieving. 
(See Thistlethwaite, 1951, for a review of the 
latent learning literature, and see Spear, 
1973, for a discussion of the role of the 
stimulus and the context upon retrieval.) 
Reinforcement is not necessary for learning, 
but a reinforcer does serve to focus attention 
by pointing out to the organism that some- 
thing specific is to be learned. If there is
no reinforcement, the organism will learn anyway,
but it will be more difficult for the experimenter
to discover what has been learned.
This also means that a nonlearning control 
group for those who are searching for the 
physiological correlations of learning cannot 
be simply one for which reinforcement is not 
administered. After the organism has learned, 
the stimuli that were salient (significant, 
noticed, reinforced, emphasized) during 
learning can be presented again, and the 
memory of the learning will occur. Since the
experimenter's notion of what the salient
cues are may not match the animal's,
re-presentation (or reinstatement) of the cues
may not always reactivate the memory. A distinction
between nominal and functional cues is therefore
important.

It would be helpful to determine the sim- 
plest possible memory. Learning is frequently 
considered to be an association of two or 
more items, stimuli, or events, but it be- 
comes difficult at times even to know when 
two items are present. If, for example, the 
stimulus object is a square, the top half of 
which is one color and the bottom is another 
color, is the stimulus integral or separable 
(see, e.g., Blough, 1972; Gardner, 1970;
Leith & Maki, 1977)? Nevertheless, items
vary in complexity, and it is possible to con- 
ceive of the simplest item along a complexity 
dimension. This kind of primitive conception 
will have to do for analytical purposes at 
present. An item of learning will be the 
conceptual simple unit. When items are re- 
lated, an association is said to be formed. 
It is assumed here that new simple memories
have two important properties, proper-
ties that will appear startling to some: (a) 
They are formed almost instantaneously, and 
(b) they are relatively permanent. Neither 
one of these properties can be proved em- 
pirically, at least at present. Still, they are 
not empty properties, for they have conse- 
quences that can, at least to some degree, 
be empirically tested (see Lewis, 1969, 1976; 
Miller & Springer, 1973). 

There are two fundamental sets of data 
that seem to contradict the notion that mem- 
ory formation can be almost instantaneous. 
First is the retrograde amnesia gradient 
which implies that a newly formed memory 
becomes less susceptible to disruption over 
time, which, as we have already shown, re- 
quires reinterpretation, and the second is 
the typical learning curve that implies a 
gradual improvement over trials.

The learning curve seems to require an 
incremental process, proceeding bit by bit 
to an asymptote. But this empirical curve 
must be separated from the underlying theo- 
retical process causing the curve. Although 
Hull (1943) assumed underlying incremental 
strength in habit for the learning curve, 
Guthrie (1935) did not. Guthrie assumed 
that learning occurred all at once and im- 
mediately whenever a response occurred in 
the presence of a stimulus. Learning was 
thus a one-trial affair. The incremental learn- 
ing curve arose because the stimuli varied 
from trial to trial, and typically a number 
of trials were required for the response to 
become attached to a sufficient number of 
stimuli. Estes (1950) has formalized this 
assumption, which remains a common and 
important one (Bower, 1975). Similar one- 
experience learning assumptions have been 
made by those studying amnesia (Irwin, 
Banuazizi, Kalsner, & Curtis, 1969; Lewis, 
1976). Recent treatments of learning (see 
Bower, 1975) are not concerned with the 
relationship between a stimulus and a re- 

sponse but with the relationship between a 
stimulus and the contents of AM. Since the 
contents of AM are at least as variable as 
the stimuli, another reason for an incremental 
learning curve is present, even assuming immediate
and permanent learning. The point
here is that neither the incremental learning
curve nor the decremental amnesia gradient
requires a theoretical growth or decay
process.
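
The Guthrie-Estes argument, that all-or-none learning of varying stimuli yields a gradual curve, can be illustrated with a small simulation. The sketch below is not Estes's (1950) formal model. It assumes, for illustration only, a population of `n_elements` stimulus elements, each present on a given trial with probability `sample_prob`, and it treats the attachment of every sampled element as immediate and permanent. The function name and the parameter values are hypothetical.

```python
import random


def simulate_learning_curve(n_elements=100, sample_prob=0.2, n_trials=30, seed=0):
    """Return the proportion of stimulus elements attached to the
    response at the start of each trial."""
    rng = random.Random(seed)
    conditioned = set()   # elements already attached to the response
    curve = []
    for _ in range(n_trials):
        # Performance on this trial is indexed by the share of the
        # stimulus population that is already conditioned.
        curve.append(len(conditioned) / n_elements)
        # Stimuli vary from trial to trial: only a random subset is present.
        sampled = [e for e in range(n_elements) if rng.random() < sample_prob]
        # One-trial, permanent learning of every element that was present.
        conditioned.update(sampled)
    return curve


curve = simulate_learning_curve()
for trial, proportion in enumerate(curve[:10], start=1):
    print(f"trial {trial:2d}: {proportion:.2f}")
```

Under these assumptions the expected proportion conditioned after t trials is 1 - (1 - θ)^t, where θ stands for `sample_prob`, so the group curve rises smoothly toward an asymptote even though no single element is ever partly learned and nothing is ever lost.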

Much learning, of course, involves con- 
siderable complexity and many items and 
therefore requires more experience and more 
trials than does the simplest learning. Ob- 
viously, it will take longer to learn a T maze 
with 10 choice points than a maze with only 
1. Multiple-unit learning requires that each 
unit and the relationship among the units 
be learned. Complexities of this type have 
led biological researchers to employ the sim- 
plest types of learning situations possible. 
Typically, two have been selected: Pavlovian 
conditioning and single-trial passive avoid- 
ance learning. Each of these probably still 
remains complex, relative to what the sim- 
plest possible learning situation could be. 
Even so, the evidence suggests that an ani- 
mal can learn quickly that a footshock (FS) 
has occurred in the passive avoidance situa- 
tion (Lewis et al., 1968) as shown by its
avoidance of the FS location on the next
trial. The animal may learn only that an FS
has occurred, or it may learn also that the 
source of the FS was the grids on the floor, 
something about the intensity and quality of 
the shock, what happened before the shock, 
and what happened after the shock, all of 
which will greatly aid in the retrieval of 
the memory about FS. Such further learning 
will take more time than the simple learning 
that an FS occurred, but each item of learn- 
ing is assumed to occur rapidly and to be 
permanent. 


Established Memories 


Perhaps equal in importance to learning, 
as a source for AM, although frequently 
neglected, is IM. It is clear that memories 
have the potential to return to expression 
long after original learning. It is argued that
all contents in AM share similar properties,
that is, a reactivated old memory is similar
in many respects to a newly created one.

When stimuli from original learning impinge
on sense organs, and the organism attends
to them, perhaps as indexed by its orienting
response, a characteristic pattern of neural
firing is reinstated, and in some
fashion this reinstated firing represents the 
memory. This kind of conceptualization is 
different from a traditional S-R paradigm. 
Adams and Lewis (1962), for example, in 
the traditional format, proposed that ECS 
served as an unconditioned stimulus (US) 
in the classical conditioning sense and that 
the conditioned competing response evoked 
by ECS replaced those that had previously 
been learned in an avoidance situation. The
replacement of the original learning appeared
to be an amnesia. This interpretation was
soon found to be inadequate for the single- 
ECS situations to which researchers turned 
because no evidence of a conditioned con- 
vulsion could be detected with only one ECS 
administration (Paolino, Quartermain, & 
Miller, 1966). Once freed from the bondage
of thinking in terms of traditional peripheral
responses, it was possible to think in terms 
of events and memorial representations. 
Whereas Adams and Lewis thought of com- 
peting responses, and Lewis and Maher 
(1965) proposed the inhibition of peripheral 
responses, Misanin et al. (1968), in their
cue-dependent amnesia experiment, thought
of the inhibition of memories. 
Because of the imprecise relationship (to the experimenter) of memories to stimuli and responses, it is not always possible for the experimenter to present the exact cues that will reinstate the memory. Garcia and his colleagues (e.g., Garcia & Koelling, 1966; Garcia, Kovner, & Green, 1970; Garcia, McGowan, Ervin, & Koelling, 1968) have forcefully illustrated that all stimuli are not equivalent in their ability to initiate memories. Ordinarily, for an experimenter to test successfully for memory, either the associated events had to occur in close temporal contiguity or the animal had to maintain a physical orientation to the stimuli (Grice, 1948; Hunter, 1913; Spence, 1947). But Garcia and his colleagues showed that if the stimulus is a novel taste and the response is illness, then even hours may separate the two and still the rat will avoid the novel stimulus the next time it is presented. This phenomenon is difficult to formulate in standard learning terms
because of the long time interval between
the novel taste and the induced illness. Nor 
does the concept of preparedness (e.g., Selig- 
man, 1970) provide a satisfactory answer 
to the basic question: How does the rat 
bridge the gap between the novel stimulus 
and the sickness? There are several possibili- 
ties: 

1. Novel tastes create a pattern of neural 
firing that endures long enough so that the 
firing pattern from the illness becomes con- 
tiguous with that from the taste, and an 
association is formed. If so, the taste pattern
has to subsist through hours of other experi- 
ences and thus many other patterns of firing. 

2. Illness evokes past taste sensations and 
novel tastes are the most salient with illness, 
and thus there is a contiguity between the 
memory of taste and the illness. 

3. Taste aversions in the rat are a special 
kind of learning for which the animal is 
innately prepared, and it is simply outside 
the ordinary laws of learning. 

4. Illness creates a phobic reaction to 
novel tastes in general (Mitchell, Scott, & 
Mitchell, 1977). 

Data that distinguish among these alterna- 
tives are not yet available, but if learning 
is involved, it is difficult to avoid the neces- 
sity of a relative contiguity between the asso- 
ciated events, which means that illness prob- 
ably reactivates the memory of the novel 
taste, and the association is formed in AM 
between the reactivated event and the new 
event (Lett, 1973, 1978). 

In summary, it seems that memories not 
only enter AM from IM but are produced 
from the external world in a new state. If 
so, then a very important question is to 
what extent are reactivated memories like 
new memories? 


Comparison Between New and 
Established Memories 


It has frequently been shown that the dis- 
ruption of new memories by amnesic agents 
is time dependent, producing the RA gradi- 
ent. DeVietti and Kirkpatrick (1976) showed 
a similar time dependency for reactivated 
memories, as has Gordon (1977a, 1977b), 
although Gordon found the effective gradient 
shorter for the reactivated memory than for
the new one. Because gradients have been 
found to be so tremendously variable (see 
Chorover, 1976) even for new memories, 
this difference is probably not significant. 
Time-dependent gradients have been pro- 
duced for reactivated memories by electrical 
stimulation of the brain through implanted 
electrodes (DeVietti & Kirkpatrick, 1976), 
by strychnine sulphate (Gordon, 1977b), 
and by other competing memories (Gordon 
& Spear, 1973b). An enhancement gradient 
for a reactivated memory has also been re- 
ported by DeVietti, Conger, and Kirkpatrick 
(1977). 

A retrieved memory may be different from 
what it was during original learning because 
some part of it may be obscured due to com- 
petition and interference, or it may have 
become a part of a different memory com- 
plex from when it entered memory. Because 
memories are dynamic, it is unlikely that a 
retrieved memory will ever be identical to 
the original, and the very act of retrieval 
will produce a change. It is also likely that 
there will always remain enough of this orig- 
inal memory in what is retrieved for the two 
to be recognizably similar. 

As suggested, the process of retrieval it- 
self has an effect on the memory, but it is 
far from clear what this effect is. Cherkin
(1970) has argued that retrieval amounts to 
additional learning and that the effect of a 
reminder cue is the same as the effect of an 
original learning trial. The effects of reacti- 
vation on the memory have still to be worked 
out in detail, but sufficient data exist which 
show that a learning trial and reinstatement 
or reactivation trial are not the same (Lewis 
& Nicholas, 1973; Gordon & Spear, 1973a). 
Reinstatement is the presentation of the cues 
that were present during learning but without
a reinforcer. Clearly, this is the experimental
operation for extinction as well as
reinstatement, and extinction is not an additional
learning trial. There are other differences
between the two, however. For extinction,
many trials are repeated with a
fairly short intertrial interval, and response 
decrement increases over trials. For rein- 
statement, typically, only one trial is given, 
or if two or three trials are given (Spear
& Parsons, 1976), they are widely distributed.
It is probable that similar memorial
processes are involved for both procedures, 
but there is a difference in their conse- 
quences. In the case of extinction, the suc- 
cessive failures to reinforce indicate that 
the previously reinforced cues and the associated
behavior are no longer significant.
Orienting and other alerting behavior are
reduced. For reinstatement, presentation of
cues without reinforcement is unexpected,
producing orienting and rehearsal. The status
of the cues as indicators of reinforcement is 
indeterminate. Also, because reinstatement 
usually occurs at a long interval following 
the last learning trial, there is reason to 
suppose that the memory has been degraded 
to some extent at the time of the representation
of the cues. The degradation can be a
function of the stimulus change and memory
interference that inevitably occur with time
(Campbell & Jaynes, 1966), or it can be due 
to experimental procedures such as the ad- 
ministration of an amnesic agent. 

A first guess is that reactivation has a
much stronger effect, at least in multitrial
situations, than does a single additional learning
trial at the time of learning; that is, a
memory return from reactivation following
forgetting or amnesia is much more dramatic
than the administration of an additional trial 
during learning (Campbell & Jaynes, 1966; 
Lewis & Nicholas, 1973; Spear & Parsons, 
1976). This suggests that the strong reactivation
effect is somehow a function of the
partial memory degradation. There is probably
a definitional problem in distinguishing
reactivation from the effects of distributed
learning, although both could be on the same
dimension as the beneficial effects of distributed
practice (Hill & Spear, 1962; Hintzman,
1974) due to the recovery from the
memory degradation that occurs during the
intertrial interval.


Processing in Active Memory 


In this section some of the processing
operations performed on both old and new
memories in AM are discussed. Since
processing necessarily occurs both when new
memories are formed and when established
memories reenter AM, process similarities
are inevitable.

There are four functions of AM that are 
tentatively defined as follows: (a) to register 
new inputs, to note that an event has oc- 
curred; (b) to associate two or more new 
inputs to each other, as illustrated by classi- 
cal conditioning, the first phase of sensory 
preconditioning (Brogden, 1939), and when 
"one thing leads to another" (Tolman, 1949) 
as in latent learning; (c) to associate new
learning with already established and reactivated
memories. It is assumed that when
the established memory is reactivated in AM
simultaneously with the learning, an association
is formed. The second phase of sensory
preconditioning is an illustration of this process.
The first conditioned stimulus (CS1) is
paired with the second (CS2) a number of
times, and an association is established (first
phase). One of the CSs is then paired with a US
(second phase), bringing about an association
of the US with the established memory
of CS1 and CS2. Of course, the degree of association
and coding will vary with the
different components. A more common example
of an association between a new and
an established memory is that of secondary
reinforcement. CS1 is first paired with a US.
Then the established properties of CS1 are
paired with a new stimulus CS2, producing
a CS1-CS2 association; and (d) to associate
established memories with other established
memories. This occurs when two old memories
are reactivated simultaneously (Hearst
& Peterson, 1973; Solomon & Turner, 1962).
Such studies are commonly performed to
show the effect of classical conditioning on
instrumental responding and to illustrate
how two established memories can affect
each other even though they do not share
a common response.

AM Functions 1 and 2 have been the
object of much laboratory research, but
Functions 3 and 4 have not received the
attention they deserve, and they are probably
more interesting. They involve the study
of memory structures and systems. Certainly
these functions will soon attract the attention
of inventive researchers.


Postresponse Events


In traditional S-R theory a learning trial 
was considered completed when the desig- 
nated response was over and reinforcement 
was administered. It is now known that there 
are postresponse events that are extremely
important to the later use of the memory. 
Duncan (1949) made this point in his classi- 
cal study of amnesia when he demonstrated 
that a gradient was a function of the time 
interval from the termination of the learned 
response to the administration of ECS. Lewis 
and his colleagues (see Lewis et al., 1968) 
have long argued that the interval immedi- 
ately following a learning trial is used by 
the organism for cognitive processes that aid 
in the retrieval of the memory. Tolman 
(1949) gave an illustration of a postresponse 
cognitive process when he described an ex- 
periment by Hudson who found no avoid- 
ance learning to electric shock if the appa- 
ratus cues surrounding the shock were re- 
moved immediately following treatment. If, 
however, the cue remained so that the animal 
could look back “to see what happened,” 
as Tolman put it, it readily learned to avoid. 
Essential aspects of this study have been 
replicated by Keith-Lucas and Guttman
(1975). 

Wagner, Rudy, and Whitlow (1973) 
showed that if an unexpected stimulus was 
presented following a learning trial, reten- 
tion was impeded, and the interference was 
a function of the time interval between the 
learning experience and the unexpected stim- 
ulus. They believed that adequate learning 
required a posttrial rehearsal process that 
was prevented by the unexpected stimulus. 
Similar data have been presented by Terry 
and Wagner (1975) and Miller, Misanin, and 
Lewis (1969). Also for humans, Waugh and 
Norman (1965) showed that any new item 
can displace an earlier one if it is unexpected 
but that there was no interference from 
highly predictable and redundant items. 
Bartus and Johnson (see Bartus & LeVere, 
1976), with monkey subjects, found that the 
interference during postresponse processing 
was a function of the similarity of the post- 
response stimulus to the relevant learning 
stimulus, indicating a retroactive interference 
phenomenon. These studies emphasize the
importance of postresponse processing for 
adequate retrieval to occur. If this process 
is interrupted by surprising stimuli, by stim- 
uli that make it difficult to discriminate the 
important learning stimuli, or by the electri- 
cal stimulation of the brain, retrieval is in- 
terfered with. 

What is the nature of the postresponse 
rehearsal process? It would be convenient 
to conceive of it as a sort of perseveration of 
the original input or a sort of continuing 
repetition. Although something of this sort
may occur, it is also likely that a more ac- 
tive and cognitive activity takes place. Lewis 
(1976) hypothesized two phases of learning. 
First, the organism registers that the event 
has happened—an FS, the intrusion of an- 
other organism, and so forth. Second, there is 
a form of interpretation of the event, a re- 
lating of the new to the established, that is, 
that it was painful and came from the floor 
or that it was female and in heat, and so 
forth. It is this elaboration, following the 
initial registration of the occurrence of the 
event, that occupies the postresponse time. 
(See Bower, 1972, 1975; Everett & Corson, 
1973; and Greeno, 1970, for a possibly simi- 
lar notion.) This is certainly what Tulving 
(1970) had in mind when he said, referring 
to humans: 


When a to-be-remembered unit is stored, some 
ancillary information about it is also stored with 
it. The storage of this ancillary information repre- 
sents what is referred to as “coding.” When some 
of this ancillary information (or the “code” of the 
to-be-remembered unit) is available at the time 
of attempted recall, the code serves as a retrieval 
cue. (p. 8) 


Sara, David-Remacle, and Lefevre (1975) 
also argue for a “core” memory that is blocked 
by ECS and that can be elaborated by further 
experience, returning the core memory to 
expression. Azmitia, McEwen, and Quartermain
(1972) found no amnesia if the subjects
recovered from ECS in the training
environment. They also argue for a core 
memory that is not disrupted by ECS and 
for a further learning stage, “which may 
involve integration of the experimental cues 
associated with the core trace with the animal’s
existing memory repertoire of accessible
experiences” (p. 855). It may also be
similar to the distinction between item and
order information (Murdock, 1974). The
animal is trying to “make sense” out of what
happened, to relate the event to the remainder
of its experience. The distinction is perhaps
also similar to that between the data
and the address for the data in computer
language. If no address is given for the data,
their retrieval will be difficult indeed.
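
The data and address analogy can be made concrete with a small sketch. The Python below is purely illustrative and is not part of the original argument: the class name `MemoryStore`, its methods, and the example cues are all hypothetical. Each event is stored together with whatever ancillary codes accompany it (its "address"); an event stored without any code remains in storage but cannot be reached by a cue.

```python
from collections import defaultdict


class MemoryStore:
    """Toy store: events are the data, codes are their addresses (hypothetical)."""

    def __init__(self):
        self.events = []                  # everything ever stored
        self.index = defaultdict(list)    # code -> positions of coded events

    def store(self, event, codes=()):
        """Store an event; each code becomes a retrieval route to it."""
        position = len(self.events)
        self.events.append(event)
        for code in codes:
            self.index[code].append(position)

    def retrieve(self, cue):
        """Return the events reachable from a cue (empty if none were coded)."""
        return [self.events[i] for i in self.index.get(cue, [])]


store = MemoryStore()
store.store("footshock", codes=("painful", "from the floor"))
store.store("uncoded event")              # stored, but given no address

print(store.retrieve("painful"))          # the coded event is found
print(store.retrieve("uncoded"))          # nothing is found: no address exists
print(len(store.events))                  # both events are still in storage
```

In this toy version a retrieval failure never removes anything from `events`, which parallels the claim that an uncoded memory is hard to retrieve rather than absent.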

If the target event is commonplace and
has been repeated many times, then the
memory elaboration need not occur, other
than to record that the event has happened
again. The event is immediately recognized
as familiar and readily fits the memories
(event representations) already in existence.
If the target event is surprising or unexpected,
the rehearsal process is engaged to
interpret the event, to fit it into an existing
memory system, or to recognize and differentiate
the new event from other events.
These elaborative processes (rehearsal) take
place in AM, and since one of the important
characteristics of AM is its severely limited
capacity, processing of other temporally near
events must be curtailed, and therefore their
retrieval will be reduced (Wagner et al.,
1973). These postresponse processes involve
new and rapidly formed memories that serve
the purpose of aiding the retrieval of the
memory of the target event. The more an
event is distinctive and becomes part of an
already existing memory system, the better
is its retrieval (see Lewis, 1976). If postresponse
stimuli are presented that are similar
to the learning stimuli, retrieval is made
difficult through interference (Bartus & LeVere,
1976).


Preresponse Events 


It is proposed that conditions existing
prior to the target event also have an effect
on memory processing. Lewis et al. (1968)
showed that animals familiarized with the
apparatus in which they were to receive an FS
and immediate ECS demonstrated little
amnesia. They reasoned that this was because
the properties and location of the FS were
easy to distinguish in the familiarized environment,
and the meaningful coding of the
FS was simple and took less than .5 sec
(Lewis, 1976). If an animal already knows 
“what leads to what” (Tolman, 1932) in 
an environment, the introduction of a new 
salient or biologically relevant stimulus is 
quickly assimilated (automatic search) into 
the “cognitive map” of the environment, 
leading to appropriate behavior. Lubow, Rif- 
kin, and Alek (1976) present interesting 
data on this effect. They show easy learning 
when a novel stimulus is used in a familiar 
environment and when a familiar stimulus 
is used in a novel environment as compared 
to using a novel stimulus in a novel environ- 
ment or a familiar stimulus in a familiar 
environment. This suggests that a familiar 
stimulus will not result in amnesia in a new 
environment. 
Whether or not animals learn without re- 
inforcement is no longer an issue; they do. 
Latent learning is discussed here as a form 
of familiarization with the environment that 
facilitates further learning when the experi- 
mental stimulus is introduced—either the 
reinforcer (Blodgett, 1929) or the FS (Lewis 
et al., 1968). The preresponse familiarization
builds a cognitive structure into which
the new event is readily assimilated. The 
importance of these preresponse events is 
further illustrated by Mitchell et al. (1977) 
who found that a long-delayed poison-induced
aversion to novel visual stimuli in rats
can be readily obtained if the animals have
had extensive familiarization (habituation)
with the environment prior to the introduction
of the novel stimulus. Without the familiarization,
visual stimuli do not readily
become capable of eliciting delayed aversions.

The notion that prior familiarization facilitates
new learning in the familiarized situation
has human counterparts. Shiffrin (1976),
who also takes the point of view that forget- 
ting is a retrieval problem, discusses the 
limitations on retrieval posed by the limited 
capacities of AM. He says that search in AM 
can be either automatic or controlled. 


Automatic search occurs when a stimulus or set of 
stimuli is sufficiently trained with respect to a
given background (or distractor stimuli) that it
becomes associated with an automatic attention
response that will cause the given stimulus to be 
searched or considered first, before competing stim- 
uli. (p. 220) 


Again, he says: 


Automatic search is facilitated by targets having 
distinctive simple general physical characteristics 
relative to the background (distinctive color, shape, 
orientation, location, etc.). In addition, automatic 
search can develop if any set of targets is trained 
sufficiently long with respect to a given back- 
ground. (p. 224) 


A similar notion has been presented by 
Sperber, Greenfield, and House (1973), who 
argue that an adequate representation or 
effective code of an item is more difficult to 
form when the item is unfamiliar than when 
it is familiar and already coded. 

In summary, the characteristics of AM, 
particularly its limitations on the number 
of items that can be active at any time, have 
a great effect on what is retrieved. New 
learning will be easily retrieved if it occurs 
in an environment that is familiar and that 
provides, through existing memory representa- 
tions of the environment, a memory context 
into which the new event may fit. If a new 
stimulus is introduced into a less familiar 
environment, more time will be required to 
fit the new learning into some existing mem- 
ory context. This may be better conceived 
of as a coding process rather than rehearsal, 
which implies a mere repetition or holding 
of memory items. Tests of this distinction 
can be made by manipulating both prelearn- 
ing and postlearning environments. 


Loss of Accessibility to AM 
Perceptual Stimuli 


Since stimuli are almost always impinging 
on receptors, there is an immense amount 
of incoming information. Each head move- 
ment brings new stimulation and new in- 
formation. It is clear that a great part of 
this new information, if it is different in 
any respect from memories, is almost im- 
mediately lost, and a basic question concerns 
the mechanism of the loss. There are two 
primary alternatives: 

1. The consolidation process, which maintains
that new information either fades or
is open to destruction for a period of time 
following its formation and that incoming 
information destroys current newly formed 
old information (McGaugh, 1966; McGaugh 
& Gold, 1976; Müller & Pilzecker, 1900). 
At the receptor image level, there is a basis 
for fading images (Sternberg, 1966), but 
this notion has been rejected previously in 
this article as the explanation for the for- 
getting of memories. 

2. The interference hypothesis, which 
maintains that the new items interfere with 
each other based on various properties such 
as similarity and the difficulty in discrimi- 
nating among many similar items as the 
organism moves through the environment. 
This seems a reasonable explanation even for 
the forgetting of the multitude of perceptual 
items and is identical to the retrieval hypothesis.


Active Memory Forgetting 


Until 1969, all studies of experimental am- 
nesia had at least three properties in com- 
mon. One was the presentation of cues for 
new learning, the second was the administra- 
tion of a reinforcer, usually negative, and 
the third was the administration of the 
amnesic agent. Misanin et al. (1968) de- 
parted from the usual procedure by replac- 
ing the cues for new learning with cues for 
established memories that they paired with 
ECS and produced a memory decrement. 
They also presented ECS without the cue 
and obtained no memory decrement. Two 
interpretations of their data are plausible. 
One is that the ECS reduced the evocative 
power of the stimulus with which it was 
paired so that it was not, on a test, followed 
by the memory, a form of stimulus inhibi- 
tion; the other is that the memory (re- 
sponse) was blocked by the ECS. The ex- 
perimenters preferred the second interpreta- 
tion and concluded that ECS worked only 
on memories that were active and that mem- 
ory age was irrelevant. The conclusion is 
that AMs, or those in transition between 

inactivity and activity, are open to disrup- 
tion by amnesic agents, and IMs are not. 
It is an interesting speculation that the
AM disruption is not limited to amnesic
agents. Perhaps AMs are open to decrements
due to all forgetting agents (Lewis, 1969).
Perhaps IMs are protected from forgetting 
loss, the loss occurring only with memory 
activity. 

Only a little information relevant to this 
hypothesis can be found in the traditional 
animal learning literature because insufficient
attention has been devoted to the study of
forgetting. Spear (1976), whose own work
is an exception, could find no evidence that
Pavlov, for example, ever manipulated the
retention interval. Students of animal learning
traditionally did not use the concept of
memory, and the word forgetting was almost
totally absent from their vocabularies. “Response
elimination” was their approximation
of this problem, and most of the procedures
were those of extinction. If a learned response
was repetitively evoked without reinforcement,
it would eventually extinguish.
This, however, was more likely a form of
discrimination learning (discriminating nonreinforced
from reinforced conditions) rather
than forgetting, which operationally involves
a retention interval between the end of
learning and the test.

The paucity of animal data is a contrast
to that existing for humans. Without going
into the wealth of detail existing in the study
of human forgetting, an attempt is made to
summarize the most important determiners
of forgetting in human memories, with the
view of determining the possible relevance of
this body of information to the kind of AM
forgetting conceptualized here.

First are cue and context changes (Tulving,
1972). Memories are formed in a
stimulus environment, and these stimuli can
later serve to reactivate the memories. To the
extent that the cues and context change,
forgetting will occur, that is, memories will
not be evoked. Since cues are always changing,
there will be a greater cue change with
time, and thus forgetting increases over time.
Second, forgetting is also caused by interference,
which had traditionally been viewed
as a phenomenon of retrieval (Peterson,
1977), even though the interfering material
was learned close to original learning and
long before the recall test. Interference 
basically means that the subject does not 
distinguish adequately the occasion for re- 
membering X rather than Y. This failure 
to distinguish is based on the similarity of 
the learned material to the interpolated ma- 
terial. For example, if a subject first learns 
an association between A and B and then 
learns an association between A and C, the 
two associations will interfere with each 
other when A is presented at recall. The 
more similar the items (or lists), the greater 
will be the interference. Thus, similarity is
the keystone of forgetting through interfer- 
ence. It may be that the subject will re- 
member the item but will be confused about 
the appropriate place to use the item. In 
the laboratory, this is a failure to make a
"list differentiation." Coding serves to increase
the distinctiveness of the learned material
and thus facilitates retrieval.

In summary, human forgetting is considered
to have two primary causes: (a) a
change in the stimulus conditions from those 
that were present at the time of learning. 
This change can be either external or internal 
to the organism; and (b) interference due 
to the learning of similar material. 

Can these basic principles of forgetting
be integrated with those already discussed
and applied to the exit of memories from AM
for lower animals? The application is indeed
speculative and must be treated with
great caution, but Spear (1971, 1976, 1978)
has shown that such an application can be
fruitful.

First, there are cue changes. An active 
organism is being constantly bombarded 
with stimuli, many of which are registered 
in AM (and which simultaneously have a
representation in IM, that is, again, that
memories are formed quickly and are permanent).
Representations may be held in
AM for varying lengths of time. Familiar
and expected representations come and go
with great rapidity. They already fit into
existing memory systems, and little new
learning is needed. Perhaps the only new
learning that occurs upon the appearance of
familiar stimuli is that an old item has occurred
again. Unexpected and surprising representations
are held for a longer period
of time. During this time coding occurs, one 
of whose purposes is to make the representa- 
tion retrievable by fitting (association, fur- 
ther learning, coding) it into an existing 
memory assembly. In a sense this coding 
makes the representation distinctive, for the 
representation fits into a particular memory 
assembly because of its similarities with and 
differences from other memories. If a repre- 
sentation is not adequately coded, its re- 
trievability will be greatly reduced. This 
coding (or rehearsal) takes time, during 
which the organism cannot be attending to 
other stimuli, or there will be interference 
and retrievability will be reduced. 

A novel stimulus evokes the orienting re- 
flex (Pavlov, 1927) that ordinarily produces 
a focusing on the new stimulus and there- 
fore protects it from interference during the 
rehearsal or coding. Another novel stimulus 
registering during this period that requires 
coding may preempt the rehearsal process 
and reduce the retrievability of the first rep- 
resentation (Wagner et al., 1973). The closer 
the distractor stimulus appears in time to 
the first stimulus, the greater will be the 
interference and the less the retrievability. 
Here, the reference is to the interference between
new memories in which adequate
processing is prevented by the preemption
of the limited capacity AM. The first stimu- 
lus representation should still be capable of 
retrieval if an appropriate probe is used. 

One of the principal points of the present 
interpretation is that AMs are particularly 
open to interference. This may be at least 
one of the reasons that the A-B, A-C para- 
digm produces interference. The A of the A-C 
learning component reactivates the A-B mem- 
ory. When A-B and A-C are both active, 
the interference occurs. An A-B, X-B sequence 
would produce little interference because the 
memories are not concurrently active. Im- 
mediately after presentation, paired associates 
are vulnerable to forgetting, and the loss is 
geometric. Perhaps forgetting will not occur 
unless memories are reactivated in a context 
that produces interference. The time-depen- 
dent interference in the Peterson and Peterson 
(1959) paradigm suggests this. Gordon and
Spear (1973a) have shown an AM interference 
using the cue-dependent amnesia paradigm. 
After one memory was well learned, they pre- 
sented the cue for this memory while simul- 
taneously evoking a competing memory. Con- 
siderable forgetting ensued under these cir- 
cumstances, but there was no forgetting if the 
established memory was inactive. This is a 
landmark study in showing that AMs interfere 
with each other, but that an AM does not in- 
terfere with an IM. More recently Gordon 
(1977a) and Gordon and Feldman (1978) 
have shown that the reinstatement of a pas- 
sive avoidance memory will interfere with 
the retention of a new active avoidance 
memory. 
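
The contrast between the A-B, A-C and A-B, X-B sequences can be sketched in a few lines. The Python below is an illustrative toy, not a model taken from the literature: the function name `interfering_memories` and the use of exact cue matching as a stand-in for reactivation are assumptions made here. A stored pair counts as active only when the new cue is its own cue, and interference arises only among pairs that are active together.

```python
def interfering_memories(stored, new_pair):
    """Return stored pairs that the new pair's cue reactivates and that
    carry a different response (toy stand-in for AM interference)."""
    new_cue, new_response = new_pair
    # A stored memory becomes active only if the new cue is its own cue.
    active = [pair for pair in stored if pair[0] == new_cue]
    # Concurrently active memories with different responses compete.
    return [pair for pair in active if pair[1] != new_response]


stored = [("A", "B")]

# A-B then A-C: cue A reactivates A-B while A-C is being learned.
print(interfering_memories(stored, ("A", "C")))

# A-B then X-B: cue X reactivates nothing, so A-B stays inactive.
print(interfering_memories(stored, ("X", "B")))
```

Under these assumptions the first call returns the A-B pair and the second returns an empty list, which mirrors the claim that an inactive memory is not open to interference.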


ECS as Interference 


There is a growing body of evidence that 
ECS blocks AMs regardless of whether they 
are old or new, but IMs are not disturbed. 
Generalizing this finding, it has been suggested 
that AMs may be open to disruption by nor- 
mal forgetting agents, particularly by similar 
AMs (Gordon & Spear, 1973a). The similarity 
between normal forgetting and forgetting due 
to ECS is difficult to see, but it probably lies 
in the neural activity that must accompany 
memory activation. Each AM, whether new or 
established, must involve a distinctive pat- 
tern of neural firing that (a) represents an 
environmental or other event and (b) will be 
repeated when the event is repeated. The pat- 
tern probably involves a characteristic fre- 
quency of firing for single neurons as well as 
a characteristic assemblage of neurons. 


After reviewing a great deal of EEG data, 
John (1972) concluded that 


the evidence which has been presented indicates that 
when a specific memory is retrieved, a temporal 
pattern of electrical activity peculiar to that mem- 
ory is released in numerous regions of the brain. To 
that released set of waveshapes corresponds the av- 
erage firing pattern of ensembles of neurons diffusely 
distributed throughout these widespread anatomical 
domains. . . . This suggests that during retrieval 
of a particular memory, a unique and invariant 
temporal pattern of coherence occurs in the neural
discharges averaged across a spatially distributed and
diffuse ensemble of neurons, in which the variable
activity of an individual neuron is significant primarily
insofar as it contributes to the statistics of
the population. (p. 862)

It is assumed that similar memories have 
similar components of neural firing. It is fur- 
ther assumed that when two similar memories 
are simultaneously active, there is competition 
and interference between their firing patterns.
That is simply to say, two different patterns
involving the same neurons, at least to some
degree, cannot be firing simultaneously without
interference between the two patterns.
The occurrence of one inhibits or blocks the
other. This is the major assumption. When
overlapping neural firing patterns are evoked
simultaneously, they interfere with each other
and reduce the probability that either will
fire again under similar circumstances. We
know that similar responses and similar memories
compete with each other, and we know
that memories and responses are based on
neural activities. All that is added here is
that memory competition is based on neural
competition, and that similarity is an important
determiner of the competition. In experimental
amnesia, the presentation of the
learning stimulus activates a memory and its
characteristic pattern of neural firing. While
the memory pattern is active, another pattern
is created by the ECS, which effectively
interferes with or blocks the activated memory
pattern.

Admittedly, this is highly speculative and
is based on little physiological data. But it is
not more speculative or based on less data
than a theory which maintains that a learning
event establishes a physiological structure
that is fragile and open to disruption for a
period of time and that amnesic agents act
to destroy this process.

The interference-retrieval hypothesis is far
from implausible. The basic assumptions are
as follows: (a) An active memory requires
neural firing, (b) the pattern of neural firing
is characteristic of that memory, (c)
similar memories share similar neural
components, (d) similar, but different, memories
and neural firing patterns that are active
at the same time interfere with each other,
(e) interference between neural firing patterns
results in forgetting, and (f) the interference
results not in a memory destruction but a
blocking.


Conclusion 


A point of view has been presented here in
an area of investigation that has many per-
plexities and complexities. Certainly, there are
pockets of data that do not, at the moment,
fit tidily into the current conception, but the
emphasis on retrieval failure as the source
of forgetting in psychobiology has been grow-
ing and making increasing inroads on the
traditional storage-failure approach. (See
Spear's excellent new book, 1978.) Retrieval
failure is more consonant with the important
theorizing of human cognitive psychology and,
at the moment, seems to comprehend a large
segment of the psychobiological data, and
the distinction between AM and IM affords
a cohesive framework for conceptualizing this
data. The major lesson for psychobiologists
is that they would profit from turning a
greater amount of their research efforts to
the study of retrieval mechanisms and that
they should consider the implications for
physiological study of a learning engram that
is formed permanently in less than a second.
An interesting point that remains to be con-
sidered is the distinction, if any, remaining
between a consolidation-storage point of view
and a retrieval point of view. Without doubt
the distinction is becoming blurred. The con-
solidation-storage approach has moved from
conceptualizing a long-delayed fixing of struc-
ture produced by learning (Gold & McGaugh,
1975; McGaugh, 1966) to the position of
Bloch (1976), who says that consolidation
“would be better named the phase of infor-
mation processing” (p. 583) or Cherkin
(1970), who says that anything occurring soon
after acquisition is part of the consolidation
process. This conceptualization of consolida-
tion is similar to the present approach and,
as Bloch points out, requires brain activities
that can be prevented or enhanced.
There nevertheless remain several impor-
tant distinctions between the two. One of
these is empirical and the others are concep-
tual. The empirical distinction concerns the
consolidation-storage approach, which argues 
that the various brain manipulations produc- 
ing forgetting do so either by eliminating the 
physical trace of learning (McGaugh, 1966) 
or by reducing it below some threshold of ex- 
pression (Cherkin, 1972). 

If reminder stimuli reinstate a memory, as
they clearly do, then the amnesia or forgetting 
cannot have been due to insufficient storage. 
As has already been reviewed, the storage 
theorists argue that the reminder stimulus 
(a) is not effective if amnesia is complete and 
(b) serves simply as another learning trial if 
amnesia is not complete. Both of these argu- 
ments have been rebutted here and elsewhere 
(Miller & Springer, 1973, 1974; Spear, 1973; 
Spear & Parsons, 1976). Reminder has been 
shown following complete amnesia, and re- 
instatement has a greater effect than a single 
additional learning trial. The conclusion drawn 
here is that ECS and other forms of brain 
intervention block the retrieval of memory, 
whether old or new, rather than its storage. 
This is the interpretation that is most con- 
sistent with existing data. On this point dif- 
ferences do remain between a storage and 
retrieval point of view. 

The distinction between the two has largely 
been lost at the conceptual level when the 
consolidation-storage approach is viewed as 
a form of information processing. Information 
processing here has been treated as a form of 
coding and elaboration of already stored mem- 
ories. This elaboration is additional learning 
and it aids in the retrieval of previous learn- 
ing. In the current conception, a subject learns 
A (that FS occurred), then he learns B (that 
it came through the grid bars), then he learns 
C and D, and so forth, which are new items 
of learning that give meaning to the total, 
including A, and that aid in retrieval. Am- 
nesic agents prevent this further processing, 
and they also block (interfere with) the later 
expression of A or of any learning that has 
occurred up to the point of the amnesic agent 
as long as that new learning is still in AM. 
If this is now what consolidation means, then 
it is to that extent a retrieval theory. 


Reference Note


1. Wood, F., & Kinsbourne, M. Paper on the subject 
of the amnesic syndrome in humans, presented 
at the meeting of the International Neuropsychol- 
ogy Society, Boston, February 1974. 


References 


Adams, H. E., Hoblit, P. R., & Sutker, P. B. Electro-
convulsive shock brain acetylcholinesterase activity
and memory. Physiology and Behavior, 1969, 4,
113-116.

Adams, H. E., & Lewis, D. J. Electroconvulsive shock,
retrograde amnesia, and competing responses. Jour-
nal of Comparative and Physiological Psychology,
1962, 55, 299-301.

Alpern, H. P., & Kimble, D. P. Retrograde amnesic
effects of diethyl ether and bis(trifluoroethyl)
ether. Journal of Comparative and Physiological
Psychology, 1967, 63, 168-171.

Alpern, H. P., & McGaugh, J. L. Retrograde am-
nesia as a function of duration of electroshock
stimulation. Journal of Comparative and Physio-
logical Psychology, 1968, 65, 265-269.

Atkinson, R. C., & Shiffrin, R. M. Human memory: 
A proposed system and its control processes. In 
K. W. Spence & J. T. Spence (Eds.), The psy- 
chology of learning and motivation: Advances in 
research and theory (Vol. 2). New York: Academic 
Press, 1968. 

Atkinson, R. C., & Shiffrin, R. M. The control of 
short-term memory. Scientific American, 1971, 
225(2), 82-90. 

Azmitia, E. C., McEwen, B. S., & Quartermain, D. 
Prevention of ECS-induced amnesia by reestab- 
lishing continuity with the training situation. 
Physiology and Behavior, 1972, 8, 853-855. 

Baddeley, A. D., & Hitch, G. Working memory. In 
G. H. Bower (Ed.), The psychology of learning 
and motivation (Vol. 3). New York: Academic
Press, 1974. 

Barondes, S. H., & Cohen, H. D. Arousal and the 
conversion of "short-term" to “long-term” mem- 
ory. Proceedings of the National Academy of 
Sciences, 1968, 61, 923-929. (a) 

Barondes, S. H., & Cohen, H. D. Memory impair- 
ment after subcutaneous injection of acetoxycyclo- 
heximide. Science, 1968, 160, 556-557. (b) 

Barraco, R. A., & Stettner, L. J. Antibiotics and 
memory. Psychological Bulletin, 1976, 83, 242-302. 

Bartus, R. T., & LeVere, T. E. Storage and utiliza-
tion of information within a discrimination trial.
In D. L. Medin, W. A. Roberts, & R. T. Davis
(Eds.), Processes of animal memory. Hillsdale,
N.J.: Erlbaum, 1976.

Bellezza, F. S., & Walker, R. J. Storage-coding trade-
off in short-term store. Journal of Experimental
Psychology, 1974, 102, 629-633.

Bjork, R. A. Short-term storage: The ordered out-
put of a central processor. In F. Restle, R. M.
Shiffrin, N. J. Castellan, H. R. Lindman, & D. B.
Pisoni (Eds.), Cognitive theory (Vol. 1). Hills-
dale, N.J.: Erlbaum, 1975.

Bloch, V. Brain activation and memory consolidation.
In M. R. Rosenzweig & E. L. Bennett (Eds.),
Neural mechanisms of learning and memory. Cam-
bridge, Mass.: MIT Press, 1976.

Blodgett, H. C. The effect of the introduction of re-
ward upon the maze performance of rats. Univer-
sity of California Publications in Psychology, 1929,
4, 113-134.

Blough, E. S. Recognition by the pigeon of stimuli
varying in two dimensions. Journal of the Experi-
mental Analysis of Behavior, 1972, 18, 345-367.

Bogoch, S. The biochemistry of memory. London:
Oxford University Press, 1968.

Bogoch, S. Brain glycoproteins and learning: New
studies supporting the “sign-post” theory. In W.
B. Essman & S. Nakajima (Eds.), Current bio-
chemical approaches to learning and memory.
Flushing, N.Y.: Spectrum-Halstead, 1973.

Bower, G. H. Stimulus-sampling theory of encoding
variability. In A. W. Melton & E. Martin (Eds.),
Coding process in human memory. Washington,
D.C.: V. H. Winston, 1972.

Bower, G. H. Cognitive psychology: An introduc-
tion. In W. K. Estes (Ed.), Handbook of learning
and cognitive processes: Introduction to concepts
and issues (Vol. 1). Hillsdale, N.J.: Erlbaum, 1975.

Botwinick, C. Y., & Quartermain, D. Recovery from
amnesia induced by pre-test injections of mono-
amine oxidase inhibitors. Pharmacology, Biochem-
istry and Behavior, 1974, 2, 375-379.

Braun, J. J., Meyer, P. M., & Meyer, D. R. Sparing
of a brightness habit in rats following visual de-
cortication. Journal of Comparative and Physio-
logical Psychology, 1966, 61, 79-82.

Broadbent, D. E. Perception and communication. 
New York: Pergamon Press, 1958. 

Brogden, W. J. Sensory pre-conditioning. Journal of 
Experimental Psychology, 1939, 25, 323-332. 

Brown, J. Some tests of the decay theory of imme-
diate memory. Quarterly Journal of Experimental
Psychology, 1958, 10, 12-21.

Buckholtz, N. S., & Bowman, R. E. Incubation and
retrograde amnesia studies with various ECS in-
tensities and durations. Physiology and Behavior,
1972, 8, 113-117.

Campbell, B. A., & Jaynes, J. Reinstatement. Psy-
chological Review, 1966, 73, 478-480.

Cherkin, A. Kinetics of memory consolidation: Role
of amnesic treatment parameters. Proceedings of
the National Academy of Sciences, 1969, 63, 1094-
1101.

Cherkin, A. Retrograde amnesia: Impaired memory
consolidation or impaired retrieval? Communica-
tions in Behavioral Biology, 1970, 5, 183-190.

Cherkin, A. Retrograde amnesia in the chick: Re-
sistance to the reminder effect. Physiology and
Behavior, 1972, 8, 949-955.

Chorover, S. L. An experimental critique of “consolida-
tion studies” and an alternative “model-systems”
approach to the biophysiology of memory. In M.
R. Rosenzweig & E. L. Bennett (Eds.), Neural
mechanisms of learning and memory. Cambridge,
Mass.: MIT Press, 1976.

Chorover, S. L., & Schiller, P. H. Short-term retro- 
grade amnesia in rats. Journal of Comparative 
and Physiological Psychology, 1965, 59, 73-78. 

Conrad, R. Acoustic confusions and memory span for 
words. Nature, 1963, 197, 1029-1030. 

Coons, E. E., & Miller, N. E. Conflict versus con-
solidation of memory traces to explain “retrograde
amnesia” produced by ECS. Journal of Compara-
tive and Physiological Psychology, 1961, 53, 524-
531.

Corkin, S. Acquisition of motor skills after bilateral 
medial temporal lobe excision. Neuropsychologia, 
1968, 6, 255-265. 

Craik, F. I. M. Human memory. In M. R. Rosen-
zweig & L. W. Porter (Eds.), Annual review of
psychology. Palo Alto, Calif.: Annual Reviews,
1979.

Craik, F. I. M., & Lockhart, R. S. Levels of process-
ing: A framework for memory research. Journal
of Verbal Learning and Verbal Behavior, 1972, 11,
671-684.

Craik, F. I. M., & Watkins, M. J. The role of re-
hearsal in short-term memory. Journal of Verbal
Learning and Verbal Behavior, 1973, 12, 599-607.

Davis, R. E., & Hirtzel, M. S. Environmental con-
trol of ECS-produced retrograde amnesia in gold-
fish. Physiology and Behavior, 1970, 5, 1089-1092.

Davis, R. E., & Klinger, P. D. Environmental con- 
trol of amnesic effects of various agents in gold- 
fish. Physiology and Behavior, 1969, 4, 269-271. 

Dawson, R. G., & McGaugh, J. L. Electroconvulsive
shock effects on a reactivated memory trace: Fur-
ther examination. Science, 1969, 166, 525-527. (a)

Dawson, R. G., & McGaugh, J. L. Electroconvulsive
shock-produced retrograde amnesia: Analysis of
the familiarization effect. Communications in Be-
havioral Biology, 1969, 91-95. (b)

Deutsch, D. Tones and numbers: Specificity of inter- 
ference in immediate memory. Science, 1970, 168, 
1604-1605. 

DeVietti, T. L., Conger, G. L., & Kirkpatrick, B. R.
Comparison of the enhancement gradients of re-
tention obtained with stimulation of the mesen-
cephalic reticular formation after training or mem-
ory reactivation. Physiology and Behavior, 1977,
19, 549-554.

DeVietti, T. L., & Holliday, J. H. Retrograde amnesia
produced by electroconvulsive shock after reactiva-
tion of a consolidated memory trace: A replication.
Psychonomic Science, 1972, 29, 137-138.

DeVietti, T. L., & Kirkpatrick, B. R. The amnesia
gradient: Inadequate as evidence for a memory
consolidation process. Science, 1976, 194, 438-439.

Duncan, C. P. The retroactive effects of shock on
learning. Journal of Comparative and Physiologi-
cal Psychology, 1949, 42, 32-34.

Ebbinghaus, H. Memory: A contribution to experi-
mental psychology. New York: Columbia Univer-
sity Teacher's College, Bureau of Publications, 1913.

Estes, W. K. Toward a statistical theory of learn- 
ing. Psychological Review, 1950, 57, 94-107. 

Everett, J. C., & Corson, J. A. ECS in one-trial ap- 
petitive learning in rats. Journal of Comparative 
and Physiological Psychology, 1973, 84, 353-360. 

Flexner, J. B., Flexner, L. B., & Stellar, E. Memory 
in mice as affected by intracerebral puromycin. 
Science, 1963, 141, 57-59. 

Flexner, L. B., Flexner, J. B., & Roberts, R. B. Mem- 
ory in mice analyzed with antibiotics. Science, 
1967, 155, 1377-1383. 

Garcia, J., & Koelling, R. A. Relation of cue to con- 
sequences in avoidance learning. Psychonomic Sci- 
ence, 1966, 4, 123-124. 

Garcia, J., Kovner, R., & Green, K. F. Cue prop- 
erties vs. palatability of flavours in avoidance 
learning. Psychonomic Science, 1970, 20, 313-319. 

Garcia, J., McGowan, B. K., Ervin, F. R., & Koelling,
R. A. Cues: Their effectiveness as a function of
the reinforcer. Science, 1968, 160, 794-795.

Gardner, H., Boller, F., Moreines, J., & Butters, N. 
Retrieving information from Korsakoff patients: 
Effects of categorized cues and references to the 
task. Cortex, 1973, 9, 165-175. 

Gardner, W. R. The stimulus in information process-
ing. American Psychologist, 1970, 25, 350-358.

Geller, A., & Jarvik, M. E. The time relations of 
ECS-induced amnesia. Psychonomic Science, 1968, 
12, 169-170. 

Glanzer, M. Storage mechanisms in recall. In G. H. 
Bower (Ed.), The psychology of learning and mo- 
tivation: Advances in research and theory (Vol. 
5). New York: Academic Press, 1972. 

Glickman, S. Deficits in avoidance learning produced 
by stimulation of the ascending reticular forma- 
tion. Canadian Journal of Psychology, 1958, 12, 
97-102. 

Glickman, S. Perseverative neural responses and con- 
solidation of the neural trace. Psychological Bul- 
letin, 1961, 58, 218-233. 

Gold, P. E., & McGaugh, J. L. A single-trace, two- 
process view of memory storage process. In D. 
Deutsch & J. A. Deutsch (Eds.), Short-term mem- 
ory. New York: Academic Press, 1975. 

Gordon, W. C. Similarities between recently acquired 
and reactivated memories with production of mem- 
ory interference. American Journal of Psychology, 
1977, 90, 231-242. (a) 

Gordon, W. C. Susceptibility of a reactivated mem- 
ory to the effects of strychnine: A time-dependent 
phenomenon. Physiology and Behavior, 1977, 18, 
95-99. (b) 

Gordon, W. C., & Feldman, D. T. Reaction induced
interference in a short-term retention paradigm.
Learning and Motivation, 1978, 9, 164-178.

Gordon, W. C., & Spear, N. E. Effect of reactiva-
tion of a previously acquired memory on the in-
teraction between memories in the rat. Journal of
Experimental Psychology, 1973, 99, 349-355. (a)

Gordon, W. C., & Spear, N. E. The effect of strych-
nine on recently acquired and reactivated passive
avoidance memories. Physiology and Behavior,
1973, 10, 1071-1075. (b)

Greeno, J. G. How associations are memorized. In 
D. A. Norman (Ed.), Models of human memory. 
New York: Academic Press, 1970.

Grice, G. R. The relation of secondary reinforcement 
to delayed reward in visual discrimination learn- 
ing. Journal of Experimental Psychology, 1948, 38, 
1-16. 

Guthrie, E. R. The psychology of learning. New 
York: Harper, 1935. 

Hearst, E., & Peterson, G. B. Transfer of conditioned 
excitation and inhibition from one operant re- 
sponse to another. Journal of Experimental Psy- 
chology, 1973, 99, 360-368. 

Hebb, D. O. The organization of behavior. New
York: Wiley, 1949.

Hill, W. F., & Spear, N. E. Resistance to extinction 
as a joint function of reward magnitude and the 
spacing of extinction trials. Journal of Experi- 
mental Psychology, 1962, 64, 636-639. 

Hinderliter, C. F., Smith, S. G., & Misanin, J. R. 
Effects of pretraining experience on retention of 
a passive avoidance task following ECS. Physiol- 
ogy and Behavior, 1973, 10, 671-675. 

Hintzman, D. L. Theoretical implications of the 
spacing effect. In R. L. Solso (Ed.), Theories in 
cognitive psychology: The Loyola Symposium. 
Hillsdale, N.J.: Erlbaum, 1974. 

Howard, R. L., Glendenning, R. L., & Meyer, D. 
R. Motivational control of retrograde amnesia: 
Further explorations and effects. Journal of Com- 
parative and Physiological Psychology, 1974, 86, 
187-192. 

Howard, R. L., & Meyer, D. R. Motivational control
of retrograde amnesia in rats: A replication and 
extension. Journal of Comparative and Physiologi- 
cal Psychology, 1971, 74, 37-40. 

Hudspeth, W. J., McGaugh, J. L., & Thomson, C. 
W. Aversive and amnesic effects of electroconvul- 
sive shock. Journal of Comparative and Physio- 
logical Psychology, 1964, 57, 61-64. 

Hull, C. L. Principles of behavior. New York: Apple-
ton-Century-Crofts, 1943.

Hunter, W. S. The delayed reaction in animals and 
children. Behavior Monographs, 1913, 2(2, Serial 
No. 6). 

Irwin, S., Banuazizi, A., Kalsner, S., & Curtis, A.
One-trial learning in the mouse: Its characteristics
and modifications by experimental-seasonal varia-
bles. In A. G. Karczmar & W. P. Koella (Eds.),
Neurophysiological and behavioral aspects of psy-
chotropic drugs. Springfield, Ill.: Charles C
Thomas, 1969.

Isaacson, R. L., & Pribram, K. H. The hippocampus 
(Vol. 2). New York: Plenum Press, 1975. 

James, W. The principles of psychology. New York: 
Holt, 1890. 

Jensen, R. A., & Riccio, D. Effects of prior experi- 
ence upon retrograde amnesia produced by hypo- 
thermia. Physiology and Behavior, 1970, 5, 1291- 
1294. 

John, E. R. Switchboard versus statistical theories
of learning and memory. Science, 1972, 177, 850-
864.

Keith-Lucas, T., & Guttman, N. Robust single-trial
delayed backward conditioning. Journal of Com-
parative and Physiological Psychology, 1975, 88,
468-496.

Keppel, G., & Underwood, B. J. Proactive inhibition 
in short-term retention of single items. Journal of 
Verbal Learning and Verbal Behavior, 1962, 1, 
153-161. 

Kesner, R. A neural system analysis of memory stor- 
age and retrieval. Psychological Bulletin, 1973, 80, 
177-203. 

Kesner, R. P., & Connor, H. S. Independence of
short- and long-term memory: A neural systems 
approach. Science, 1972, 176, 432-434. 

Kesner, R. P., & Connor, H. S. Effects of electrical
stimulation of rat limbic system and midbrain 
reticular formation upon short- and long-term 
memory. Physiology and Behavior, 1974, 12, 5-12. 

Kety, S. S. The biogenic amines in the central ner-
vous system: Their possible roles in arousal, emo-
tion, and learning. In F. O. Schmitt (Ed.), The
neurosciences: Second study program. New York:
Rockefeller University Press, 1970.

Kinsbourne, M., & Wood, F. Short-term memory 
processes and the amnesic syndrome. In D. Deutsch 
& J. A. Deutsch (Eds.), Short-term memory. New 
York: Academic Press, 1975. 

Kopp, R., Bohdanecky, Z., & Jarvik, M. E. Long
temporal gradients of retrograde amnesia for a
well-discriminated stimulus. Science, 1966, 153,
1547-1549.

Koppenaal, R. J., Jagoda, E. R., & Cruce, J. A. F.
Recovery from ECS-produced amnesia following
a reminder. Psychonomic Science, 1967, 9, 293-294.

Landauer, T. K. Consolidation in human memory: 
Retrograde amnestic effects of confusable items in
paired-associate learning. Journal of Verbal Learn- 
ing and Verbal Behavior, 1974, 13, 45-53. 

Landfield, P. W., McGaugh, J. L., & Tusa, R. J. 
Theta rhythm: A temporal correlate of memory 
storage processes in the rat. Science, 1972, 175, 
87-89. 

Leith, C. R., & Maki, W. S., Jr. Effects of compound
configuration on stimulus selection in the pigeon.
Journal of Experimental Psychology: Animal Be-
havior Processes, 1977, 3, 229-239.

Lett, B. T. Delayed reward learning: Disproof of
the traditional theory. Learning and Motivation,
1973, 4, 237-246.

Lett, B. T. Long delay learning: Implications for 
learning and memory theory. In N. S. Sutherland 
(Ed.), Tutorial essays in experimental psychology. 
New York: Academic Press, 1978. 

Lewis, D. J. Sources of experimental amnesia. Psy-
chological Review, 1969, 76, 461-472.

Lewis, D. J. A cognitive approach to experimental
amnesia. American Journal of Psychology, 1976,
89, 51-80.

Lewis, D. J., & Bregman, N. The source of the cues
for cue-dependent amnesia. Journal of Comparative
and Physiological Psychology, 1973, 85, 421-426.

Lewis, D. J., Bregman, N. J., & Mahan, J. J., Jr.
Cue-dependent amnesia in rats. Journal of Com-
parative and Physiological Psychology, 1972, 81,
243-247.

Lewis, D. J., & Maher, B. A. Neural consolidation
and electroconvulsive shock. Psychological Review,
1965, 72, 225-239.

Lewis, D. J., & Maher, B. A. Electroconvulsive shock 
and inhibition: Some problems considered. Psy- 
chological Review, 1966, 73, 388-392. 

Lewis, D. J., Miller, R. R., & Misanin, J. R. Control 
of retrograde amnesia. Journal of Comparative and
Physiological Psychology, 1968, 66, 48-52. 

Lewis, D. J., Miller, R. R., & Misanin, J. R. Selective
amnesia in rats produced by electroconvulsive
shock. Journal of Comparative and Physiological
Psychology, 1969, 69, 136-140.

Lewis, D. J., & Nicholas, T. Amnesia for active mem- 
ory. Physiology and Behavior, 1973, 11, 821-825. 

Lubow, R. E., Rifkin, B., & Alek, M. The context
effect: The relationship between stimulus preex-
posure and environmental preexposure determines
subsequent learning. Journal of Experimental Psy-
chology: Animal Behavior Processes, 1976, 2, 38-47.

Luttges, M. W., & McGaugh, J. L. Permanence of
retrograde amnesia produced by electroconvulsive
shock. Science, 1967, 156, 408-410.

Lynch, G., Deadwyler, S., & Cotman, C. Postlesion 
axonal growth produces permanent functional con- 
nections. Science, 1973, 180, 1364-1366. 

Madigan, S. A., & McCabe, L. Perfect recall and
total forgetting: A problem for models of short-
term memory. Journal of Verbal Learning and
Verbal Behavior, 1971, 10, 101-106.

Mah, C. S., & Albert, D. J. Electroconvulsive shock-
... amnesia gradient. Behavioral Biology, 1973,
517-540.

Mahut, H. Effects of subcortical electrical stimula- 
tion on learning in the rat. Journal of Comparative 
and Physiological Psychology, 1962, 55, 472-477. 

McDonough, J. H., & Kesner, R. P. Amnesia pro- 
duced by brief electrical stimulation of the amyg- 
dala or dorsal hippocampus in cats. Journal of 
Comparative and Physiological Psychology, 1971, 
77, 171-178. 

McGaugh, J. L. Time dependent processes in memory
storage. Science, 1966, 153, 1351-1358.

McGaugh, J. L., & Dawson, R. G. Modification of
memory storage processes. Behavioral Sciences,
1971, 16, 45-63.

McGaugh, J. L., & Gold, P. E. Modulation of mem-
ory by electrical stimulation of the brain. In M.
R. Rosenzweig & E. L. Bennett (Eds.), Neural
mechanisms of learning and memory. Cambridge,
Mass.: MIT Press, 1976.

Melton, A. W. Implications of short-term memory
for a general theory of memory. Journal of Verbal
Learning and Verbal Behavior, 1963, 9, 596-606.

Meyer, D. R., & Beattie, M. S. Some properties of
substrates of memory. In L. Miller, C. Sandman,
& A. Kasten (Eds.), Neuropeptide influences on
brain and behavior. New York: Academic Press,
1977.

Miller, G. A. The magical number seven plus or 
minus two: Some limits on our capacity for pro- 
cessing information. Psychological Review, 1956, 
63, 81-96. 

Miller, R. R. Effects of environmental complexity 
on amnesia induced by electroconvulsive shock in 
rats. Journal of Comparative and Physiological 
Psychology, 1970, 71, 267-275. 

Miller, R. R., Misanin, J. R., & Lewis, D. J. Am- 
nesia as a function of events during the learning- 
ECS interval. Journal of Comparative and Physio- 
logical Psychology, 1969, 67, 145-148. 

Miller, R. R., & Springer, A. D. Induced recovery 
of memory in rats following electroconvulsive 
shock. Physiology and Behavior, 1972, 8, 645-651. 

Miller, R. R., & Springer, A. D. Amnesia, consolida-
tion, and retrieval. Psychological Review, 1973,
80, 69-79.

Miller, R. R., & Springer, A. D. Implications of re-
covery from experimental amnesia. Psychological
Review, 1974, 81, 470-473.

Milner, B. Amnesia following operation on the tem- 
poral lobes. In C. W. M. Whitty & O. L. Zangwill 
(Eds.), Amnesia. London: Butterworths, 1966. 

Milner, B. Memory and the medial temporal regions 
of the brain. In K. Pribram & D. Broadbent (Eds.),
Biology of memory. New York: Academic Press, 
1970. 

Misanin, J. R., Miller, R. R., & Lewis, D. J. Retro- 
grade amnesia produced by electroconvulsive shock 
after reactivation of a consolidated memory trace. 
Science, 1968, 160, 554-555. 

Mitchell, D., Scott, D. W., & Mitchell, L. K. Attenu-
ated and enhanced neophobia in the taste-aversion
"delay of reinforcement" effect. Animal Learning
and Behavior, 1977, 5, 99-102.

Müller, G. E., & Pilzecker, A. Experimentelle Beiträge
zur Lehre vom Gedächtnis. Zeitschrift für Psycho-
logie, 1900, 1, 1-288.

Murdock, B. B., Jr. Human memory: Theory and 
data. Hillsdale, N.J.: Erlbaum, 1974. 

Nicholas, T., Galbraith, G., & Lewis, D. J. Theta 
activity and memory processes in rats. Physiology 
and Behavior, 1976, 16, 489-492. 

Paolino, R. M., Quartermain, D., & Miller, N. E. 
Different temporal gradients of retrograde amnesia 
produced by carbon dioxide anesthesia and elec- 
troconvulsive shock. Journal of Comparative and 
Physiological Psychology, 1966, 62, 270-274. 

Pavlov, I. P. Conditioned reflexes. London: Oxford 
University Press, 1927. 

Pearlman, C. A., Sharpless, S. K., & Jarvik, M. E.
Retrograde amnesia produced by anesthetic and 
convulsant agents. Journal of Comparative and 
Physiological Psychology, 1961, 54, 109-112. 

Peterson, L. R. Verbal learning and memory. In M. 
R. Rosenzweig & L. W. Porter (Eds.), Annual re- 
view of psychology. Palo Alto, Calif.: Annual Re- 
views, 1977. 

Peterson, L. R., & Peterson, M. J. Short-term reten-
tion of individual verbal items. Journal of Ex-
perimental Psychology, 1959, 58, 193-198.

Posner, M. I. Short-term memory systems in human
information processing. In A. F. Sanders (Ed.),
Attention and performance (Vol. 1). Amsterdam:
North-Holland, 1967.

Posner, M. I. Cognition: An introduction. Glenview, 
Ill.: Scott, Foresman, 1973. 

Potts, W. J. The effect of different environments 
prior to electroconvulsive shock on the gradient 
of retrograde amnesia. Physiology and Behavior, 
1971, 7, 161-164.

Quartermain, D. The influence of drugs on learning
and memory. In M. R. Rosenzweig & E. L. Ben-
nett (Eds.), Neural mechanisms of learning and
memory. Cambridge, Mass.: MIT Press, 1976.

Quartermain, D., McEwen, B. S., & Azmitia, E. C.,
Jr. Amnesia produced by electroconvulsive shock
or cycloheximide: Conditions for recovery. Science,
1970, 169, 683-686.

Reitman, J. S. Without surreptitious rehearsal, in- 
formation in short-term memory decays. Journal 
of Verbal Learning and Verbal Behavior, 1974, 13, 
365-377. 

Riccio, D. C., & Stikes, E. R. Persistent but modi-
fiable retrograde amnesia produced by hypothermia.
Physiology and Behavior, 1969, 4, 649-652.

Rigter, H., & Van Riezen, H. Anti-amnesic effect of
ACTH 4-10: Its independence of the nature of the
amnesic agent and the behavioral test. Physiology
and Behavior, 1975, 14, 563-566.

Robbins, M. J., & Meyer, D. R. Motivational control
of retrograde amnesia. Journal of Experimental
Psychology, 1970, 84, 220-225.

Roberts, R. B., Flexner, J. B., & Flexner, L. B. Some 
evidence for the involvement of adrenergic sites 
in the memory trace. Proceedings of the National 
Academy of Sciences, 1970, 66, 310-313. 

Routtenberg, A., & Holtzman, N. Memory disruption 
by electrical stimulation of substantia nigra, pars 
compacta. Science, 1973, 181, 83-86.

Rutledge, L. T. Synaptogenesis: Effects of synaptic 
use. In M. R. Rosenzweig & E. L. Bennett (Eds.), 
Neural mechanisms of learning and memory. Cam- 
bridge, Mass.: MIT Press, 1976. 

Sara, S. J., David-Remacle, M., & Lefevre, D. Pas-
sive avoidance behavior in rats after electrocon-
vulsive shock: Facilitative effect of response re-
tardation. Journal of Comparative and Physio-
logical Psychology, 1975, 89, 489-497.

Schneider, A. M., & Sherman, W. Amnesia: A func- 
tion of the temporal relation of footshock to 
electroconvulsive shock. Science, 1968, 159, 219-221. 

Seligman, M. E. P. On the generality of the laws of 
learning. Psychological Review, 1970, 77, 406-418. 

Serota, R. G., Roberts, R. B., & Flexner, L. B.
Acetoxycycloheximide-induced transient amnesia:
Protective effects of adrenergic stimulants. Pro-
ceedings of the National Academy of Sciences,
1972, 69, 340-342.

Shiffrin, R. M. Capacity limitations in information 
processing, attention, and memory. In W. K. Estes 
(Ed.), Handbook of learning and cognitive pro- 
cesses: Vol. 4. Attention and memory. Hillsdale, 
N.J.: Erlbaum, 1976. 

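Shulman, H. G. Encoding and retention of semantic
and phonemic information in short-term memory.
Journal of Verbal Learning and Verbal Behavior,
1970, 9, 499-508.

Solomon, R. L., & Turner, L. H. Discriminative clas-
sical conditioning in dogs paralyzed by curare
can later control discriminative avoidance re-
sponses in the normal state. Psychological Review,
1962, 69, 202-219.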

Spear, N. E. Forgetting as retrieval failure. In W.
K. Honig & P. H. R. James (Eds.), Animal
memory. New York: Academic Press, 1971.

Spear, N. E. Retrieval of memory in animals. Psy-
chological Review, 1973, 80, 163-194.

Spear, N. E. Retrieval of memories: A psychobio- 
logical approach. In W. K. Estes (Ed.), Handbook 
of learning and cognitive processes. Hillsdale, N.J.: 
Erlbaum, 1976. 

Spear, N. E. The processing of memories: Forgetting
and retention. Hillsdale, N.J.: Erlbaum, 1978.

Spear, N. E., & Parsons, P. Analysis of a reactiva-
tion treatment: Ontogenetic determinants of al-
leviated forgetting. In D. L. Medin, W. A. Roberts,
& R. T. Davis (Eds.), Processes of animal mem-
ory. New York: Wiley, 1976.

Spence, K. W. The role of secondary reinforcement. 
in delayed reward learning. Psychological Review, 
1947, 54, 1-8. 

Sperber, R. D., Greenfield, D. R., & House, B. J.
A nonmonotonic effect of distribution of trials in
retardate learning and memory. Journal of Ex-
perimental Psychology, 1973, 99, 186-198.

Sperling, G. A model for visual memory tasks. Hu-
man Factors, 1963, 5, 19-31.

Squire, L. R. Short-term memory as a biological en-
tity. In D. Deutsch & J. A. Deutsch (Eds.), Short-
term memory. New York: Academic Press, 1975.

Sternberg, S. High-speed scanning in human memory.
Science, 1966, 153, 652-654.

Talland, G. Disorders of memory and learning. Balti-
more: Penguin Books, 1968.

Terry, W. S., & Wagner, A. R. Short-term memory
for "surprising" vs. "expected" US in Pavlovian
conditioning. Journal of Experimental Psychology:
Animal Behavior Processes, 1975, 104, 122-133.

Thistlethwaite, D. A critical review of latent learn-
ing and related experiments. Psychological Bulletin,
1951, 48, 97-129.

Tolman, E. C. Purposive behavior in rats and men.
New York: Century, 1932.

Tolman, E. C. There is more than one kind of learn-
ing. Psychological Review, 1949, 56, 144-155.

Tulving, E. Theoretical issues in free recall. In- 
R. Dixon & D. L. Horton (Eds), Verbal behavih 
and general behavior theory. Englewood Cli 
NJ = Prentice-Hall, 1968. 

Tulving, E. Short- and long-term memory: Difi 
ent retrieval mechanisms, In К. H. Pribram i 
D. E. Broadbent (Eds.), Biology of memory- Ne 
York: Academic Press, 1970. 

Tulving, E. Episodic and semantic memory. In E.
Tulving & W. Donaldson (Eds.), Organization of
memory. New York: Academic Press, 1972.

Tulving, E. Cue-dependent forgetting. American Sci-
entist, 1974, 62, 74-82.

Underwood, B. J. Attributes of memory. Psycho- 
logical Review, 1969, 76, 559-573. 

Wagner, A., Rudy, J., & Whitlow, J. Rehearsal in 
animal conditioning. Journal of Experimental Psy- 
chology, 1973, 97, 407-426. 

Warrington, E., & Weiskrantz, L. Verbal learning 
and retention by amnesic patients using partial 
information. Psychonomic Science, 1970, 20, 210- 
211. 

Waugh, N. C., & Norman, D. A. Primary memory.
Psychological Review, 1965, 72, 89-104.

Weiskrantz, L., & Warrington, E. K. The problem
of the amnesic syndrome in man and animals. In
R. L. Isaacson & K. H. Pribram (Eds.), The
hippocampus (Vol. 2). New York: Plenum Press,
1975.

Wickelgren, W. A. Similarity and intrusions in short-
term memory for consonant-vowel trigrams.
Quarterly Journal of Experimental Psychology,
1965, 17, 241-246.

Wickelgren, W. A., & Norman, D. A. Strength models 
and serial position in short-term recognition mem- 
ory. Journal of Mathematical Psychology, 1966, 
3, 316-347. 

Wilburn, M. W., & Kesner, R. P. Differential amnesia 
effects produced by electrical stimulation of the 
caudate nucleus and nonspecific thalamic systems. 
Experimental Neurology, 1972, 34, 45-50. 


Received May 5, 1978


Psychological Bulletin
1979, Vol. 86, No. 5, 1084-1089


Interactions, Partial Interactions, and Interaction Contrasts in
the Analysis of Variance


Robert J. Boik 
Pepperdine University 


A two-step framework for the interpretation of significant two-treatment interactions is proposed. First, a contrast between the means of one treatment is estimated separately at each level of the second treatment. A partial interaction test tests the hypothesis that the variability among these contrasts is zero. The second step consists of computing a difference between the separately estimated contrasts. An interaction contrast test tests the hypothesis that this difference is zero. The familywise Type I error rate can be controlled at alpha by employing Gabriel's simultaneous test procedure for partial interaction tests and Scheffé's method for interaction contrast tests.


Recently, an issue has arisen concerning a 
posteriori tests after detection of a significant 
interaction in the analysis of variance. Mara- 
scuilo and Levin (1970) and Levin and Mara- 
scuilo (1972) have criticized the use of simple 
effects tests in which the means within a single 
row or column of the data matrix are compared. 
Their basic criticism is that the null hypotheses 
tested by simple effects tests are not coherent 
with the null hypothesis tested by the a 
priori omnibus interaction test. To maintain 
a coherent analysis, Marascuilo and Levin 
recommended testing interaction contrasts 
after detection of a significant interaction. 

In commenting on the approach of Mara- 
scuilo and Levin (1970), Games (1973) argued 
that although interaction contrasts are co- 
herent with the omnibus interaction test, 
these contrasts do not easily lend themselves 
to meaningful behavioral interpretations. Con- 
sequently, Games recommended performing 
simple effects tests. Levin and Marascuilo 
(1973) replied by emphasizing the flexibility of 
the interaction contrast approach. Thus far,
there is little evidence that the issue has been
resolved (Marascuilo & Levin, 1976; Games,
Note 1).

This article attempts neither to resolve nor necessarily even to clarify the issue. The question of whether interaction contrasts should be tested must ultimately be answered by each individual researcher. Rather, the article attempts to present a conceptual framework for testing interaction contrasts, should the researcher decide to do so. Interaction contrasts will not be tested unless they can be readily interpreted by the researcher. It is hoped that the conceptual framework presented in this article will increase the meaningfulness of interaction contrasts.


Hypothetical Data 


The means for a set of hypothetical data are presented in Table 1. Each of 72 biology students complaining of a severe fear of blood (hemophobia) was randomly assigned to one of the 12 cells with the restriction that 6 students be assigned per cell. Treatment A represents three types of fear reduction therapy. Subjects in the first level of Treatment A served as the control group and did not participate in any therapy. Subjects in the second and third levels of Treatment A participated in therapies denoted as $a_2$ and $a_3$, respectively. Treatment B represents four doses of an antianxiety medication. The dose level increases from $b_1$ (placebo) to $b_4$ (maximum dose). Therapy sessions occurred while the subjects were under the influence of the drug. The dependent variable is the magnitude of the electrodermal response (in arbitrary units) when subjects were exposed to blood. Subjects were not under the influence of the drug when the electrodermal response was measured.

The ANOVA summary table is presented in Table 2. Both treatments and the interaction are significant at the .01 level. The strength of the treatments and interaction as measured by $\omega^2$ (see Kirk, 1968, for a description) is as follows: $\omega^2_A = .33$, $\omega^2_B = .31$, and $\omega^2_{AB} = .17$. The strength of association for the treatments and interaction combined is $\omega^2 = .81$. Thus the therapies used, the drug doses selected, and the interaction between the two treatments do account for a substantial portion of the variability among the electrodermal response scores (however, see Glass & Hakstian, 1969, for a cautionary note on the use of $\omega^2$).

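The strength-of-association values can be checked from the Table 2 entries. The sketch below is only an illustration: the article defers to Kirk (1968) and does not print the formula, so the fixed-effects estimator used here, (SS_effect - df_effect x MS_within) / (SS_total + MS_within), is an assumption, as are the function and variable names. It does reproduce the reported values to two decimals.

```python
# Sums of squares and degrees of freedom transcribed from Table 2.
SS = {"A": 2444.16, "B": 2370.96, "AB": 1376.40}
DF = {"A": 2, "B": 3, "AB": 6}
MS_WITHIN = 19.15
SS_TOTAL = 7340.52


def omega_squared(ss_effect, df_effect, ms_within=MS_WITHIN, ss_total=SS_TOTAL):
    """Assumed fixed-effects estimator of strength of association."""
    return (ss_effect - df_effect * ms_within) / (ss_total + ms_within)


omega = {name: omega_squared(SS[name], DF[name]) for name in SS}
omega_combined = sum(omega.values())

for name, value in omega.items():
    print(f"omega^2 for {name}: {value:.2f}")
print(f"combined: {omega_combined:.2f}")
```

Because the three estimates share one denominator, the combined value is simply their sum.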

Treatment Contrasts 


A contrast among the p levels of Treatment A is denoted by $A_t$ and defined as

$$A_t = \sum_{i=1}^{p} c_i \mu_{i\cdot}, \tag{1a}$$

where t = a subscript of arbitrary value used to distinguish one Treatment A contrast from another; $c_i$ = the coefficient of the contrast associated with the ith level of Treatment A. The p coefficients are subject to the restriction

$$\sum_{i=1}^{p} c_i = 0,$$

and $\mu_{i\cdot}$ = the population mean of the ith level of Treatment A averaged across all levels of Treatment B. The estimate of $A_t$ is denoted by $\hat{A}_t$ and defined as

$$\hat{A}_t = \sum_{i=1}^{p} c_i M_{i\cdot}, \tag{1b}$$

where $M_{i\cdot}$ = the sample mean of the ith level of Treatment A averaged across all levels of Treatment B. The p × 1 vector of coefficients (i.e., the p $c_i$s) associated with $A_t$ and $\hat{A}_t$ is denoted by $\mathbf{A}_t$. For example, the coefficient vectors associated with the three pairwise Treatment A contrasts are $\mathbf{A}'_1 = (1\ {-1}\ 0)$, $\mathbf{A}'_2 = (1\ 0\ {-1})$, and $\mathbf{A}'_3 = (0\ 1\ {-1})$. From Equation 1b, the estimates of the pairwise contrasts are $\hat{A}_1 = 14.0$, $\hat{A}_2 = 9.4$, and $\hat{A}_3 = -4.6$.


Table 1
Hypothetical Means for a 3 × 4 Factorial Design in Which Each Cell Mean Is Based on Six Observations

                        Treatment B
Treatment A     b1      b2      b3      b4      Row M
a1              50.2    47.5    46.0    47.9    47.9
a2              49.9    38.2    28.5    19.0    33.9
a3              45.7    39.1    36.5    32.7    38.5
Column M        48.6    41.6    37.0    33.2


Table 2
Summary of Analysis of Variance

Source                        SS          df    MS          F
Treatment A                   2,444.16     2    1,222.08    63.82
Treatment B                   2,370.96     3      790.32    41.27
Treatment A × Treatment B     1,376.40     6      229.40    11.98
Within cell                   1,149.00    60       19.15

Note. Total SS = 7,340.52; total df = 71. All Fs are significant at p < .01.

The sum of squares accounted for by a Treatment A contrast is given by

$$SS_{\hat{A}_t} = \frac{n q \hat{A}_t^2}{\sum_{i=1}^{p} c_i^2}, \tag{2}$$

where n is the number of observations per cell and q is equal to the number of levels of B. From Equation 2, the sums of squares of the three pairwise contrasts are $SS_{\hat{A}_1} = 2352.00$, $SS_{\hat{A}_2} = 1060.32$, and $SS_{\hat{A}_3} = 253.92$. The corresponding 1 and 60 degrees of freedom F ratios are $F_{\hat{A}_1} = 2352.00/19.15 = 122.82$, $F_{\hat{A}_2} = 1060.32/19.15 = 55.37$, and $F_{\hat{A}_3} = 253.92/19.15 = 13.26$. The critical F using Scheffé's (1953) method and $\alpha = .01$ is $S = 2 \cdot F(2, 60) = 2(4.98) = 9.96$. It is concluded that both therapies ($a_2$ and $a_3$) resulted in a smaller electrodermal response (EDR) than the control group and that therapy $a_2$ resulted in a smaller EDR than therapy $a_3$.
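The arithmetic behind the three pairwise Treatment A contrasts can be retraced with a short script. This is a minimal sketch built on the Table 1 cell means and the within-cell mean square from Table 2; the function names are illustrative and not part of the article, and the Scheffé critical value is taken from the text rather than computed.

```python
# Cell means transcribed from Table 1 (rows: a1, a2, a3; columns: b1..b4).
CELL_MEANS = [
    [50.2, 47.5, 46.0, 47.9],
    [49.9, 38.2, 28.5, 19.0],
    [45.7, 39.1, 36.5, 32.7],
]
N_PER_CELL = 6            # observations per cell
MS_WITHIN = 19.15         # within-cell mean square from Table 2
SCHEFFE_CRIT = 2 * 4.98   # 2 * F(2, 60) at alpha = .01, as quoted in the text


def contrast_estimate(coefs, means):
    """Equation 1b: weighted sum of the marginal means."""
    return sum(c * m for c, m in zip(coefs, means))


def contrast_ss(estimate, coefs, n, other_levels):
    """Equation 2: n * q * estimate^2 / (sum of squared coefficients)."""
    return n * other_levels * estimate ** 2 / sum(c * c for c in coefs)


row_means = [sum(row) / len(row) for row in CELL_MEANS]
q = len(CELL_MEANS[0])

pairwise = {"A1": (1, -1, 0), "A2": (1, 0, -1), "A3": (0, 1, -1)}
results = {}
for label, coefs in pairwise.items():
    est = contrast_estimate(coefs, row_means)
    ss = contrast_ss(est, coefs, N_PER_CELL, q)
    results[label] = (est, ss, ss / MS_WITHIN)
    print(f"{label}: estimate={est:.1f}, SS={ss:.2f}, F={ss / MS_WITHIN:.2f}")
```

Each F ratio is then compared with the Scheffé value stored in `SCHEFFE_CRIT`.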

A contrast between the q levels of Treatment B is denoted by $B_u$ and defined as

$$B_u = \sum_{j=1}^{q} c_j \mu_{\cdot j}, \tag{3a}$$

where u = a subscript of arbitrary value used to distinguish one Treatment B contrast from another, and $c_j$ = the coefficient of the contrast associated with the jth level of Treatment B. The q coefficients are subject to the restriction

$$\sum_{j=1}^{q} c_j = 0,$$

and $\mu_{\cdot j}$ = the population mean of the jth level of Treatment B averaged across all levels of Treatment A. The estimate of $B_u$ is denoted by $\hat{B}_u$ and defined as

$$\hat{B}_u = \sum_{j=1}^{q} c_j M_{\cdot j}, \tag{3b}$$

where $M_{\cdot j}$ = the sample mean of the jth level of Treatment B averaged across all levels of Treatment A. The q × 1 vector of coefficients associated with $B_u$ and $\hat{B}_u$ is denoted by $\mathbf{B}_u$. For example, the coefficient vector associated with the contrast between the placebo dose ($b_1$) condition and the three nonzero dose conditions is $\mathbf{B}'_1 = (3\ {-1}\ {-1}\ {-1})$. The contrast estimate from Equation 3b is $\hat{B}_1 = 34.0$.

The sum of squares accounted for by a Treatment B contrast is given by

$$SS_{\hat{B}_u} = \frac{n p \hat{B}_u^2}{\sum_{j=1}^{q} c_j^2}. \tag{4}$$

For example, the sum of squares of the $\hat{B}_1$ contrast is $SS_{\hat{B}_1} = 1734.00$. The F ratio is $F_{\hat{B}_1} = 1734.00/19.15 = 90.55$. Since the observed F is larger than the Scheffé value, $S = 3F(3, 60) = 3(4.13) = 12.39$, $\alpha = .01$, it is concluded that on the average, the EDR is smaller under the three drug conditions than under the placebo condition.


Partial Interactions and Interaction Contrasts 


Often treatment contrasts such as $A_1$, $A_2$, $A_3$, and $B_1$ are not tested when the Treatment A × Treatment B interaction is significant. This reflects the commonly held belief that a significant interaction renders treatment contrasts meaningless. Fortunately, this is not the case.
Let us define $A_{t(j)}$ as a simple Treatment A contrast at the jth level of Treatment B. That is,

$$A_{t(j)} = \sum_{i=1}^{p} c_i \mu_{ij}, \tag{5a}$$

where $\mu_{ij}$ = the population mean of the ith level of Treatment A at the jth level of Treatment B. The simple treatment contrast $A_{t(j)}$ is estimated by $\hat{A}_{t(j)}$, where

$$\hat{A}_{t(j)} = \sum_{i=1}^{p} c_i M_{ij}, \tag{5b}$$

and $M_{ij}$ = the sample mean of the ith level of Treatment A at the jth level of Treatment B. For example, from Equation 5b, the estimates of the $A_{2(j)}$ simple treatment contrast are

$$\hat{A}_{2(1)} = 4.50, \quad \hat{A}_{2(2)} = 8.40, \quad \hat{A}_{2(3)} = 9.5, \quad \text{and} \quad \hat{A}_{2(4)} = 15.2.$$


The simple effects procedure described by Games (1973) consists of individually testing the four simple treatment contrasts while ignoring the $A_2$ treatment contrast itself. However, the significant Treatment A × Treatment B interaction does not necessarily indicate heterogeneity among the four $A_{2(j)}$ simple contrasts. If the observed variability among the four $\hat{A}_{2(j)}$ simple contrast estimates can be attributed to experimental error, then the $A_2$ treatment contrast can be interpreted without regard for the significant Treatment A × Treatment B interaction. The test of homogeneity among the four $A_{2(j)}$ simple contrasts represents a partial test of the interaction. Let us denote this source of variation as the interaction between the $A_2$ contrast and Treatment B or, more simply, the $A_2B$ partial interaction.

The sum of squares for an $A_tB$ partial interaction can be obtained from

$$SS_{A_tB} = \frac{n\left[\sum_{j=1}^{q} \hat{A}_{t(j)}^2 - \left(\sum_{j=1}^{q} \hat{A}_{t(j)}\right)^2 / q\right]}{\sum_{i=1}^{p} c_i^2}. \tag{6}$$


Equation 6 represents a modification of the familiar computational formula for a sum of squares:

$$SS = \sum_{1}^{n} X^2 - \left(\sum_{1}^{n} X\right)^2 / n.$$

The simple contrast estimates have been substituted for the raw scores; the equation has been divided by

$$\sum_{i=1}^{p} c_i^2$$

to normalize the contrasts; and finally, because each cell mean is based on n observations, the equation has been multiplied by n. The degrees of freedom for the $A_tB$ partial interaction are obtained by multiplying the degrees of freedom of the $A_t$ contrast by the degrees of freedom for Treatment B. That is,

$$df_{A_tB} = (df_{A_t})(df_B). \tag{7}$$

From Equation 6, the sum of squares of the $A_2B$ partial interaction is $SS_{A_2B} = 6(412.1 - 37.6^2/4)/2 = 175.98$. The degrees of freedom from Equation 7 are $df_{A_2B} = (1)(3) = 3$. The mean square and F ratio are therefore $MS_{A_2B} = 175.98/3 = 58.66$, and $F_{A_2B} = 58.66/19.15 = 3.06$.
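Equations 5b, 6, and 7 can be traced step by step in code. The following sketch uses the Table 1 cell means and illustrative function names of our own choosing; it computes the simple contrasts at each drug level and then the partial interaction sum of squares, degrees of freedom, mean square, and F ratio.

```python
# Cell means transcribed from Table 1 (rows: a1, a2, a3; columns: b1..b4).
CELL_MEANS = [
    [50.2, 47.5, 46.0, 47.9],
    [49.9, 38.2, 28.5, 19.0],
    [45.7, 39.1, 36.5, 32.7],
]
N_PER_CELL = 6
MS_WITHIN = 19.15


def simple_a_contrasts(coefs, cells):
    """Equation 5b: one Treatment A contrast estimate per level of Treatment B."""
    return [sum(c * row[j] for c, row in zip(coefs, cells))
            for j in range(len(cells[0]))]


def partial_interaction(coefs, cells, n=N_PER_CELL, ms_within=MS_WITHIN):
    """Equations 6 and 7: (SS, df, MS, F) for an A_t x B partial interaction."""
    simple = simple_a_contrasts(coefs, cells)
    q = len(simple)
    # Corrected sum of squares of the simple contrast estimates.
    corrected = sum(x * x for x in simple) - sum(simple) ** 2 / q
    ss = n * corrected / sum(c * c for c in coefs)
    df = 1 * (q - 1)  # a single contrast has 1 df; Treatment B has q - 1
    ms = ss / df
    return ss, df, ms, ms / ms_within


a2b = partial_interaction((1, 0, -1), CELL_MEANS)  # control vs. therapy a3
a1b = partial_interaction((1, -1, 0), CELL_MEANS)  # control vs. therapy a2
print("A2 x B: SS=%.2f df=%d MS=%.2f F=%.2f" % a2b)
print("A1 x B: SS=%.2f df=%d MS=%.2f F=%.2f" % a1b)
```

The same function serves any Treatment A coefficient vector, so the control versus $a_2$ comparison is obtained by changing only the coefficients.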

The critical value for the F ratio of an a posteriori partial interaction test cannot be obtained by Scheffé's method unless the partial interaction has only one degree of freedom. However, the simultaneous test procedure (STP) developed by Gabriel (1964, 1969) can be employed. The STP and Scheffé's method are both coherent with the omnibus F test and therefore are coherent with each other. The relationship between Scheffé's method and the STP has been described by Boik (1979). The critical value for an a posteriori $\nu_3$ degrees of freedom partial interaction test is

$$G_{\nu_3} = \nu_1 F(\nu_1, \nu_2) / \nu_3, \tag{8}$$

where $F(\nu_1, \nu_2)$ = the critical value for the omnibus interaction test. Note that when $\nu_3 = \nu_1$, the critical value is equivalent to that of the omnibus test, and when $\nu_3 = 1$, the critical value is equivalent to that given by Scheffé's method.

For the $A_2B$ partial interaction test, the critical STP value is $G_3 = [6 \cdot F(6, 60)]/3 = (6 \cdot 3.12)/3 = 6.24$, $\alpha = .01$. Since the observed F of 3.06 is less than 6.24, it is concluded that the $A_{2(j)}$ simple contrasts are homogeneous over the 4 levels of Treatment B. Thus, not only does the $a_3$ therapy result in a lower EDR than the control group on the average, but in addition, the difference is the same for all four drug levels.
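Equation 8 is simple enough to tabulate for every possible number of degrees of freedom. The sketch below takes the tabled value F(6, 60) = 3.12 at alpha = .01 from the text rather than computing it, and the function name is illustrative only.

```python
F_OMNIBUS_CRIT = 3.12  # F(6, 60) at alpha = .01, as quoted in the text
DF_OMNIBUS = 6         # nu_1, degrees of freedom of the omnibus interaction


def stp_critical_value(nu1, f_crit, nu3):
    """Equation 8: G = nu1 * F(nu1, nu2) / nu3 for a nu3-df partial interaction."""
    if not 1 <= nu3 <= nu1:
        raise ValueError("nu3 must lie between 1 and nu1")
    return nu1 * f_crit / nu3


critical = {nu3: stp_critical_value(DF_OMNIBUS, F_OMNIBUS_CRIT, nu3)
            for nu3 in range(1, DF_OMNIBUS + 1)}
for nu3, g in critical.items():
    print(f"nu3 = {nu3}: critical value = {g:.2f}")
```

The two boundary cases noted in the text appear at the ends of the table: the entry for six degrees of freedom equals the omnibus critical value, and the entry for one degree of freedom equals the Scheffé value.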

A similar analysis for the $A_1$ contrast results in a different conclusion. The sample estimates of the four $A_{1(j)}$ simple treatment contrasts are

$$\hat{A}_{1(1)} = .3, \quad \hat{A}_{1(2)} = 9.3, \quad \hat{A}_{1(3)} = 17.5, \quad \text{and} \quad \hat{A}_{1(4)} = 28.9.$$

From Equations 6 and 7, the sum of squares and degrees of freedom for the $A_1B$ partial interaction are $SS_{A_1B} = 1332.12$, and $df_{A_1B} = 3$. The mean square and F are therefore $MS_{A_1B} = 444.04$, and $F_{A_1B} = 23.19$. Since the observed F is larger than the critical STP value of 6.24, the $A_1B$ partial interaction is judged significant. This indicates that the difference between the control condition and therapy $a_2$ is not identical for all drug levels. An examination of the $\hat{A}_{1(j)}$ simple contrast estimates suggests that the difference between the control group and the $a_2$ therapy group might be larger under the three nonzero drug dose conditions than under the placebo condition. This hypothesis can be tested by means of an interaction contrast.

A product interaction contrast consists of a contrast between simple treatment contrasts. In the previous example, the interaction contrast described consists of the difference between the $A_{1(1)}$ simple contrast under the placebo condition and the average $A_{1(j)}$ simple contrast under the three drug conditions. The Treatment B coefficient vector associated with the interaction contrast is $\mathbf{B}'_1 = (3\ {-1}\ {-1}\ {-1})$.

Let us denote a product interaction contrast as $A_tB_u$ and define it as

$$A_tB_u = \sum_{j=1}^{q} c_j A_{t(j)}, \tag{9a}$$

where $c_j$ = the jth coefficient in the coefficient vector $\mathbf{B}_u$. The coefficient vector $\mathbf{B}_u$ describes the contrast between the $A_{t(j)}$ simple treatment contrasts. An $A_tB_u$ interaction contrast is estimated by $\widehat{A_tB_u}$, where

$$\widehat{A_tB_u} = \sum_{j=1}^{q} c_j \hat{A}_{t(j)}. \tag{9b}$$
The sum of squares of a product interaction contrast is computed from

$$SS_{A_tB_u} = \frac{n\left(\widehat{A_tB_u}\right)^2}{\left(\sum_{i=1}^{p} c_i^2\right)\left(\sum_{j=1}^{q} c_j^2\right)}, \tag{10}$$

where $c_i$ = the ith coefficient in the vector $\mathbf{A}_t$, and $c_j$ = the jth coefficient in the vector $\mathbf{B}_u$. The degrees of freedom for a product interaction contrast are equal to

$$df_{A_tB_u} = (df_{A_t})(df_{B_u}), \tag{11}$$

where $df_{A_t}$ = the degrees of freedom of the $A_t$ contrast, and $df_{B_u}$ = the degrees of freedom of the $B_u$ contrast. Since the $A_t$ and $B_u$ treatment contrasts always have 1 degree of freedom each, an interaction contrast always has 1 degree of freedom.

For example, from Equation 9b, the estimate of the $A_1B_1$ contrast is $\widehat{A_1B_1} = (3)(.3) + (-1)(9.3) + (-1)(17.5) + (-1)(28.9) = -54.8$. From Equation 10, the sum of squares is $SS_{A_1B_1} = [(6)(-54.8)^2]/[(2)(12)] = 750.76$. The F ratio is $F_{A_1B_1} = (750.76/1)/19.15 = 39.20$. The Scheffé critical value for the 1 and 60 degrees of freedom F ratio is $S = 6F(6, 60) = 6(3.12) = 18.72$, $\alpha = .01$. Since the observed F of 39.20 is larger than 18.72, the $A_1B_1$ interaction contrast is judged significant. It can be concluded that the difference between the $a_2$ therapy group and the control group is larger under the drug conditions than under the placebo condition. In other words, the antianxiety drug increases the effectiveness of the $a_2$ therapy, when the effectiveness is measured by the difference between the control and therapy conditions.
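The interaction contrast example can be reproduced directly from the cell means. In the sketch below, which uses illustrative function names, the estimate is written as a double sum over both coefficient vectors; this is algebraically the same as Equation 9b, because each simple contrast is itself a weighted sum of cell means. The Scheffé value is taken from the text.

```python
# Cell means transcribed from Table 1 (rows: a1, a2, a3; columns: b1..b4).
CELL_MEANS = [
    [50.2, 47.5, 46.0, 47.9],
    [49.9, 38.2, 28.5, 19.0],
    [45.7, 39.1, 36.5, 32.7],
]
N_PER_CELL = 6
MS_WITHIN = 19.15
SCHEFFE_CRIT = 6 * 3.12  # 6 * F(6, 60) at alpha = .01, as quoted in the text


def interaction_contrast(a_coefs, b_coefs, cells):
    """Sum over i and j of c_i * c_j * M_ij (equivalent to Equation 9b)."""
    return sum(ci * cj * cells[i][j]
               for i, ci in enumerate(a_coefs)
               for j, cj in enumerate(b_coefs))


def interaction_contrast_ss(estimate, a_coefs, b_coefs, n):
    """Equation 10: n * estimate^2 / (sum c_i^2 * sum c_j^2)."""
    denom = sum(c * c for c in a_coefs) * sum(c * c for c in b_coefs)
    return n * estimate ** 2 / denom


A1 = (1, -1, 0)       # control vs. therapy a2
B1 = (3, -1, -1, -1)  # placebo vs. the three drug doses

estimate = interaction_contrast(A1, B1, CELL_MEANS)
ss = interaction_contrast_ss(estimate, A1, B1, N_PER_CELL)
f_ratio = ss / MS_WITHIN  # the contrast has 1 df, so MS equals SS
print(f"estimate={estimate:.1f}, SS={ss:.2f}, F={f_ratio:.2f}")
```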

Partial interactions can also be examined by starting with a $B_u$ rather than an $A_t$ treatment contrast. For example, earlier it was shown that the $B_1$ contrast [where $\mathbf{B}'_1 = (3\ {-1}\ {-1}\ {-1})$] is significant. That is, on the average, the EDR is smaller under the drug conditions than under the placebo condition. It is of interest to determine whether this difference is the same for all three therapy groups. Simple Treatment B contrasts allow us to estimate the difference separately for each therapy group. A simple Treatment B contrast is denoted by $B_{u(i)}$ and defined as

$$B_{u(i)} = \sum_{j=1}^{q} c_j \mu_{ij}, \tag{12a}$$

where $c_j$ = the jth coefficient in the vector $\mathbf{B}_u$. The $B_{u(i)}$ simple contrast is estimated by $\hat{B}_{u(i)}$, where

$$\hat{B}_{u(i)} = \sum_{j=1}^{q} c_j M_{ij}. \tag{12b}$$

For the $B_1$ contrast, the simple treatment contrast estimates are

$$\hat{B}_{1(1)} = 9.20, \quad \hat{B}_{1(2)} = 64.00, \quad \text{and} \quad \hat{B}_{1(3)} = 28.80.$$
The sum of squares of an $AB_u$ partial interaction can be computed from an equation analogous to Equation 6:

$$SS_{AB_u} = \frac{n\left[\sum_{i=1}^{p} \hat{B}_{u(i)}^2 - \left(\sum_{i=1}^{p} \hat{B}_{u(i)}\right)^2 / p\right]}{\sum_{j=1}^{q} c_j^2}. \tag{13}$$

The degrees of freedom for an $AB_u$ partial interaction are equal to

$$df_{AB_u} = (df_A)(df_{B_u}). \tag{14}$$

Employing Equations 13 and 14, the sum of squares, degrees of freedom, and mean square of the $AB_1$ partial interaction are $SS_{AB_1} = 771.04$, $df_{AB_1} = 2$, and $MS_{AB_1} = 385.52$. The F ratio is therefore $F_{AB_1} = 385.52/19.15 = 20.13$. From Equation 8, the critical STP value is $G_2 = [6F(6, 60)]/2 = (6)(3.12)/2 = 9.36$, $\alpha = .01$. Since the observed F of 20.13 exceeds 9.36, the $AB_1$ partial interaction is judged significant. It is concluded that the difference between the EDR under the placebo condition and the average EDR under the three drug conditions is not the same for all therapy groups.
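The same computation run in the other direction, starting from the Treatment B contrast, is sketched below. Function names are again illustrative, and the STP critical value is derived from the tabled F(6, 60) = 3.12 quoted in the text.

```python
# Cell means transcribed from Table 1 (rows: a1, a2, a3; columns: b1..b4).
CELL_MEANS = [
    [50.2, 47.5, 46.0, 47.9],
    [49.9, 38.2, 28.5, 19.0],
    [45.7, 39.1, 36.5, 32.7],
]
N_PER_CELL = 6
MS_WITHIN = 19.15
STP_CRIT = 6 * 3.12 / 2  # Equation 8 with nu1 = 6, nu3 = 2


def simple_b_contrasts(coefs, cells):
    """Equation 12b: one Treatment B contrast estimate per level of Treatment A."""
    return [sum(c * m for c, m in zip(coefs, row)) for row in cells]


def partial_interaction_from_b(coefs, cells, n=N_PER_CELL, ms_within=MS_WITHIN):
    """Equations 13 and 14: (SS, df, MS, F) for an A x B_u partial interaction."""
    simple = simple_b_contrasts(coefs, cells)
    p = len(simple)
    corrected = sum(x * x for x in simple) - sum(simple) ** 2 / p
    ss = n * corrected / sum(c * c for c in coefs)
    df = (p - 1) * 1  # Treatment A has p - 1 df; a single contrast has 1
    ms = ss / df
    return ss, df, ms, ms / ms_within


B1 = (3, -1, -1, -1)  # placebo vs. the three drug doses
b1_simple = simple_b_contrasts(B1, CELL_MEANS)
ab1 = partial_interaction_from_b(B1, CELL_MEANS)
print("simple contrasts:", [round(x, 1) for x in b1_simple])
print("A x B1: SS=%.2f df=%d MS=%.2f F=%.2f" % ab1)
```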

The precise nature of the $AB_1$ partial interaction can be determined by testing interaction contrasts. As in Equations 9a and 9b, an $A_tB_u$ interaction contrast and its estimate, $\widehat{A_tB_u}$, can be defined as

$$A_tB_u = \sum_{i=1}^{p} c_i B_{u(i)}, \tag{15a}$$

and

$$\widehat{A_tB_u} = \sum_{i=1}^{p} c_i \hat{B}_{u(i)}, \tag{15b}$$

where $c_i$ = the ith coefficient in the vector $\mathbf{A}_t$. The coefficient vector $\mathbf{A}_t$ describes the contrast among the $B_{u(i)}$ simple treatment contrasts. An examination of the $\hat{B}_{1(i)}$ simple contrast estimates suggests that the difference between the placebo and drug conditions is larger for therapy $a_2$ than for the control condition. The coefficient vector associated with this contrast is $\mathbf{A}'_1 = (1\ {-1}\ 0)$. From Equation 15b, $\widehat{A_1B_1} = (1)(9.2) + (-1)(64.0) = -54.8$. Notice that this is the same result earlier obtained from Equation 9b. Equations 10 and 11 could be used to calculate the sum of squares and degrees of freedom for the interaction contrast. Of course, the results would again be the same. Table 3 presents an ANOVA table that summarizes all the treatment contrasts, partial interactions, and interaction contrasts tested.


Table 3
Summary of Analysis of Variance

Source                        SS          df    MS          F
Treatment A                   2,444.16     2    1,222.08     63.82*
  A1                          2,352.00     1    2,352.00    122.82*
  A2                          1,060.32     1    1,060.32     55.37*
  A3                            253.92     1      253.92     13.26*
Treatment B                   2,370.96     3      790.32     41.27*
  B1                          1,734.00     1    1,734.00     90.55*
Treatment A × Treatment B     1,376.40     6      229.40     11.98*
  A1B                         1,332.12     3      444.04     23.19*
  A1B1                          750.76     1      750.76     39.20*
  A2B                           175.98     3       58.66      3.06
  AB1                           771.04     2      385.52     20.13*
Within cell                   1,149.00    60       19.15

Note. A'1 = (1 -1 0); A'2 = (1 0 -1); A'3 = (0 1 -1); B'1 = (3 -1 -1 -1). Total SS = 7,340.52; total df = 71.
*p < .01.


Examples of partial interactions and interaction contrasts for a three-treatment design are provided by Boik (1975), who also describes alternative methods for controlling the Type I error rate when testing partial interactions and interaction contrasts.


Reference Note 


1. Games, P. A. Nesting, crossing, and the role of
statistical models. Paper presented at the meeting
of the American Educational Research Association,
New York, April 1977.


On Getting Good Subject Mileage: Reuse
of Subjects in Experiments Involving Groups


John M. Light 
Department of Sociology 
Princeton University 


This article presents a combinatorial solution to the problem of reusing sub- 
jects, which often arises in experiments on groups. Blocking and Latin square 
approaches are examined and contrasted. A compromise solution is offered 
applying a variation of "v, k, λ" balancing techniques to incomplete block
designs. Suggestions are offered for testing individual effects as well as for 
using ordinary one-way analysis of variance. 


Experimental studies involving interacting 
groups are an omnipresent feature of social 
psychological literature. In some cases, re- 
searchers are able to simulate interaction 
using confederates, fictitious people, com- 
puters, and the like. But many substantive 
areas of social psychology require that ac- 
tual groups be studied, for example, group 
polarization (Meyers & Lamm, 1976), in- 
teraction and attraction (Insko & Wilson, 
1977), bystander intervention (Wegner & 
Schaefer, 1978), obedience to authority 
(Hamilton, 1978), and certainly many others. 
Sometimes the group can actually be the
unit from which measurements of various 
dependent variables are taken, for example, 
studies of cooperation rates in Prisoners' 
Dilemma games (Dawes, McTavish, & Shak- 
lee, 1977). 

One aspect of group-level experiments 
poses an especially difficult problem. Sup- 
pose we are going to perform a simple one- 
factor completely randomized experiment on 
a set of subjects. Five or six people per 
treatment may be sufficient for the discovery 
of any treatment effects. But suppose that 
instead of people, the unit of analysis is 
groups, for example, groups of four. Then the researcher would need five or six groups per treatment or between 20 and 24 people. It should be clear that if the experiment consists of more than a few treatment levels or the researcher complicates the design in any way (e.g., by adding another factor), many subjects may be required.

Even assuming that enough subjects are available to carry out such a design, the researcher still must contact, schedule, and pay these additional subjects. Therefore, with all elements considered, the typical group-level experiment is logistically more difficult to carry out than an experiment in which people themselves are the unit of analysis.

This being the case, any enterprising investigator would want to determine whether there is some legitimate way to reuse subjects in group-level experiments. In this article, we consider past approaches to this problem and then present a plan that we feel is a sensible alternative.


Randomized Block Designs 


Techniques for the reuse of subjects have typically been based on (or related to) randomized block (RB) designs. In such designs, one ordinarily assembles a number of "blocks" of size t, where t is the number of experimental treatments (or treatment combinations), and assigns one member of each block at random to each treatment (Table 1). If the assumptions of the design are met, one can calculate treatment effects as well as subject (block) effects. The linear model for the design is

$$X_{ij} = \mu + \beta_j + \pi_i + \epsilon_{ij}, \tag{1}$$

where $\beta_j$ is the effect of being in the jth treatment, and $\pi_i$ is the effect of being in the ith block. Since the blocks are assembled so that they are relatively homogeneous with respect to the dependent variable under study, treatment effects can be more easily discovered (though not always, since one loses degrees of freedom in attempting to incorporate block effects). Of course, blocking can be used as one aspect of more complex designs (Kirk, 1968, p. 11).

Table 1
Randomized Block Design

                  Treatment
Block     1     2     3     4     5
1         O     O     O     O     O
2         O     O     O     O     O
3         O     O     O     O     O
4         O     O     O     O     O

Note. Each O represents one observation on a member of that block.


With repeated measures designs, a block may simply correspond to one experimental subject on whom more than one of the treatment levels is applied; thus the subjects are reused. In the extreme case, the researcher can administer each treatment to each subject. There is an additional bonus in this case, since subjects act as their own controls (Kirk, 1968, p. 131).

Such a repeated measures procedure is probably the ideal way to employ the RB design. On the other hand, repeated measures can only be used in the RB design if administration of one treatment does not affect the results that would be expected by administration of a later treatment. With repeated measures on nonreactive units of analysis, such as rocks or leaves (as in natural science experiments), all treatment conditions are properly independent. But if human subjects are used, such independence is unlikely.²

Violation of the independence assumption 
can occur for two reasons. First, there may
be some time-order effect, that is, subjects 
react differently to later trials than earlier 
ones. For example, suppose that a researcher 
is studying the extent to which people are 
persuasive under several different motiva- 
tional conditions; subjects may improve their 
persuasion just by practicing on earlier trials 
(Greenwald, 1976, p. 316). Second, there 
may be some interaction effects; for example, 
a given treatment is reacted to differently 
depending on the treatments preceding it. 

In the event that a block repeated measures design is used for groups (i.e., each block is composed of one group that is repeatedly measured on the dependent variable), both time-order and Time-Order × Treatment effects should be more pronounced, since it is well-known that groups tend to negotiate definitions of situations (Asch, 1956; Festinger, 1950; Sherif, 1936).


Note that if treatments are randomized within blocks, the main effect of time order cancels out in treatment comparisons. It does, however, contribute to the error term and therefore reduces the power of the design. From this perspective, a repeated measure group-level RB design will tend to be conservative, though relatively inefficient (we reject fewer null hypotheses than we should). Time × Treatment interaction can similarly be removed by more elaborate randomization. But other interactions cannot and may either further obscure actual results or create spurious ones.


2 Actually, the statistical requirement for an unbiased F test in a repeated measures design is that the Treatment × Treatment covariance matrix have equal off-diagonal elements (Winer, 1971, p. 277). This condition is satisfied if treatment covariances are only a result of measurements for each pair of treatments containing members of the same blocks (cf. Kirk, 1968, p. 140).

3 All three types of context effects discussed by Greenwald (1976) are basically Time × Treatment interaction. His discussion of such effects is more detailed and contains more examples than ours. Much of that discussion applies to cases in which entire groups are reused as well.


Table 2
Latin Square Design

                Time order
Subject     1     2     3     4
1           A     B     C     D
2           B     C     D     A
3           C     D     A     B
4           D     A     B     C

Note. Each letter represents a different treatment.


Latin Square Designs 


One way to increase the power of an RB 
repeated measures design and to deal with 
the main effects of time order is to have 
each subject take the treatments in different 
time orders. In this way, each treatment is 
given first during one round, second during 
another, third during another, and so on. 
This should allow the researcher to extract 
these time-order effects from the error term, 
thereby increasing the power of the test. 

In fact, this is just the approach taken in a Latin square (LS) design (Table 2). For a repeated measures LS design, we interpret the rows of the square as observational units (people, groups, etc.), the columns as the different times, and the table entries as treatments. The treatments are assigned so that each treatment appears only once in each row and column. In this way, each subject takes each treatment only once, and each treatment is taken at each possible different time only once. This is precisely the balancing condition previously described as desirable. The linear model for the LS design is

$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}, \tag{2}$$

where $\mu$ is the grand mean, $\alpha_i$ is the effect of row i (subject), $\beta_j$ is the effect of column j (time), $\gamma_k$ is the effect of treatment k, and $\epsilon_{ijk}$ is the experimental error.
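The balancing condition of the Latin square can be made concrete with a small sketch. It is an illustration only, not a procedure given in the article: it builds a cyclic square of the same form as Table 2 and checks that every treatment occurs exactly once in each row (subject) and each column (time order). The function names are hypothetical, and the randomization of rows, columns, and treatment labels that would typically precede actual use is omitted.

```python
def cyclic_latin_square(treatments):
    """Build a t x t square in which row r is the treatment list shifted by r."""
    t = len(treatments)
    return [[treatments[(row + col) % t] for col in range(t)]
            for row in range(t)]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t:
        return False
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(t)} == symbols
                  for c in range(t))
    return rows_ok and cols_ok


square = cyclic_latin_square(list("ABCD"))
for subject, row in enumerate(square, start=1):
    print(subject, " ".join(row))
print("balanced:", is_latin_square(square))
```

A cyclic construction was chosen because it is the simplest one that matches the layout of Table 2; any other arrangement that passes the check would satisfy the same balancing condition.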

At first, one might think that this design solves our problems. Certainly the major problem of controlling for time-order effects has been taken care of. At least a treatment effect is not suppressed by including the main effect of time order in the error term.

However, this design does not deal with interaction effects. It must be assumed that there are no interactions between time and treatment, time and subject, or treatment and subject (nor can there be any three-way interaction). If any exist, they will be confounded with treatment effects (in the case of Time × Subject interaction) or experimental error (in the case of Time × Treatment or Treatment × Subject).⁴ Since the error term appears in the denominator of the F statistic for all main effects, occurrence of these last two types of interaction produces a conservative test (Cochran & Cox, 1957). On the other hand, if there is reason to believe that such interaction effects will be present, this conservative aspect of the LS design may be inefficient; a strong treatment effect is the only sort that would be detected.

Although Time × Subject interaction is confounded with treatment effects, if one is dealing with groups as the unit of analysis, it is probable that this type of interaction will not be a problem. In the case of Time × Group interaction, we are postulating that groups learn differently: Time order has different effects on different groups. This is an unlikely possibility. Moreover, it does not seem likely that groups will have peculiar responses to particular treatments. Thus Treatment × Group interaction should be an unlikely problem as well.

Both of these generalizations are based on the fact that groups tend to be more homogeneous than a comparable number of single individuals, since averaging of many individual properties is possible. The larger the group, the more likely this is to be true. Although there are experiments in which the characteristics of particular individuals in each group may be important (e.g., problem solving, in which individual insight is the key to the solution), many if not most group experiments can rely on this homogenizing effect to create relatively similar-acting groups.


4 Evidently many people believe that all of these interactions will be confounded with main treatment effects, but this is not so. Interactions involving treatment (Time × Treatment or Subject × Treatment) will increase variance about the treatment mean, thus adding to error and obscuring real treatment effects. Only Time × Subject interaction will create between-means variance that artificially increases apparent treatment effects.

But Time × Treatment interaction (Greenwald,
1976) does pose a problem, much as
it does with RB repeated measures designs.
To say that such interaction exists is to
say that a treatment has a different effect
at a particular time. For example, it may be
that in a bargaining game, large payoffs have
more impact when administered in an early
or late session; in the first case, there might
be a primacy effect, and in the second, subjects
would have had a chance to develop
enough perspective to realize the magnitude
of the payoff. As argued earlier, this could
well be a problem when repeated measures are
taken on individuals; it will be even worse
when groups are the unit measured. The
reason this is such a problem for group repeated
measures designs, whether RB or LS,
again rests on the fact that groups tend to
be a source of reality confirmation. This
being the case, we would expect that group
members' orientations toward a given treatment
would be solidified in the group context,
the more often that group context is reused.
This would tend to exaggerate the Time ×
Treatment interaction. An individual alone
would perhaps be less willing to make such
judgments.
Besides being weakened by the presence
of this interaction effect (an effect that we
have argued will often be present), LS designs,
whether applied to individuals or
groups, suffer from another, even more basic,
problem. Kirk (1968, p. 158) notes that
it is impractical to use LS designs of dimension
less than 5 × 5 with only one observation
per cell (as in Table 2), since
there are not enough degrees of freedom to
give the design any power. This gives the
researcher two choices: to replicate the square
(thereby doubling the number of experimental
units and to some extent defeating
the purpose of the repeated measures procedure)
or to use a fairly large square. But
if the latter course can be followed (i.e., the
experimenter is able to vary the number of
treatment levels), fatigue effects can still ruin
the experiment. Obviously, subjects' interest
in the treatments will wane as they take
more and more treatments. This may produce
motivation to find a single mode of
responding to all treatments, a strategy that
would be exacerbated in the group context.
As a result, if groups are given many treatments
(as in a sizable LS design), later
treatments will often produce homogeneous
responses and are hence uninformative. In
many cases, it is doubtful that the added
degrees of freedom gained in the larger LS
could make up for this increased homogeneity
of response, and the treatment effects could
again be lost.


A New Method for Reusing Subjects 


We have argued that for logistic reasons 
of recruiting and scheduling as well as for 
statistical reasons of intersubject variability, 
the reuse of subjects is especially beneficial. 
But we have also noted that the typically 
used RB and LS repeated measures designs 
are likely to be unsatisfactory. The more 
similar each treatment situation is to pre- 
vious and succeeding ones, the more the 
responses will tend to be interdependent. 
Moreover, we have argued that these de- 
pendencies should be even stronger with 
groups, since the group context offers mem- 
bers the opportunity to create some type of 
joint perspective as to the proper mode of 
orientation and behavior. 

Therefore, we would like to propose a 
different statistical approach, which takes 
advantage of the fact that groups are the 
unit of analysis. Since reuse of the same 
group in repeated measures designs often 
creates the problem of interaction effects, 
we suggest a format that will systematically 
rotate people through different groups but 
that will do so in such a way that no two 
people will ever be in the same group more 
than once. In other words, we will arrange 
to guarantee that on each trial, a subject 
will receive the experimental treatment with
an entirely new and uniquely constituted
group.

We thus make use of the fact that a 
large part of the context for each experi- 
mental treatment is a subject's group. By 
reusing subjects but not groups, we make 
certain that this important context is en- 
tirely novel on each trial. Under these con- 
ditions, it is likely that subjects will treat 
the trials as largely independent. Although
statistically, individuals are still subject to
sequence or carry-over effects, reconstituting
the unit of analysis (i.e., the groups) each
time assures that we minimize the problem
of intertrial dependencies, particularly
Time-Order × Treatment interaction.

The format for our approach stems from 
Yates (1936), who developed a class of de- 
signs that he saw as one solution to the 
restrictions inherent in randomized block 
designs (i.e., that one must have the same
number of subjects per block as number of 
treatments). In such designs, each block is 
repeatedly measured but on fewer than all 
treatment levels. Yates called this procedure 
an "incomplete block" design to indicate that 
members of the block are given an incom- 
plete subset of treatment levels of the inde- 
pendent variable. 

This class of designs can be described as
a combinatorial problem: How many ways
can some k < t treatment levels be assigned
to blocks (individuals in a repeated measures
design), and how many blocks will
suffice? The binomial coefficient (t choose k)
specifies the number of blocks (b) needed
if all possible selections of k from t treatments
are constructed, with the proviso that
each treatment level is replicated r times
and that any two treatments are paired together
in a given block $\lambda$ times.

Combinatorial problems with these constraints
are often used in conjunction with
creating balanced incomplete block (BIB)
designs, such that the number of treatments
remains constant across blocks. These designs,
called $(t, k, b, r, \lambda)$ configurations,
are formally defined as a family of b subsets
of a set S consisting of t elements, such
that for some fixed k and $\lambda$, each subset has
k elements, and each pair of elements of S
occurs together exactly $\lambda$ times. Given these
assumptions, the following equations must
hold (Fisher, 1940):⁵


$N = rt = bk$ (3)
$r(k - 1) = \lambda(t - 1)$. (4)


For example, consider the set of seven
elements to be S = {1, 2, 3, 4, 5, 6, 7}. Now
suppose we specify the following subsets of
the power set of S: (1, 2, 4), (2, 3, 5),
(3, 4, 6), (4, 5, 7), (5, 6, 1), (6, 7, 2),
(7, 1, 3). Here b = 7, r = 3, k = 3, and
$\lambda = 1$; any two numbers appear in the same
subset only once.

The matrix summarizing these sets is
called an incidence matrix. In our example
the incidence matrix is


1101000
0110100
0011010
0001101
1000110
0100011
1010001


The number of rows is equal to the number
of subsets b, and the number of columns is
equal to the number of elements t. The number
of 1s in each row is k. The number
of 1s in each column is r.
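
The bookkeeping above can be checked mechanically. The following is a minimal Python sketch, not part of the original article: it assumes the seven blocks are the ones read off the rows of the incidence matrix, and the variable names (`blocks`, `incidence`, `lam`) are illustrative only. It rebuilds the matrix, counts the 1s in rows and columns, and tests Equations 3 and 4.

```python
from itertools import combinations

# Blocks as read off the rows of the incidence matrix in the text (assumed).
blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7},
          {5, 6, 1}, {6, 7, 2}, {7, 1, 3}]
elements = list(range(1, 8))

# One row per block, one column per element; 1 marks membership.
incidence = [[1 if e in blk else 0 for e in elements] for blk in blocks]

b = len(incidence)        # number of blocks (rows)
t = len(incidence[0])     # number of elements (columns)
row_sums = {sum(row) for row in incidence}          # 1s per row
col_sums = {sum(col) for col in zip(*incidence)}    # 1s per column
pair_counts = {sum(1 for blk in blocks if p in blk and q in blk)
               for p, q in combinations(elements, 2)}

# A balanced design has a single value for each of these counts.
assert len(row_sums) == len(col_sums) == len(pair_counts) == 1
k, r, lam = min(row_sums), min(col_sums), min(pair_counts)

print("b, t, k, r, lambda =", b, t, k, r, lam)
print("Equation 3 holds:", r * t == b * k)
print("Equation 4 holds:", r * (k - 1) == lam * (t - 1))
```

Because k and r are both 3 in this example, the row and column counts coincide here; they differ in designs where b is not equal to t.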

In the language of experimental designs
that use incomplete blocks, we have the following
equivalences: t = treatment levels in
the experiment⁶ (the number of columns);
b = the number of blocks (the number of
rows); k = the number of treatment levels
assigned to blocks (the number of 1s in each
row); r = the number of replications of each
treatment level (the number of 1s in each
column); $\lambda$ = the number of times any two
treatment levels appear together in a block.
For a design to be balanced, Equations 3
and 4 must hold, and t, b, k, r, and $\lambda$ must be
integers. That this is true implies that k
must be less than t, that is, each block must
necessarily be assigned less than all of the
treatments. It is this requirement that defines
the block design as incomplete.


⁵ This is easily proven. Since k and $\lambda$ are fixed,
and t is constant for any given experiment, r must
be the same for each treatment. Hence, there are
rt appearances of the treatments altogether. But
there are b blocks, each of which receives k treatments.
This must also equal the total number of
appearances. Thus bk = rt = N. Further, since any
one treatment level may appear exactly $\lambda$ times with
any other of the remaining (t − 1) treatment levels,
it follows that the total number of times a treatment
pair containing any one treatment will occur
is $\lambda(t - 1)$. But there are (k − 1) other treatment
levels in any treatment's subset that appear in exactly
r replications. This also equals the total number
of times a treatment pair will occur. Hence
$r(k - 1) = \lambda(t - 1)$.

⁶ A notational problem arises here. Originally, these
problems were called "v, k, $\lambda$" designs; but more
recently, t has been used instead of v to denote the
number of treatment levels. So we have used the
term "v, k, $\lambda$" to refer to the class of problems,
but we use t instead of v to denote the number of
treatments, since most books on experimental design
use such notation.


We now deal with a subset of the balanced
incomplete block designs and demonstrate
this subset's utility in group experiments
(Table 3). The subset can be described
with the following constraints. First,
only those designs in which $\lambda = 1$ are dealt
with. This consists of the subset of designs
in which each treatment appears with any
other treatment only once. Second, the designs
are further restricted by assuming that
these blocks are individuals, such that we
essentially have an incomplete repeated measures
procedure. Note that each column
(Table 3) contains the same number of
people and that each pair of people appears
in the same column only once in the experiment.
If we interpret the columns as being
groups in addition to being treatments, then
certainly for the example in Table 3, groups
are being uniquely reconstituted each time,
in accordance with our earlier desiderata.
The following simple proposition shows that
this will always be true when $\lambda = 1$ and
(within the context of v, k, $\lambda$ designs) will
be true in only that case. It can be thought
of as evidence for a type of "duality"; in this
class of BIB designs, limiting the number of
times that treatments appear together in the
same block also restricts the number of times
that people appear together.


Table 3
Balanced Incomplete Block Design

              Treatment
Block      1      2      3
  1        O      O
  2        O             O
  3               O      O


Note. Each O is one observation from the row block.


Proposition 


In a v, k, $\lambda$ configuration, $\lambda = 1$ if and
only if no two people appear together in
the same group more than once.

Proof. Let i and j be integers 1 through t
inclusive, let L and m be integers 1 through b
inclusive, and let A be the incidence matrix.
Assume $\lambda = 1$, and suppose that two people
appear together more than once. This implies
that $A_{Li} = A_{Lj} = A_{mi} = A_{mj} = 1$ for some
pair of subjects L and m and some treatments
$i \ne j$. But $\lambda = 1$ implies that $A_{Li} = A_{Lj} = 1$ for
one and only one L, a contradiction. Now
assume that no two people appear together
more than once, and assume $\lambda = n > 1$. This
means that n integers $L_p$ exist, such that
for $i \ne j$, $A_{L_p i} = A_{L_p j} = 1$ for all n values
of $L_p$. But this means that two people, for
example $L_1$ and $L_2$, see each other at least
twice, in treatments i and j, a contradiction.
This completes the proof.

It is interesting that by restricting our
attention to v, k, $\lambda$ designs in which $\lambda = 1$,
we can ensure the added advantage that (to
use our interpretation) no pair of subjects
meets more than once in the course of the
experiment. In this way, uniqueness of each
experimental situation can be guaranteed,
thereby discouraging subjects from treating
later treatments like previous ones. We call
this class of designs group balanced incomplete
block (Group BIB) designs.

In addition to the formal consideration
shown in the proposition, there are several
practical considerations that we find limit
the researcher's choice of group BIB designs.
First, if we were to let $\lambda$ be other than 1,
then people could meet more than once during
the course of the experiment. Although
this might be acceptable in some types of
experiments, clearly our desire to make each
experimental group maximally different causes
us to prefer uniquely constituted groups.
Hence, for this study, we restrict our attention
to designs in which $\lambda = 1$. Second,
observe that k must be at least 2, since otherwise
subjects are not reused. But since $\lambda = 1$,
if k is larger than 2, an inordinately large
number of treatments are required to create
a group BIB design (7 at a minimum and
13 or higher after that; see Table 11.3,
Cochran & Cox, 1957, p. 469). Further restricting
our attention, then, to cases in
which k = 2, it follows from Equations 3
and 4 that r = t − 1; that is, the size of
the group will be one less than the number
of treatments.


Table 4
Several Group Balanced Incomplete
Block Designs

Design parameters

  t    r    b    k    λ
  3    2    3    2    1
  4    3    6    2    1
  5    4   10    2    1
  6    5   15    2    1


Note. Each row represents one design.


Besides the practical benefit of keeping k
set at 2, there is a possible substantive
benefit: We are assured that any individual
will receive the minimum possible number
of treatments (while still being reused), thus
minimizing the possibility of fatigue and/or
learning effects. This is discussed more extensively
later.

Since $\lambda$ and k are restricted to be constants
by these considerations, it follows
again from Equations 3 and 4 that for practical
purposes (i.e., for values of t at or below
7), the entire design is specified by one
parameter t. If a researcher knows the
number of treatment levels to be given, both
the group size r and the number of subjects
needed b can be determined. Table 4 enumerates
some of these designs.
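
The k = 2 designs can be generated directly, which gives a quick check on the parameter values and on the "no pair meets twice" property. The sketch below is an illustration under stated assumptions rather than the authors' procedure: each person is identified with one unordered pair of treatments, the group for a treatment is everyone whose pair contains it, and the function names `group_bib` and `max_meetings` are hypothetical.

```python
from itertools import combinations

def group_bib(t):
    """Build the k = 2, lambda = 1 design for t treatments.

    Returns the list of people (each a pair of treatments) and a dict
    mapping each treatment to the indices of the people in its group.
    """
    people = list(combinations(range(1, t + 1), 2))
    groups = {j: [p for p, pair in enumerate(people) if j in pair]
              for j in range(1, t + 1)}
    return people, groups

def max_meetings(groups):
    """Largest number of groups shared by any two people."""
    counts = {}
    for members in groups.values():
        for pair in combinations(members, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return max(counts.values())

print("t  r  b  k  lambda  max meetings")
for t in range(3, 7):
    people, groups = group_bib(t)
    b = len(people)        # number of subjects needed
    r = len(groups[1])     # group size; every group has the same size
    print(t, r, b, 2, 1, max_meetings(groups))
```

Each printed row gives t, r = t − 1, and b = t(t − 1)/2, and the last column stays at 1 because two distinct pairs of treatments can share at most one treatment.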


Advantages of the Design 


Such subsets of incomplete block designs 
are important for several reasons. First, it 


JOHN M. LIGHT AND JERALD SCHUTTE 


can be observed that the number of subjects
needed to produce one group observation at
each treatment (i.e., b) will be considerably
less than the minimum number of subjects
needed for a RB or LS design, even when
this incomplete design is reconstituted more
than once to achieve multiple observations
per treatment level.

Second, if the group measure is taken on
the individual and aggregated, our design
allows a row effect (i.e., subject effect) to be
computed and used to adjust the treatment
mean as in any incomplete block design.
That is,


$X_{ij} = \mu + \beta_i + \tau_j + \epsilon_{ij}$, (5)

where the estimate of $\beta_i$ is used to adjust
the estimate of $\tau_j$. This means we can account
for the effects of an individual being
involved in a repeated measure and use this
information to help assess the treatment
effects.

More importantly, however, when the
measure is at the group level, this procedure
minimizes the problem inherent in assuming
intertrial independence. Since subjects take
more than one treatment, such independence
will not hold in general. However, as previous
arguments suggest, the group BIB
design is explicitly arranged to minimize
violation of this assumption under a usefully
wide set of circumstances. Thus we
think of the statistical analysis of a group
BIB design as reducing to a one-way analysis
of variance (ANOVA) in which each level
of the treatment variable has one (group)
observation. Each such design, conducted
with new subjects, constitutes an additional
observation per treatment level. Use of an
ordinary one-way ANOVA on such data is
justified because each group (i.e., unit of
analysis) is independent and uniquely constituted,
and each individual appears in the
minimum possible repetitions (2).

As we suggested earlier, the fact that subjects
have not seen each other before should
create novel contexts that discourage use of
information from previous trials. Further, no
subject will ever have more than one such
previous trial. Naturally, in experiments in
which learning is an important variable (e.g.,
in studies of group problem solving in which
problems are similar and in which time might
be the dependent variable), we would not
expect this kind of argument to hold. Generally,
when the treatments are more the
focus than the group context, there are likely
to be carry-over effects. If the group context
is the focal point for the subjects, however
(with treatments more in the background,
such as conditions for negotiations, etc.), we
would expect group interaction to be less
affected, and the approach presented here
may be helpful.

These advantages provide intuitive and 
partial statistical grounds for assuming that 
such observations are independent. This 
eliminates much of the need to worry about 
time-order and various interaction effects, 
as in RB and LS designs. 


Conclusion 


We have discussed the disadvantages of 
doing group-type experiments by the usual 
methods—completely randomized factorial 
designs—as compared with attempting to re- 
use subjects in randomized block repeated 
measures or Latin square designs with time 
order as one of the nuisance variables. We 
believe our approach offers a good com- 
promise between these techniques. By allow- 
ing subjects to be reused only in certain 
restricted ways, we have created conditions 
in which the usual problems of repeated 
measures designs should be minimized. At 
the same time, the researcher avoids the 
(often prohibitive) costs associated with re- 
cruiting and scheduling enough subjects to 
fill a completely randomized design. 

The solution we have suggested is a 
modest one. On the other hand, it is im- 
portant to see that its weaknesses differ from 
the weaknesses of other kinds of repeated 
measures designs; thus the group balanced 
incomplete block design should be thought 
of as another tool in the researcher's toolbag,
one which will be useful only some of
the time. We believe that when it is applicable,
though, it may simplify experimental
procedures significantly, enough to warrant
being considered for use in any experiments
on groups.
Comparison of Sequences


Lawrence J. Hubert 
School of Education, University of California, Santa Barbara 


The problem of comparing K numerical sequences, each defined over the same 
n objects, is approached through the use of a K-variate scoring function. Be- 
sides encompassing some old work of Spearman, the sequence-comparison par- 
adigm can be used to discuss a number of separate data-analysis procedures 
of particular interest to the behavioral sciences, for example, nominal scale 
response agreement among multiple raters, Friedman's test, Kendall's coeffi- 
cient of concordance, Page's test, and so on. The major intent of this article 
is pedagogical, and it emphasizes a general conceptual framework that con- 
veniently organizes a number of well-known statistical strategies. 


It should be apparent to every beginning 
student of statistics that many, if not most, 
data-analysis strategies are based on sums of 
squared differences. As is well-known, sums 
of this type appear in the definition of a 
variance, in least-squares regression, in chi- 
square goodness-of-fit tests, and so on. Upon 
reflection, however, a psychologist with even 
a superficial knowledge of test theory might 
consider this reliance on squared differences 
misplaced. As Spearman (1906) noted some 
70 years ago, extreme scores are expected to 
contain relatively large amounts of measure- 
ment error; consequently, statistical pro- 
cedures that accentuate these extremes by an 
additional process of squaring could easily 
compound the difficulties inherent in making 
inferences from fallible data. This last observation,
along with some outdated problems of
computational tractability, originally led
Spearman to consider an alternative measure
of correlation, now called Spearman's footrule, 
based on a sum of absolute differences between 
two numerical sequences. In fact, Spearman 
proposed the footrule as a direct competitor 
to his more well-known rank correlation 
coefficient based on an analogous sum of 
squared differences. 


Partial support of this research was supplied by 
National Science Foundation Grant SOC-77-28227. 

Requests for reprints should be sent to Lawrence J. 
Hubert, Graduate School of Education, University of 

Although Spearman's contributions are now
important primarily for historical reasons, 
this early work still represents a significant 
effort to develop alternative norms for the 
comparison of two numerical sequences. In 
particular, the present discussion is concerned 
with the more general problem of measuring 
the correspondence among K numerical se- 
quences using an arbitrary scoring function, 
but I rely on the two simple norms presented 
by Spearman as an introductory example. 
Besides suggesting several interesting theoret- 
ical generalizations of Spearman's ideas, these 
extensions to multiple sequences lead to several 
new insights into the superficially different 
problem of measuring nominal scale response 
agreement among multiple raters (cf. Cohen, 
1968; Fleiss, Cohen, & Everitt, 1969; Hubert, 
1977). In addition, the same general structure 
is relevant to the problem of analyzing m 
rankings, for example, Friedman's two-way 
analysis of variance by ranks (Friedman, 1937), 
Kendall's coefficient of concordance (Kendall,
1970), and Page's L test (Page, 1963). It should
be noted at the outset that this article's
major contribution is pedagogical, and I do
not intend to advocate any new data-analysis
procedures. Inasmuch as all of the techniques
mentioned here have been discussed in detail
in the literature, there is no need to review the
specifics of these individual data-analysis
strategies. Apparently, the ubiquitous nature
of the sequence-comparison problem has not
been recognized in the methodological literature
of psychology, and for this reason alone,
an explicit discussion of the general paradigm
may be of value to develop in greater depth.


Background 


To introduce some terminology, suppose
that $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ denote two
numerical sequences in which the corresponding
elements $x_i$ and $y_i$ are matched in some
manner. Clearly, this framework is the natural
context for developing a measure of association
based on the n bivariate pairs $(x_1, y_1), \ldots,
(x_n, y_n)$. When rank correlations are desired,
an initial transformation of the xs and the ys
to ranks may also be imposed, but for our
purposes this reduction is unnecessary. At
least for now, the original sequences of the
xs and the ys can be retained.

Instead of proceeding directly to calculate
a traditional Pearson product-moment correlation
coefficient, suppose that a more general
measure of proximity between the two sequences
is of initial interest. In particular,
let $f(\cdot, \cdot)$ be a bivariate function and define
a summary measure of proximity between the
sequences $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ by
the index $\Gamma$:

$\Gamma = \sum_{i=1}^{n} f(x_i, y_i)$.


Although the function $f(\cdot, \cdot)$ is arbitrary,
two special cases are of obvious importance.
If $f(x, y) = |x - y|$, then $\Gamma$ forms the basis
for Spearman's footrule; if $f(x, y) = (x - y)^2$,
the basis for Spearman's more common rank
correlation statistic is obtained. As a convention,
$\Gamma_a$ will refer to an index defined by
$f(x, y) = |x - y|^a$; thus, the footrule corresponds
to the use of $\Gamma_1$ and Spearman's more
common statistic corresponds to the use of $\Gamma_2$.

Although various normalizations of $\Gamma$ may
be desirable to transform a raw index into a
suitably restricted measure of association,
examination of these normalizations can be
delayed until later and then discussed in
greater generality. As it stands, the raw index
$\Gamma$ is sufficient for purposes of hypothesis
testing in the permutation context; for
example, the tables given in Kendall (1970)
for the better known Spearman index of rank
correlation are given in terms of $\Gamma_2$. To carry
out such a test, an index of the form specified
by $\Gamma$ is evaluated under the notion of independence,
in which all permutations of the ys
against the fixed sequence of xs are considered
equally likely (or equivalently, fixing the ys
and permuting the xs). Thus, the exact null
distribution of $\Gamma$ can be obtained by evaluating
the index over all n! permutations of the
ys and tabulating the resulting frequency
distribution. If the index $\Gamma$ is sufficiently
extreme when compared to this distribution,
the hypothesis of independence can be rejected.
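
The exact test just described can be sketched in a few lines of Python for small n. This is an illustration only: the two rank sequences are made-up values, the function names are hypothetical, and the choice of a one-sided lower-tail proportion (small values of the footrule index indicate close agreement) is an assumption of the sketch rather than something prescribed in the text.

```python
from itertools import permutations

def gamma_index(xs, ys, f):
    """Raw index: the sum of f over the matched pairs."""
    return sum(f(x, y) for x, y in zip(xs, ys))

def exact_null(xs, ys, f):
    """Index values over all n! reorderings of ys, with xs held fixed."""
    return [gamma_index(xs, perm, f) for perm in permutations(ys)]

def lower_tail_p(xs, ys, f):
    """Proportion of reorderings whose index is no larger than the observed one."""
    observed = gamma_index(xs, ys, f)
    null = exact_null(xs, ys, f)
    return sum(value <= observed for value in null) / len(null)

def footrule(x, y):
    """Scoring function for the footrule index."""
    return abs(x - y)

xs = [1, 2, 3, 4, 5]   # hypothetical ranks
ys = [2, 1, 3, 5, 4]   # hypothetical ranks

print("observed index:", gamma_index(xs, ys, footrule))
print("lower-tail proportion:", lower_tail_p(xs, ys, footrule))
```

Enumeration grows as n!, so this approach is practical only for short sequences, which is the motivation for the normal approximations.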

Unless the xs and the ys are untied within 
their respective sets and a transformation to 
ranks (or to some other canonical form) is 
imposed, the permutation distribution has 
to be recalculated for each different applica- 
tion. Since this is a very extensive computa- 
tional burden, large-sample (normal) approx- 
imations are desirable. As it turns out, these 
approximations are fairly easy to obtain 
through general formulas for the mean and
variance of $\Gamma$, which can then be specialized
for particular defining functions $f(\cdot, \cdot)$.


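$\Gamma$ as a Bilinear Permutation Statistic


The index $\Gamma$ has the form of a bilinear
permutation statistic (see Puri & Sen, 1971),
and thus, assuming all permutations of the
ys are equally likely,


$E(\Gamma) = (1/n) \sum_{i=1}^{n} \sum_{j=1}^{n} f(x_i, y_j)$;

$\mathrm{var}(\Gamma) = (1/[n(n - 1)]) \{(1/n) A_1 - (A_2 + A_3) + n A_4\}$,

where

$A_1 = [\sum_{i=1}^{n} \sum_{j=1}^{n} f(x_i, y_j)]^2$;

$A_2 = \sum_{j=1}^{n} [\sum_{i=1}^{n} f(x_i, y_j)]^2$;

$A_3 = \sum_{i=1}^{n} [\sum_{j=1}^{n} f(x_i, y_j)]^2$;

$A_4 = \sum_{j=1}^{n} \sum_{i=1}^{n} f(x_i, y_j)^2$.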


Specializing these formulae for the case in
which the sequences $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$
represent the untied ranks 1, 2, ..., n in
some order, we have


$\Gamma_1$:
$E(\Gamma_1) = (n^2 - 1)/3$;

$\mathrm{var}(\Gamma_1) = (n + 1)(2n^2 + 7)/45$,


$\Gamma_2$:
$E(\Gamma_2) = n(n^2 - 1)/6$;

$\mathrm{var}(\Gamma_2) = [1/(n - 1)][(n^3 - n)/6]^2$.


The mean and variance of $\Gamma_1$ for untied ranks
are available in Spearman (1906) and those
for $\Gamma_2$ in Kendall (1970). Finally, under very
mild conditions on the values assigned by the
function $f(\cdot, \cdot)$, $\Gamma$ can be shown asymptotically
normal as $n \to \infty$. For a discussion of several
possible sufficient conditions, the reader is
referred to Puri and Sen (1971, p. 72) and
Hoeffding (1951).
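
The general mean and variance expressions, and their specializations to untied ranks, can be checked against brute-force enumeration. The sketch below is not from the article: it assumes integer-valued scoring functions, uses exact rational arithmetic so that equality can be tested directly, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def moments_by_formula(xs, ys, f):
    """Mean and variance from the bilinear-statistic formulas (A_1 to A_4)."""
    n = len(xs)
    d = [[Fraction(f(x, y)) for y in ys] for x in xs]   # d[i][j] = f(x_i, y_j)
    total = sum(sum(row) for row in d)
    a1 = total ** 2
    a2 = sum(sum(d[i][j] for i in range(n)) ** 2 for j in range(n))
    a3 = sum(sum(row) ** 2 for row in d)
    a4 = sum(v * v for row in d for v in row)
    mean = total / n
    var = (a1 / n - (a2 + a3) + n * a4) / (n * (n - 1))
    return mean, var

def moments_by_enumeration(xs, ys, f):
    """Mean and variance over all n! equally likely reorderings of ys."""
    values = [sum(Fraction(f(x, y)) for x, y in zip(xs, perm))
              for perm in permutations(ys)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def power_score(a):
    """Scoring function |x - y| ** a for the index with exponent a."""
    return lambda x, y: abs(x - y) ** a

n = 5
ranks = list(range(1, n + 1))
for a in (1, 2):
    by_formula = moments_by_formula(ranks, ranks, power_score(a))
    by_counting = moments_by_enumeration(ranks, ranks, power_score(a))
    print(a, by_formula, by_formula == by_counting)
```

For n = 5 the rank formulas give a mean of 8 and a variance of 38/5 for the footrule index, and a mean of 20 and a variance of 100 for the squared-difference index, which is what both routines return.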


Correlations Among Indices 


In addition to information regarding the
single index $\Gamma$ that can be obtained as shown
above, it is relatively straightforward to
calculate the correlation between two such
indices, say $\Gamma$ and $\Gamma'$, based on two different
bivariate functions $f(\cdot, \cdot)$ and $f'(\cdot, \cdot)$. First,
the sum of $f(\cdot, \cdot)$ and $f'(\cdot, \cdot)$ is considered
as a new bivariate function, and the variance
of this index is obtained. The covariance for
$\Gamma$ and $\Gamma'$ is then isolated by subtracting off
the variances for $\Gamma$ and $\Gamma'$ and dividing by 2.
Carrying out this process leads to the following
general expression for the covariance between
$\Gamma$ and $\Gamma'$ over the same n! permutations:


$\mathrm{cov}(\Gamma, \Gamma') = (1/[n(n - 1)]) [(1/n) B_1 - (B_2 + B_3) + n B_4]$,


where

$B_1 = [\sum_{j} \sum_{i} f(x_i, y_j)][\sum_{j} \sum_{i} f'(x_i, y_j)]$;

$B_2 = \sum_{j} [\sum_{i} f(x_i, y_j)][\sum_{i} f'(x_i, y_j)]$;

$B_3 = \sum_{i} [\sum_{j} f(x_i, y_j)][\sum_{j} f'(x_i, y_j)]$;

$B_4 = \sum_{j} \sum_{i} f(x_i, y_j) f'(x_i, y_j)$.


Finally, normalizing by the square root of the
product of the variances for $\Gamma$ and $\Gamma'$, the
Pearson correlation between $\Gamma$ and $\Gamma'$ can be
derived. As an example for the special case of
$\Gamma = \Gamma_1$ and $\Gamma' = \Gamma_2$, and where untied ranks
are used for the xs and ys, this procedure leads
to the following simplification:


$\rho(\Gamma_1, \Gamma_2) = 3(n^2 + 1) / \sqrt{5(n^2 - 1)(2n^2 + 7)}$.


Surprisingly, this correlation is bounded
away from 1 for large n and approaches
$3/\sqrt{10}$ as $n \to \infty$. This situation contrasts with
an asymptotic correlation of 1.00 between
Spearman's rank correlation index based on
$\Gamma_2$ and Kendall's tau statistic (Kendall, 1970).

Although index correlation will
not be developed in any further detail, it is
significant to note that such correlations are
easily obtained once the variance of a general
statistic, such as $\Gamma$, is derived (or the later
extensions of $\Gamma$ to multiple sequences). Thus,
when a researcher is faced with alternative
choices for an index, it may be of practical
interest to know how highly the various
choices intercorrelate under independence and
for various values of n or, more specifically,
to know if the intercorrelations are less than
unity even when n is assumed infinite.
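
The correlation between the footrule and squared-difference indices can be computed exactly for small n by enumeration. In the sketch below, the closed form $3(n^2 + 1)/\sqrt{5(n^2 - 1)(2n^2 + 7)}$ is an assumption of the sketch: it is what the covariance expression and the two rank variances reduce to when worked through by hand, and the code compares it with brute-force values rather than taking it on trust. Function names are illustrative.

```python
from itertools import permutations
from math import sqrt

def exact_correlation(n):
    """Correlation of the two indices over all n! permutations of the ranks."""
    ranks = range(1, n + 1)
    g1, g2 = [], []
    for perm in permutations(ranks):
        g1.append(sum(abs(x - y) for x, y in zip(ranks, perm)))
        g2.append(sum((x - y) ** 2 for x, y in zip(ranks, perm)))
    m = len(g1)
    mean1, mean2 = sum(g1) / m, sum(g2) / m
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2)) / m
    var1 = sum((a - mean1) ** 2 for a in g1) / m
    var2 = sum((b - mean2) ** 2 for b in g2) / m
    return cov / sqrt(var1 * var2)

def closed_form(n):
    """Hand-derived simplification (assumed, then checked numerically)."""
    return 3 * (n * n + 1) / sqrt(5 * (n * n - 1) * (2 * n * n + 7))

for n in range(3, 8):
    print(n, round(exact_correlation(n), 6), round(closed_form(n), 6))
print("limit 3/sqrt(10):", round(3 / sqrt(10), 6))
```

The printed columns agree for each n, and the values stay below 1 while drifting toward the stated limit.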


Other Applications of $\Gamma$


Besides the Spearman indices, the measure
$\Gamma$ also includes several other statistics of
interest to psychology that have been developed
in comparative isolation. For instance,
as discussed by Hubert (1978), the index $\Gamma$
can be used when $x_1, x_2, \ldots, x_n$ represents
the labels for the R categories used by Rater 1
to classify n objects, and $y_1, y_2, \ldots, y_n$
represents the labels used by Rater 2 to classify
the same n objects into C categories. Typically,
R = C and the number of objects placed in the
same category by the two raters is of interest.
More generally, we can define


$f(x_i, y_i) = w_{uv}$,


if object i is placed in the category labeled u by
Rater 1, $1 \le u \le R$, and in the category
labeled v by Rater 2, $1 \le v \le C$. The index
$\Gamma$ is then a raw index of weighted nominal scale
response agreement that can be subjected to
various normalizations to provide an appropriately
restricted final index. As with the
Spearman statistics, however, the raw index $\Gamma$
can be considered by itself for the purposes of
hypothesis testing. It should be apparent
that an appropriate choice of weights (e.g.,
$w_{uv} = |u - v|^a$) leads us directly back to
the index $\Gamma_a$. Also, for Cohen's problem of
nominal scale response agreement, the effect
of different choices for the weights can be
partially assessed by the general covariance
formula between $\Gamma$ and $\Gamma'$ given earlier.
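
A small Python sketch of the two-rater case may help fix ideas. It is an illustration only: the category labels are made-up data, the weight function scores 1 for an exact match and 0 otherwise (the simple agreement count mentioned in the text), and the function names are hypothetical.

```python
from itertools import permutations

def agreement_index(labels1, labels2, weight):
    """Raw weighted agreement: the sum of weights over the n objects."""
    return sum(weight(u, v) for u, v in zip(labels1, labels2))

def exact_match(u, v):
    """Weight of 1 when both raters use the same category, else 0."""
    return 1 if u == v else 0

rater1 = ["a", "a", "b", "c", "b"]   # hypothetical classifications
rater2 = ["a", "b", "b", "c", "c"]   # hypothetical classifications

observed = agreement_index(rater1, rater2, exact_match)

# Mean of the raw index when Rater 2's labels are reordered at random.
null = [agreement_index(rater1, perm, exact_match)
        for perm in permutations(rater2)]
expected = sum(null) / len(null)

print("observed agreement:", observed)
print("expected under independence:", expected)
```

Swapping `exact_match` for a function of the distance between ordered category labels gives the weighted variants discussed above.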


Multiple Sequences 


The obvious extension of the index $\Gamma$ to three 
sequences $x_1, \ldots, x_n$; $y_1, \ldots, y_n$; $z_1, \ldots, z_n$ 
relies on a trivariate function $g(x_i, y_i, z_i)$. As 
a notation, suppose the index associated with 
this function is given by $A_3$: 

$$A_3 = \sum_{i} g(x_i, y_i, z_i), \qquad (1)$$

and assume that all n! reorderings of the ys 
and all n! reorderings of the zs are equally 
likely under a hypothesis of independence. 
By carrying out the usual moment calculations 
(cf. Hubert, 1979), 


$$E(A_3) = \frac{1}{n^2} \sum_{i,j,k} g(x_i, y_j, z_k);$$

$$V(A_3) = \frac{2}{n^5} \Big[ \sum_{i,j,k} g(x_i, y_j, z_k) \Big]^2 + \frac{1}{n^2} \sum_{i,j,k} g(x_i, y_j, z_k)^2$$
$$\qquad - \frac{1}{n^4} \Big\{ \sum_{i} \Big[ \sum_{j,k} g(x_i, y_j, z_k) \Big]^2 + \sum_{j} \Big[ \sum_{i,k} g(x_i, y_j, z_k) \Big]^2 + \sum_{k} \Big[ \sum_{i,j} g(x_i, y_j, z_k) \Big]^2 \Big\}$$
$$\qquad + \text{lower order terms}. \qquad (2)$$


The general forms of these moment expres- 
sions for K sequences follow in a similar 
manner. If $x_1^{(k)}, \ldots, x_n^{(k)}$, for $1 \le k \le K$, 
denote the K sequences and if the index 
$A_K$ is defined using a K-variate function 
$h[x_{i_1}^{(1)}, x_{i_2}^{(2)}, \ldots, x_{i_K}^{(K)}]$, then 


$$A_K = \sum_{i} h[x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(K)}];$$

$$E(A_K) = \frac{1}{n^{K-1}} \sum_{i_1, \ldots, i_K} h[x_{i_1}^{(1)}, \ldots, x_{i_K}^{(K)}];$$

$$\operatorname{var}(A_K) = \frac{K-1}{n^{2K-1}} \Big\{ \sum_{i_1, \ldots, i_K} h[x_{i_1}^{(1)}, \ldots, x_{i_K}^{(K)}] \Big\}^2 + \frac{1}{n^{K-1}} \sum_{i_1, \ldots, i_K} h[x_{i_1}^{(1)}, \ldots, x_{i_K}^{(K)}]^2$$
$$\qquad - \frac{1}{n^{2K-2}} \sum_{k=1}^{K} \sum_{i_k} \Big\{ \sum_{i_l,\, l \ne k} h[x_{i_1}^{(1)}, \ldots, x_{i_K}^{(K)}] \Big\}^2$$
$$\qquad + \text{lower order terms}.$$


Contingency Table Applications 


As one particular application of the index 
$A_K$, suppose a threefold contingency table is 
given having R rows, S columns, and T 
layers, and let $n_{rst}$ denote the number of 
observations in row r, column s, and layer t. 
If $w_{rst}$ defines a fixed weight attached to the 
corresponding cell, then the raw index $A_3$ 
can be written as 

$$A_3 = \sum_{r,s,t} w_{rst} n_{rst},$$

where the function $g(x_i, y_i, z_i)$ in Equation 1 
is defined as $w_{rst}$ if $x_i$ belongs to row r, $y_i$ 
belongs to column s, and $z_i$ belongs to layer t. 
The moment formulas reduce in a similar 
manner; for instance, 


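$$E(A_3) = \frac{1}{n^2} \sum_{r,s,t} w_{rst} n_{r..} n_{.s.} n_{..t};$$

and using the approximate variance that 
ignores lower order terms, 

$$\operatorname{var}(A_3) \approx \frac{2}{n^5} \Big( \sum_{r,s,t} w_{rst} n_{r..} n_{.s.} n_{..t} \Big)^2 + \frac{1}{n^2} \sum_{r,s,t} w_{rst}^2 n_{r..} n_{.s.} n_{..t}$$
$$\qquad - \frac{1}{n^4} \Big[ \sum_{r} n_{r..} \Big( \sum_{s,t} w_{rst} n_{.s.} n_{..t} \Big)^2 + \sum_{s} n_{.s.} \Big( \sum_{r,t} w_{rst} n_{r..} n_{..t} \Big)^2 + \sum_{t} n_{..t} \Big( \sum_{r,s} w_{rst} n_{r..} n_{.s.} \Big)^2 \Big].$$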


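If we let $p_{r..} = n_{r..}/n$, $p_{.s.} = n_{.s.}/n$, and 
$p_{..t} = n_{..t}/n$, then 

$$E(A_3) = n \sum_{r,s,t} w_{rst} p_{r..} p_{.s.} p_{..t}$$

and 

$$\operatorname{var}(A_3) \approx n \Big[ 2 \Big( \sum_{r,s,t} w_{rst} p_{r..} p_{.s.} p_{..t} \Big)^2 + \sum_{r,s,t} w_{rst}^2 p_{r..} p_{.s.} p_{..t}$$
$$\qquad - \sum_{r} p_{r..} \Big( \sum_{s,t} w_{rst} p_{.s.} p_{..t} \Big)^2 - \sum_{s} p_{.s.} \Big( \sum_{r,t} w_{rst} p_{r..} p_{..t} \Big)^2 - \sum_{t} p_{..t} \Big( \sum_{r,s} w_{rst} p_{r..} p_{.s.} \Big)^2 \Big].$$
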
When R = S = T, and three raters are 
assumed to define the dimensions of the 
three-way contingency table, a number of raw 
indices of rater agreement can be obtained 
by varying the definition of $w_{rst}$. As indicated 
by David and Barton (1962) and developed 
in the rater context by Hubert (1977), many 
different concepts of agreement are possible. 
I shall list several of these definitions later 
but give the specializations of the moments 
for only the first three since these appear to 
be the most natural. The reader is referred 
to David and Barton (1962) for a discussion 
of asymptotic normality for these various 
alternatives when the weights are restricted 
to be dichotomous (i.e., 0-1) or when they 
arise as sums of dichotomous weights. 


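1. DeMoivre: An agreement occurs if and 
only if all raters place an object in the same 
category. Thus, letting $w_{rst} = 1$ if r = s = t 
and 0 otherwise, we obtain 

$$E(A_3) = n \sum_{r} p_{r..} p_{.r.} p_{..r};$$

$$\operatorname{var}(A_3) \approx n \Big[ 2 \Big( \sum_{r} p_{r..} p_{.r.} p_{..r} \Big)^2 + \sum_{r} p_{r..} p_{.r.} p_{..r} (1 - p_{.r.} p_{..r} - p_{r..} p_{..r} - p_{r..} p_{.r.}) \Big].$$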


Alternatively, if we let $a_{rs}$ denote a weighting 
between Raters 1 and 2, $b_{rt}$ a weighting 
between Raters 1 and 3, and $c_{st}$ a weighting 
between Raters 2 and 3, $w_{rst}$ = minimum 
($a_{rs}$, $b_{rt}$, $c_{st}$). In the specific cases we are 
considering here, $a_{rs} = 1$ if r = s and 0 
otherwise, $b_{rt} = 1$ if r = t and 0 otherwise, 
and $c_{st} = 1$ if s = t and 0 otherwise. This 
notation will be used below as well. 


1 It should be noted that these expressions correspond 
to those given in Hubert (1977) except for a typo- 
graphical error in the first definition due to DeMoivre. 
Here, the sums over k should have been products over 
k; also, the last such product should be over $k' \ne k$ 
(this last change also points out a misprint in David 
and Barton, 1962). The correct formulas are given in 
the text of the present article. 


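2. Target: If the first rater is considered a 
target, then an agreement occurs if and only 
if another rater places an object in the same 
category as the first. More explicitly, let 
$w_{rst} = a_{rs} + b_{rt}$; then 

$$E(A_3) = n \sum_{r,s,t} (a_{rs} + b_{rt}) p_{r..} p_{.s.} p_{..t} = n \Big( \sum_{r} p_{r..} p_{.r.} + \sum_{r} p_{r..} p_{..r} \Big);$$

$$\operatorname{var}(A_3) \approx n \Big( 2 \Big[ \sum_{r,s,t} (a_{rs} + b_{rt}) p_{r..} p_{.s.} p_{..t} \Big]^2 + \sum_{r,s,t} (a_{rs} + b_{rt})^2 p_{r..} p_{.s.} p_{..t}$$
$$\qquad - \sum_{r} p_{r..} \Big[ \sum_{s,t} (a_{rs} + b_{rt}) p_{.s.} p_{..t} \Big]^2 - \sum_{s} p_{.s.} \Big[ \sum_{r,t} (a_{rs} + b_{rt}) p_{r..} p_{..t} \Big]^2 - \sum_{t} p_{..t} \Big[ \sum_{r,s} (a_{rs} + b_{rt}) p_{r..} p_{.s.} \Big]^2 \Big)$$
$$= n \Big[ \sum_{r} p_{r..} p_{.r.} (1 - p_{.r.} - p_{r..}) + \sum_{r} p_{r..} p_{..r} (1 - p_{..r} - p_{r..}) + \Big( \sum_{r} p_{r..} p_{.r.} \Big)^2 + \Big( \sum_{r} p_{r..} p_{..r} \Big)^2 \Big].$$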


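3. Pairwise: An agreement occurs if and 
only if two raters categorize an object con- 
sistently. Thus, if we let $w_{rst} = a_{rs} + b_{rt} + c_{st}$, 
then 

$$E(A_3) = n \Big( \sum_{r} p_{r..} p_{.r.} + \sum_{r} p_{r..} p_{..r} + \sum_{r} p_{.r.} p_{..r} \Big);$$

$$\operatorname{var}(A_3) \approx n \Big[ \Big( \sum_{r} p_{r..} p_{.r.} \Big)^2 + \Big( \sum_{r} p_{r..} p_{..r} \Big)^2 + \Big( \sum_{r} p_{.r.} p_{..r} \Big)^2$$
$$\qquad + \sum_{r} p_{r..} p_{.r.} (1 - p_{.r.} - p_{r..}) + \sum_{r} p_{r..} p_{..r} (1 - p_{..r} - p_{r..}) + \sum_{r} p_{.r.} p_{..r} (1 - p_{.r.} - p_{..r}) \Big].$$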


Although these three interpretations given 
for $w_{rst}$ are probably the most obvious, several 
other possibilities exist: 


4. If $w_{rst} = 1$ when at least one nontarget 
rater matches the target and 0 otherwise, 
then $w_{rst}$ = maximum ($a_{rs}$, $b_{rt}$). 

5. If $w_{rst} = 1$ when at least one pair of 
raters match and 0 otherwise, then $w_{rst}$ = 
maximum ($a_{rs}$, $b_{rt}$, $c_{st}$). 

6. If $w_{rst}$ denotes the number of matches 
for pairs of raters with consecutive labels, 
then $w_{rst} = a_{rs} + c_{st}$. 

7. If $w_{rst} = 1$ when a majority of raters 
match and 0 otherwise, then $w_{rst} = 1$ if 
$a_{rs} + b_{rt} + c_{st} \ge 2$ and 0 otherwise. 

Although I shall not pursue these latter 
definitions or extensions to more than three 
raters (or in general, a G-fold contingency 
table for G greater than three), the approach 
follows exactly that given above and can be 
carried out by the reader. 
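
As an illustration of how the three-rater moments above might be evaluated numerically, the following is a minimal sketch rather than a definitive implementation. It assumes the approximate variance that ignores lower order terms, written in terms of the marginal proportions, and an R x R x R table of counts with Rater 1 on rows, Rater 2 on columns, and Rater 3 on layers. The function names `a3_moments` and `agreement_weights` are hypothetical and are introduced only for this example.

```python
import numpy as np


def agreement_weights(R, kind):
    """Build an R x R x R weight array for one of the first three definitions.

    Hypothetical helper: a, b, c are the 0-1 matching indicators for
    Raters (1, 2), (1, 3), and (2, 3), respectively.
    """
    r, s, t = np.indices((R, R, R))
    a = (r == s).astype(float)
    b = (r == t).astype(float)
    c = (s == t).astype(float)
    if kind == "demoivre":   # all three raters agree
        return np.minimum(np.minimum(a, b), c)
    if kind == "target":     # matches with the first (target) rater
        return a + b
    if kind == "pairwise":   # matches over all pairs of raters
        return a + b + c
    raise ValueError("unknown kind: " + kind)


def a3_moments(counts, weights):
    """Return (A_3, E(A_3), approximate var(A_3)) under independence."""
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = counts.sum()
    p_r = counts.sum(axis=(1, 2)) / n   # row proportions
    p_s = counts.sum(axis=(0, 2)) / n   # column proportions
    p_t = counts.sum(axis=(0, 1)) / n   # layer proportions

    raw = float((weights * counts).sum())
    mu = np.einsum("rst,r,s,t->", weights, p_r, p_s, p_t)
    second = np.einsum("rst,r,s,t->", weights ** 2, p_r, p_s, p_t)
    # Weighted sums over the two remaining margins for each row, column, layer.
    by_r = np.einsum("rst,s,t->r", weights, p_s, p_t)
    by_s = np.einsum("rst,r,t->s", weights, p_r, p_t)
    by_t = np.einsum("rst,r,s->t", weights, p_r, p_s)
    var = n * (2 * mu ** 2 + second
               - p_r @ by_r ** 2 - p_s @ by_s ** 2 - p_t @ by_t ** 2)
    return raw, float(n * mu), float(var)


if __name__ == "__main__":
    table = np.ones((2, 2, 2))  # toy table with uniform margins
    for kind in ("demoivre", "target", "pairwise"):
        print(kind, a3_moments(table, agreement_weights(2, kind)))
```

Because the weights enter only as an array, the remaining definitions (4 through 7) could be tried by constructing the corresponding array from the same indicators.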


Measuring Concordance in K Rankings 


One of the standard nonparametric data- 
analysis problems is discussed under the 
title of "Friedman's Test" or "Kendall's 
Coefficient of Concordance." Here, K judges 
assign numerical values to n objects (e.g., 
ranks) and our interest is in (a) testing 
whether the n objects can be considered 
equally preferable and (b) measuring the 
degree of concordance among the K judges. 
In our context, the traditional approach to 
both of these problems relies on a particular 
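K-variate function $h[x_i^{(1)}, \ldots, x_i^{(K)}]$ of the 
form 

$$h[x_i^{(1)}, \ldots, x_i^{(K)}] = \sum_{k<k'} h[x_i^{(k)}, x_i^{(k')}],$$

where 

$$h[x_i^{(k)}, x_i^{(k')}] = [x_i^{(k)} - x_i^{(k')}]^2.$$

Thus, the raw index $A_K$ can be written as 

$$A_K = \sum_{i} \sum_{k<k'} [x_i^{(k)} - x_i^{(k')}]^2.$$
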


When viewed in a K by n analysis-of- 
variance format, $A_K$ is merely K times the sum 
of squares within the n columns. Furthermore, 
if ranks are used rather than the original 
observations, implying that all row sums are 
equal, $A_K$ is simply K times the sum of squares 
for interaction. Thus, since $A_K/K$ and the 
sum of squares between columns must sum to 
the constant total sum of squares, a test 
statistic and a final normalized index could 
be defined equivalently using either of the 
former two quantities. Friedman's statistic 
and Kendall's coefficient of concordance W 
are defined in terms of the sum of squares 
between columns, but, as we will see later, 
W could be obtained just as well using $A_K/K$. 
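
The relation between the raw index and Kendall's W can be checked numerically. The sketch below is illustrative only: it assumes a K by n array of untied ranks (one row per judge), computes the raw index as the sum of squared differences over pairs of judges, and compares a W derived from that index with the usual formula based on the column rank sums. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def raw_concordance_index(x):
    """Sum over objects and over pairs of judges of squared differences."""
    x = np.asarray(x, dtype=float)
    pairs = combinations(range(x.shape[0]), 2)
    return float(sum(((x[k] - x[kp]) ** 2).sum() for k, kp in pairs))


def within_column_ss(x):
    """Sum of squares within the n columns of the K by n table."""
    x = np.asarray(x, dtype=float)
    return float(((x - x.mean(axis=0)) ** 2).sum())


def kendall_w_from_index(ranks):
    """W obtained as 1 - (A_K / K) / (total sum of squares)."""
    ranks = np.asarray(ranks, dtype=float)
    K = ranks.shape[0]
    total_ss = ((ranks - ranks.mean()) ** 2).sum()
    return float(1.0 - raw_concordance_index(ranks) / (K * total_ss))


def kendall_w_classical(ranks):
    """W from the column rank sums, assuming untied ranks 1..n in each row."""
    ranks = np.asarray(ranks, dtype=float)
    K, n = ranks.shape
    col = ranks.sum(axis=0)
    s = ((col - col.mean()) ** 2).sum()
    return float(12.0 * s / (K ** 2 * (n ** 3 - n)))


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
    print(raw_concordance_index(ranks), 3 * within_column_ss(ranks))
    print(kendall_w_from_index(ranks), kendall_w_classical(ranks))
```

In this toy example the raw index equals K times the within-column sum of squares, and the two routes to W agree, which is the equivalence described in the text.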

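Viewed another way, based on the Spearman 
norm $\Gamma_2$, denoted here by $\Gamma_{kk'}$ for the kth 
and k'th sequences, 

$$A_K = \sum_{k<k'} \Gamma_{kk'},$$

and using the general formulas given pre- 
viously, 

$$E(A_K) = \sum_{k<k'} E(\Gamma_{kk'});$$

$$\operatorname{var}(A_K) = \sum_{k<k'} \operatorname{var}(\Gamma_{kk'}).$$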


This result is also reflected in the fact that the 
$\Gamma$ indices are independent in pairs (cf. David 
& Barton, 1962, p. 218), and suggests that 
very simple mean and variance formulas 
result when the function $h[x_i^{(1)}, \ldots, x_i^{(K)}]$ 
is additive with respect to the appropriate bi- 
variate functions. Thus, similar results would 
hold for $\Gamma_{kk'}$ defined using $\Gamma_1$ or any other bi- 
variate function. Moreover, it is relatively 
simple to propose alternative nonadditive func- 
tions that measure variability within a column 
of the K by n table (e.g., the range) that would 
lead to alternative raw indices of concordance. 
It should be apparent at this point that the 
problem of nominal scale response agreement 
among K raters can be rephrased in a general- 
ized Friedman context involving K sequences 
of n observations. Instead of using ranks, 
however, category labels are attached to each 
of the n objects by each rater. 


Testing for an a Priori Order in K Rankings 


Instead of a general hypothesis test of the 
Friedman type, we can also define a test 


LAWRENCE J. HUBERT 


procedure sensitive to a particular a priori 
ordering of the n objects. Such an extension 
can be viewed as an analogue of the target-rater 
agreement problem mentioned earlier. If an 
a priori set of weights is given by the first 
sequence $x_1^{(1)}, \ldots, x_n^{(1)}$, then an appropriate 
test statistic can be given in the form used 
by Page (1963) and by Pirie and Hollander 
(1972) based on the function 


$$h[x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(K)}] = x_i^{(1)} \sum_{k=2}^{K} x_i^{(k)}.$$

Thus, the index 

$$A_K = \sum_{i=1}^{n} x_i^{(1)} \sum_{k=2}^{K} x_i^{(k)}$$


can be interpreted as a weighted sum of column 
totals, where each total is defined over the 
last K − 1 sequences. Typically, the weights 
are integers from 1 to n, and the values 
$x_i^{(k)}$ for $k \ge 2$ are ranks (Page, 1963) or 
normal scores (Pirie & Hollander, 1972). The 
mean and variance formulas follow directly 
from the previous expressions, or alternatively, 
since $A_K$ can be defined by the sum of the 
K − 1 independent terms 

$$\sum_{i=1}^{n} x_i^{(1)} x_i^{(2)}, \; \ldots, \; \sum_{i=1}^{n} x_i^{(1)} x_i^{(K)},$$



the mean and variance can be obtained by the 
usual moment formulas for linear combinations 
of random variables. Obviously, the formulas 
for the mean and variance of each individual 
term can be found by the relatively simple 
expressions used in the bivariate function 
examples based on two sequences. 
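
To make the sum-of-independent-terms argument concrete, here is a minimal sketch under stated assumptions. It uses the standard permutation moments of a single cross-product term (mean equal to n times the product of the two sequence means, variance equal to the product of the two sums of squared deviations divided by n − 1) and adds them over the K − 1 scored sequences. The function name `ordered_alternative_index` and the argument layout are hypothetical.

```python
import numpy as np


def ordered_alternative_index(weights, scores):
    """Index, mean, and variance for an a priori ordering of n objects.

    weights: the a priori sequence (length n), e.g. the integers 1..n.
    scores:  a (K - 1) x n array, one row per remaining sequence
             (ranks or normal scores).
    Each row of `scores` is assumed to be permuted independently and
    uniformly under the hypothesis of independence.
    """
    weights = np.asarray(weights, dtype=float)
    scores = np.atleast_2d(np.asarray(scores, dtype=float))
    n = weights.size

    index = float((weights * scores.sum(axis=0)).sum())
    # Mean of each cross-product term is n * mean(weights) * mean(row).
    mean = float(n * weights.mean() * scores.mean(axis=1).sum())
    # Variance of each term is SS(weights) * SS(row) / (n - 1);
    # the terms are independent, so the variances add.
    ss_w = ((weights - weights.mean()) ** 2).sum()
    ss_rows = ((scores - scores.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    var = float((ss_w * ss_rows).sum() / (n - 1))
    return index, mean, var


if __name__ == "__main__":
    index, mean, var = ordered_alternative_index(
        [1, 2, 3], [[1, 2, 3], [2, 1, 3]]
    )
    z = (index - mean) / np.sqrt(var)  # standardized value for a normal approximation
    print(index, mean, var, z)
```

The standardized value computed at the end is the quantity that would be referred to a normal approximation; whether that approximation is adequate depends on K and n, as the next section discusses.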


Significance Testing 


Although some proofs of asymptotic normal- 
ity exist for various special cases of the in- 
dex $A_K$ (cf. David & Barton, 1962; Hoeffding, 
1951), as well as proofs for the convergence 
to a chi-square random variable for related 
statistics such as Friedman's (see Lehmann, 
1975), all these approximations are of varying 
adequacy depending on the size of K, n, the 
scoring function, and the patterning of entries 
in the various sequences. In some cases, such 
as the Page test just discussed, the normal 
approximation is easy to demonstrate and is 
probably very good if K is reasonably large. 
In this latter case, the distribution of the 
index is generated by a sum of independent 
random variables, and each individual term is 
itself asymptotically normal under mild reg- 
ularity conditions. Most ideally, however, any 
observed index would be so extreme that a 
simple Chebyshev inequality in conjunction 
with the available moments would be sufficient 
to guarantee significance of the index at some 
adequate level. Alternatively, approximate 
permutation tests, such as those discussed by 
Hope (1968), Cliff and Ord (1973), Edgington 
(1969), and others, could be used. Given the 
continuing reduction in computer costs, this 
latter alternative of sampling from the 
complete distribution may be the most 
appropriate to follow in the years to come (see 
Hubert, 1979, for an illustration of how an 
approximate permutation test could be carried 
out). 
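
The two fallback strategies mentioned here, a Chebyshev bound from the available moments and an approximate permutation test, can be sketched as follows. This is only an illustrative outline, not the procedure of any cited author: it assumes the first sequence is held fixed while every other sequence is reordered independently at random, and it uses a one-sided comparison in which large values of the index are extreme. The names `approximate_permutation_test` and `chebyshev_bound`, the default of 999 samples, and the seed are all assumptions.

```python
import numpy as np


def approximate_permutation_test(sequences, index_fn, n_samples=999, seed=0):
    """Monte Carlo estimate of the upper-tail significance of an index.

    sequences: K x n array; row 0 is fixed and rows 1..K-1 are permuted.
    index_fn:  function mapping a K x n array to the raw index.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(sequences, dtype=float)
    observed = index_fn(x)
    at_least_as_large = 0
    for _ in range(n_samples):
        permuted = x.copy()
        for k in range(1, x.shape[0]):
            permuted[k] = rng.permutation(x[k])
        if index_fn(permuted) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement itself as one of the samples.
    p_value = (at_least_as_large + 1) / (n_samples + 1)
    return observed, p_value


def chebyshev_bound(observed, mean, var):
    """Upper bound on P(|index - mean| >= |observed - mean|)."""
    deviation = abs(observed - mean)
    if deviation == 0:
        return 1.0
    return min(1.0, var / deviation ** 2)


if __name__ == "__main__":
    ranks = np.tile(np.arange(1, 9), (3, 1))  # three identical rankings

    def weighted_column_totals(x):
        return float((x[0] * x[1:].sum(axis=0)).sum())

    print(approximate_permutation_test(ranks, weighted_column_totals))
    print(chebyshev_bound(10.0, 4.0, 9.0))
```

The Chebyshev bound is conservative, so it is useful only when the observed index is very far from its mean; the sampling approach trades that conservatism for computing time.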


Indices 


Although the raw index $A_K$ is sufficient for 
hypothesis testing, researchers will typically 
desire some normalized version of $A_K$ as a 
final measure of concordance or agreement. 
Many different normalizations are possible, 
but several forms appear continually in the 
literature. For instance, if it is assumed that 
$A_K$ is nonnegative and that large values of 
$A_K$ denote greater degrees of concordance 
(where this term is being used generically), 
two general expressions can be given as 

$$\frac{A_K}{\text{maximum } A_K}; \qquad (3)$$

$$\frac{A_K - E(A_K)}{\text{maximum } A_K - E(A_K)}. \qquad (3a)$$

If the "keying" of $A_K$ were in the opposite 
direction and were denoted by $A'_K$, these two 
indices would be given as 

$$1 - \frac{A'_K}{\text{maximum } A'_K}; \qquad (4)$$

$$1 - \frac{A'_K}{E(A'_K)}. \qquad (4a)$$

The indices in Expressions 3 and 4 and in 
Expressions 3a and 4a are analogues, and to 
transform one expression to the other, the 
raw index $A'_K$ in Expressions 4 and 4a is 
rewritten as maximum $A_K$ − $A_K$. Here, $A_K$ is the 
raw index used in Expressions 3 and 3a, and 
we note that since $A_K$ is nonnegative, maximum 
$A'_K$ = maximum $A_K$. 

The indices in Expressions 3 and 4 lie 
between 0 and 1; those in Expressions 3a 
and 4a are bounded above by 1 but may take 
on negative values as well. An example of 
Expression 4 would be Kendall’s coefficient 
of concordance W based on the index $A_K$ 
defined as K times the sum of squares for 
interaction. The form given by Expression 3a 
is Cohen’s (1968) general expression for his 
index of nominal scale response agreement 
kappa. Finally, the measure in Expression 
4a provides the basis for one version of 
Spearman’s rank order correlation coefficient 
based on T, as well as for the general degree-1 
statistic introduced by Hildebrand, Laing, and 
Rosenthal (1977) in a related nominal scale 
response-agreement context, using arbitrary 
weights on the cells of a contingency table.’ 

Although a host of different normalized 
indices are possible (cf. Hubert & Levin, 
1976), the forms given by Expressions 3a and 
4a appear to be of continuing importance in 
the behavioral sciences because they are 
“corrected for chance.” It is important to 
remember, however, that any normalization 
is more or less arbitrary, and hypothesis 
testing can be carried out using only the raw 
index $A_K$. 
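
As a small worked instance of the chance-corrected form in Expression 3a, the sketch below computes Cohen's kappa for two raters from a square table of counts. It assumes unit weights on the diagonal (so the raw index is the number of agreements), takes the maximum as n from the ideal case of perfect agreement, and uses the expected number of agreements under independence with fixed margins. Both function names are hypothetical.

```python
def chance_corrected_index(raw, expected, maximum):
    """Expression 3a: (raw - expected) / (maximum - expected)."""
    return (raw - expected) / (maximum - expected)


def cohen_kappa(table):
    """Kappa for a square two-rater table of counts (rows: Rater 1)."""
    n = sum(sum(row) for row in table)
    agreements = sum(table[r][r] for r in range(len(table)))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected number of agreements when the margins are held fixed.
    expected = sum(a * b for a, b in zip(row_totals, col_totals)) / n
    return chance_corrected_index(agreements, expected, n)


if __name__ == "__main__":
    print(cohen_kappa([[20, 5], [10, 15]]))
```

The same `chance_corrected_index` helper would apply to any raw index for which a mean and a maximum are available, which is the sense in which the normalization is generic.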


Discussion 


There are a number of directions in which 
the preceding material can be extended. For 
example, instead of permuting the entries 
within the rows of the K by n table, suppose 
the elements can be rearranged throughout 
the table. This inference model has been 
discussed in the context of nominal scale 
response agreement (Hubert, in press), and 
it has been pointed out that when K is 2, the 
index D; provides an unnormalized intraclass 
correlation. 

1 In many cases it is traditional to define maximum 
$A_K$ or maximum $A'_K$ from some ideal case, for example, 
untied ranks, and to treat this quantity as a constant 
irrespective of the configuration of ties in the K by 
n table. 

In addition to alternative inference models, 
other applications of the sequence-comparison 
notion could be developed. For instance, 
Cochran's Q statistic and McNemar's test for 
correlated proportions are really special cases 
of Friedman's test (see Lehmann, 1975), and 
thus, these techniques could be rephrased 
within the K sequence-comparison framework. 
Also, a variant on the type of degree-1 con- 
firmatory analysis developed by Hildebrand, 
Laing, and Rosenthal (1977) for a bivariate 
table could be extended to a K-way con- 
tingency table by using the sequence-compar- 
ison interpretation relevant to nominal scale 
response agreement. 

What is important in all of these applications 
or extensions is the recognition of problem 
commonality. The field of nonparametric 
statistics is very broad and diversified, and 
consequently, general organizing principles 
may be of immense pedagogical help to a 
student attempting to organize the field into 
a coherent cognitive structure. 


Choosing Between Predictable and Unpredictable 
Shock Conditions: Data and Theory 


This article reviews the literature on predictability and describes the factors 
that affect choice. Particular emphasis is given to the reliability of basic find- 
ings, including replications and failures to replicate. Behavioral measures re- 
lated to choice are also reviewed, and some physiological correlates of predict- 
able and unpredictable shock are noted. The data allow several firm conclusions 
to be drawn regarding preference, notably that (a) rats (albino, hooded, male, 
female) prefer predictable shock conditions; (b) they prefer predictable con- 
ditions whether shock is avoidable, escapable, or inescapable, and whether it 
is scrambled or unscrambled grid shock; (c) this preference occurs with dif- 
ferent procedures, apparatus, and shock delivery systems, such as water elec- 
trodes or electrodes attached to the tail, back, ears, or pubis bone; (d) fish 
and birds also prefer the signaled condition; and (e) although the preference 
is robust, it is affected by shock intensity, signal duration, intershock intervals, 
amount of training, and the dependability of shock-free periods. Other factors 
that may affect preference are also noted. Finally, the theoretical views of 
conditioned reinforcement, of information, of preparation, and of safety are 
evaluated, and their strengths and weaknesses are described. 


Research over the last two decades reveals 
clearly that preference for schedules of both 
positive reinforcers (e.g., food) and negative 
reinforcers (e.g., shock) is governed by more 
than reinforcer-related variables such as rate, 
magnitude, and duration. Subjects confronted 
with a choice between a predictable versus an 
unpredictable shock or food condition most 
often choose the predictable one, even though 
all other factors are constant.¹ This result is 
perhaps not surprising, since predictability 
may afford the subject the opportunity to 
prepare for the reinforcer in a way that minimizes 
its aversiveness or maximizes its at- 
tractiveness. We argue, however, that despite 
its intuitive appeal, this "preparation" hy- 
pothesis does not adequately account for cur- 
rent data. We also examine the strengths and 
weaknesses of alternative explanations. 

1 There are times in the appetitive situation when 
subjects choose an unpredictable condition over a 
predictable one (e.g., Herrnstein, 1964). In Herrnstein's 
study and in others, animals consistently 
preferred a variable interval schedule for food over 
a fixed interval schedule for food of equal value. 
Such findings suggest that predictability based on 
periodicity and predictability based on signaling 
may have different properties or that aperiodicity 
may have effects other than those produced by 
lack of signaling. 

In our discussion, we focus on predictable 
and unpredictable shock conditions with non- 
human subjects. We first review studies in 
which animals are given a choice between 
predictable and unpredictable shock; we discuss 
the basic phenomenon and some factors 
that affect the strength of preference. Then 
we review some of the issues concerning 
basic findings, research methods, replications, 
and failures to replicate. In a separate sec- 
tion, we note some behavioral and physio- 
logical correlates of predictable and unpre- 
dictable shock. Finally, we assess the dif- 
ferent theoretical interpretations that have 
appeared in the literature on choice and 
predictability. Specifically, we assess the 
preparation hypothesis, the discriminative 
stimulus (CS) or safety hypothesis, the in- 
formation hypothesis, and the conditioned 
reinforcement hypothesis. 


Initial Data and Theory 


Earlier theorizing about the factors con- 
trolling behavior in aversive situations origi- 
nated in studies focusing on the properties 
acquired by stimuli preceding shock. One 
commonly held view is that the pairing of 
previously neutral stimuli with an aversive 
stimulus, such as shock, results in conditioned 
aversiveness (e.g., Mowrer, 1947; Schoen- 
feld, 1950). This view, which appeals to a 
conditioned reinforcement process, has played 
a particularly important role in theoretical 
accounts of avoidance, escape, and punish- 
ment behavior. With respect to choosing be- 
tween signaled and unsignaled shock, this 
view predicts that schedules of unsignaled 
shock will be less aversive than (preferred 
over) schedules of signaled shock because the 
latter include the aversive properties of shock 
plus those of the signal. 

Coppock (1954) reported one of the first 
studies related to preference and to the prop- 
erties acquired by stimuli preceding aversive 
events. During training, rats were exposed 
to either signaled (cue preceded shock) or 
unsignaled (cue occurred during shock) tail 
shock while in a restraining apparatus. 
Shocks were omitted during a testing phase, 
but by turning its head to one side, the sub- 
ject could produce the cue previously asso- 
ciated with shock. The analysis of head 
movement toward the signal suggested to 

Coppock that stimuli preceding shock ac- 
quire greater control over responding than 


P. BADIA, J. HARSH, AND B. ABBOTT 


do stimuli occurring during shock. According 
to Coppock, these findings were unexpected, 
since the shock signal should have acquired 
conditioned aversive properties. Coppock 
concluded that, contrary to conditioned re- 
inforcement theory, stimuli preceding shock 
can acquire positive reinforcing properties, 
but he gave no rationale for how this 
occurred. 

A few years later, Knapp, Kause, and 
Perkins (1959) gave rats a choice between 
immediate and delayed shock (Experiment 
1), using a T maze. Their subjects preferred 
the immediate condition. Knapp et al. argued 
that their findings could not be accounted 
for by existing conditioned-reinforcement 
theory, which in this case would predict that 
painful stimuli immediately following a re- 
sponse should become more aversive than 
the same stimuli when delayed. In Experi- 
ment 2, they gave subjects a choice between 
delayed shock preceded by a signal and 
delayed shock followed by a signal. The 
subjects preferred the signal-shock condition. 
Arguing again that conditioned reinforcement 
theory was inadequate to explain their find- 
ings, Knapp et al. suggested the preparatory- 
response hypothesis developed earlier by 
Perkins (1955). In brief, this hypothesis 
states that signals preceding appetitive or 
aversive events allow the organism to prepare 
for the receipt of stimulation. Such prepara- 
tion allegedly maximizes the reinforcing 
properties of appetitive events or minimizes 
the aversive properties of painful events. 

About a year later, Mowrer (1960) de- 
scribed two studies conducted by Mohammed 
Akhtar. In both studies, signaled or unsig- 
naled shocks were delivered to a grid floor 
at irregular intervals. If the subjects faced 
in one direction, they received signaled 
shocks; if they faced in the other direction, 
shocks were unsignaled. Shocks were avoid- 
able or escapable. Four of five rats in the 
first experiment preferred the signaled avoid- 
able shock, and all four subjects in the 
second experiment preferred the signaled 
shock condition. Mowrer's interpretation of 
these results anticipated much of the theorizing 
that later developed and that is now 
referred to as the safety hypothesis. In 
describing the characteristics of the signal 
condition, Mowrer stated that a subject "could 
sharply discriminate between brief periods 
when it was in real danger (i.e., when the 
tone was on) and the rest of the time when 
the rat was perfectly safe and could well 
afford to ‘relax’” (p. 194). In answer to 
why the rats might seek out a warning signal 
preceding aversive shock, he stated: “They 
did not seek the warning signal as such; in- 
stead, they sought the situation in which the 
warning signal occurred, because . . . they 
experienced less total fear here than in the 
no-signal situation" (p. 196). Similar inter- 
pretations were subsequently offered by others 
(Badia, Culbertson, & Lewis, 1971; Denny, 
1971; Lockard, 1963; Seligman, 1968). The 
development of various theoretical positions 
is discussed in detail later. 

A theoretical view stressing the importance 
of information emerged at about the same 
time that Mowrer's views were made known 
(Berlyne, 1960). In brief, Berlyne suggested 
that uncertainty about the occurrence of bio- 
logically significant events creates a state of 
conflict and that stimuli that reduce this 
conflict (i.e., provide information) are 
rewarding. This "uncertainty reduction" view clearly 
predicts that preshock stimuli should be 
sought out for their informational value. 
Therefore, the data that were compatible with 
the views of Coppock (1954), Mowrer (1960), 
and Perkins (1955) were also compatible 
with those of Berlyne (1960). 

The stage was set for assessing how well 
the various hypotheses of conditioned rein- 
forcement, preparation, uncertainty, and 
safety could account for the accumulating 
data on preference for predictable versus un- 
predictable shock. Inevitably, subsequent tests 
involved different procedures and different 
parameter values, raising both methodological 
and empirical issues. Some of these remain 
unresolved, and heated arguments have ap- 
peared in the literature (Badia & Harsh, 
1977a, 1977b; Biederman & Furedy, 1976b; 
Harsh, 1978). 

What does the literature reveal regarding 
the phenomenon of choice? Is the phenomenon 
reliable? Is it robust? What are the 
boundary conditions? What theoretical inter- 
pretations are most favored? 


Literature Review 
How Reliable Is the Phenomenon? 


Since the early studies of Coppock (1954), 
Knapp et al. (1959), and Akhtar (in Mowrer, 
1960), numerous studies have confirmed the 
preference for signaled over unsignaled shock 
schedules. A reliable and robust preference 
for signaled shock has appeared with a variety 
of choice procedures, shock delivery methods, 
and species. 

Some of the earliest research specifically 
directed toward the question of preference 
concerning signaled or unsignaled shock was 
performed by Perkins, Levis, and Seymann 
(1963) and by Lockard (1963). Perkins et 
al. ran rats and found that 14 of the 16 sub- 
jects spent about 70% to 80% of the time 
on the signaled shock side of the shuttle box. 
When the conditions were reversed, however, 
a reversal in preference did not occur. Perkins 
et al. attributed this failure to reverse to the 
fact that there were only three test days. Re- 
versal was not a problem in a subsequent 
study by Perkins, Seymann, Levis, and Spen- 
cer (1966). Lockard’s (1963) results were 
similar to those of Perkins et al. After 12 
sessions, her subjects were spending about 
90% of the trials on the signaled side of the 
apparatus. Control subjects presented with a 
random signal and shocks were indifferent. 
Lockard demonstrated that neither signals 
alone nor random signals and shocks were 
able to maintain a preference for the signaled 
condition; a strong preference emerged only 
when signals preceded shock. Both Perkins et 
al. and Lockard used unscrambled shock.² 

2 With unscrambled grid shock, the charge on 
each grid bar is fixed and alternate grid bars carry 
opposite charges (positive or negative). With scram- 
bled shock, the charge on each grid bar is rapidly 
alternated so that in each cycle it is momentarily 
the opposite of the charge on other bars. Scram- 
bling the shock is designed to eliminate unauthor- 
ized avoidance responses (e.g., standing on grid 
bars of the same polarity) that sometimes occur 
with unscrambled shock. 

Lockard (1965) continued her earlier 
investigation of preference using two different 
shock (unscrambled) intensities (.221 mA or 
.236 mA) and four signal-shock intervals (5 
sec, .5 sec, 0 sec, or random). Four groups 
had the same level of shock for both compart- 
ments, and four groups had a 7% higher shock 
level for the signaled side. Only subjects re- 
ceiving equally intense shocks on both sides 
in the .5-sec or 5-sec groups tended to prefer 
the signaled condition. Subjects with the 0-sec 
signal-shock interval performed the same as 
the random-interval group did. Also, the level 
of preference was far weaker than it had been 
in the first study (Lockard, 1963). One rea- 
son why Lockard failed to replicate the high 
preference found in her earlier work may be 
that the shock levels were too low in the later 
study. Other variables were examined by 
Perkins et al. (1966). They used unscram- 
bled shock with a variety of intershock in- 
tervals, shock durations, and signal durations. 
Only a brief summary of their extensive find- 
ings can be given here. In particular, they 
showed that parameter values were important 
determiners of choice for the signaled condi- 
tion. The most marked preference occurred 
with 12 to 60 shocks per hour, an 18-sec sig- 
nal duration, and either a .5-sec or 5-sec 
shock. When the signal duration was de- 
creased, preference was weaker. Preference 
was also weaker when the intershock interval 
resulted in 2 shocks per hour. An interesting 
and important part of their study was the de- 
livery of shock through ear clips for one 
group (Experiment 5). The results of this 
experiment will be described later in the sec- 
tion on shock delivery systems. 

A replication of Perkins et al. (1966) was 
performed by Furedy and Walters (Note 1) 
but with an added condition. Half of their 
subjects received scrambled shock, the other 
half, unscrambled shock. Subjects receiving 
scrambled shock spent a higher percentage of 
time on the signaled side than did subjects 
receiving unscrambled shock (80% vs. 65%). 
However, because of equipment failures and 
because the typical learning curve for prefer- 
ence was not found, the authors doubted the 
validity of their findings. They performed a 
second experiment with improved equipment 
and found that subjects given either scrambled 
or unscrambled shock preferred the signaled 
condition. However, the preference was 
stronger with unscrambled shock. A clear 
learning curve was also observed over the 
first 2 days similar to that reported by other 
researchers. The data of Furedy and Walters 
suggested that scrambling may have some ef- 
fect but that it clearly does not eliminate 
preference for the signaled condition. 

Several other studies using a shuttle box 
and scrambled shock reported that subjects 
preferred a signaled shock condition over an 
unsignaled one, and preference was marked 
for most conditions (Frankel & Vom Saal, 
1976; Gliner, 1972; Hymowitz, 1973). 
Gliner’s study also assessed the effects of 
shock intensity on choice and the effects of 
signaling shock on the physiology of the or- 
ganism. He found that a marked preference 
for the signaled condition developed for both 
high- and low-shock-intensity subjects, but 
the preference developed more slowly for 
subjects receiving high-intensity shock. Also, he 
found less physiological deterioration when 
shock was signaled, a phenomenon we de- 
scribe in more detail later. 


Observing Response Procedure and Choice 


Except for the early studies of Coppock 
(1954), Knapp et al. (1959), and Akhtar 
(in Mowrer, 1960), all studies reviewed thus 
far used a shuttle box procedure to assess 
choice, and the subjects preferred the sig- 
naled condition. Other procedures have been 
used with similar results. An extensive series 
of experiments has been performed using an 
operant changeover procedure (e.g., Badia & 
Culbertson, 1972). With this procedure, base- 
line responding on a changeover lever is re- 
corded while a subject is being exposed 
(training) to the different conditions of the 
experiment. Shocks are usually presented non- 
contingently on some schedule, and different 
correlated stimuli identify the signaled and 
unsignaled conditions. Following training, 
subjects are given an opportunity to choose 
the condition in which they prefer to remain 
(testing). Subjects are placed in an imposed 
condition of either signaled or unsignaled 
shock, and a lever press (changeover re- 
sponse) changes the condition to the opposite 
one for a fixed period of time (e.g., 1 min.). 
At the end of this fixed period, the initial 
imposed condition is then reinstated and re- 
mains in effect until another changeover re- 
sponse occurs. This procedure is similar to the 
one introduced by Wyckoff (1952). 
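
The timing logic of the changeover procedure can be sketched in a few lines of Python. This is only an illustrative sketch, not the control program used in any of the studies reviewed: the function name, the representation of lever presses as a list of times in seconds, and the rule that presses made during an ongoing changeover period have no effect are all assumptions introduced here.

```python
def seconds_in_signaled(press_times, session_len, imposed="unsignaled",
                        changeover_len=60.0):
    """Seconds of a session spent in the signaled condition.

    press_times    -- times (sec) of changeover lever presses (hypothetical input)
    session_len    -- total session length in seconds
    imposed        -- condition in effect when no changeover period is running
    changeover_len -- fixed duration of the opposite condition after a press
    """
    if imposed not in ("signaled", "unsignaled"):
        raise ValueError("imposed must be 'signaled' or 'unsignaled'")
    in_opposite = 0.0   # total time spent in the condition opposite to the imposed one
    period_end = 0.0    # time at which the current changeover period ends
    for t in sorted(press_times):
        if t < 0 or t >= session_len:
            continue    # ignore presses outside the session
        if t < period_end:
            continue    # assumption: presses during a changeover period do nothing
        period_end = t + changeover_len
        in_opposite += min(period_end, session_len) - t
    if imposed == "unsignaled":
        return in_opposite          # the opposite condition is the signaled one
    return session_len - in_opposite
```

Under these assumptions, a subject placed in the unsignaled condition that presses at 0, 30, and 100 sec of a 200-sec session spends 120 sec (60%) of the session in the signaled condition.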

The first preference studies using the 
changeover procedure were reported by Badia, 
Culbertson, and Lewis (1971) using signaled 
and unsignaled avoidable shock, and by Badia 
and Culbertson (1972) using either signaled 
or unsignaled escapable (Experiment 1) or 
inescapable (Experiment 2) shock. Subjects 
often spent 90% or more of the time in the 
signaled condition, and it did not matter 
whether shock was avoidable, escapable, or 
inescapable. The strong preference for the sig- 
naled condition was shown within each sub- 
ject repeatedly over a series of acquisition 
and extinction conditions. Three different ex- 
tinction conditions were also given to identify 
stimuli controlling the preference. Under ex- 
tinction conditions, a response produced (a) 
the stimulus (signal) identifying the shock 
period, (b) the stimulus identifying the shock- 
free period, or (c) neither stimulus. Change- 
over responding was greatest under the sec- 
ond condition, intermediate under the first, 
and least under the third. An exact replica- 
tion of Badia and Culbertson's (1972) study 
was undertaken by Lewis and Gardner (1977), 
and the data obtained were virtually identical. 

Another series of experiments assessed the 
robustness of the phenomenon (Badia, Coker, 
& Harsh, 1973; Badia, Culbertson, & Harsh, 
1973). Subjects in each experiment were in- 
itially given a choice between signaled and 
unsignaled shock schedules, with shock param- 
eters equal in the two conditions. Then den- 
sity, duration, or intensity of signaled shock 
was systematically increased over that of un- 
signaled shock. Subjects chose the signaled 
condition over the unsignaled one when shock 
parameters were equal and continued doing 
so when signaled shocks were up to four 
times more dense, four to nine times longer, 
or two to three times more intense than un- 
signaled shocks. A control condition for the 
shock density study showed that subjects 
would choose a lower density shock schedule 
over a higher one when all shocks were unsig- 
naled. However, choice of the lower density 
schedule was eliminated by signaling the 
shocks on the higher density schedule. 

Most experiments testing the effects of pre- 
dictable versus unpredictable aversive shock 
conditions have employed warning signals to 
make shock and shock-free periods predicta- 
ble. One can also increase predictability by 
making the occurrence of shocks temporally 
regular. Badia, Harsh, and Coker (1975) as- 
sessed the relative effectiveness of temporal 
regularity and signaling on preference using 
the changeover procedure. Subjects chose be- 
tween fixed-time (FT) and variable-time 
(VT) shock schedules under several signaling 
conditions: (a) unsignaled FT versus unsig- 
naled VT shock, (b) signaled FT versus sig- 
naled VT shock, and (c) unsignaled FT ver- 
sus signaled VT shock. Subjects chose the FT 
over the VT shock schedule when both were 
unsignaled and again when both were sig- 
naled. They chose the VT over the FT sched- 
ule only when the VT shocks were signaled 
and the FT shocks unsignaled. 


Shock Delivery Systems and Choice 


Some authors have been critical of the 
methods used to study signaled and unsig- 
naled shock (Biederman & Furedy, 1976b). 
One of the major criticisms is that preference 
for the signaled condition occurs only when 
shock is modifiable, that is, only when sub- 
jects are able by means of postural adjust- 
ment on the shock grid to partially or totally 
avoid the shock or to minimize its aversive- 
ness. The criticism is of obvious import to 
both theoretical and methodological issues. 
If true, it would bear directly on the devel- 
opment of theoretical statements. A number 
of studies relate directly to the alleged prob- 
lem of differential shock modification. Various 
procedures have been used to eliminate the 
problem or to monitor the extent to which it 
might occur. To ensure that subjects do not 
differentially modify the shock, some investi- 
gators have delivered shock through ear clips 
(Perkins et al., 1966), through tail elec- 
trodes (Coppock, 1954; Miller, Daniel, & 
Berk, 1974; Miller, Marlin, & Berk, 1977), 
through implanted electrodes (Cotsonas, 1972; 
Griffin, Honaker, Jones, & Pynes, 1974), or 
through water electrodes (Fisher & Badia, 
1975). The use of attached electrodes makes 
it virtually impossible to modify shock through 
unauthorized escape or avoidance responses. 
Other researchers have monitored and mea- 
sured the amount of shock received by sub- 
jects when shock was signaled and when it was 
unsignaled (Lockard, 1963). A consistent pat- 
tern emerges from this research. 

The first experiment using surface elec- 
trodes, in this case ear clips, was Experiment 
5 of the Perkins et al. (1966) study. These 
investigators found that 13 of the 16 subjects 
preferred the signaled condition on Day 1. 
Of the 10 subjects completing the second day 
of testing with the ear clips still attached, 
5 preferred the signaled condition. For the 3 
subjects completing 3 days and 2 subjects 
completing 4 days of testing, the preference 
for the signal condition was marked. Clearly, 
the subjects of Perkins et al. preferred sig- 
naled over unsignaled shock conditions even 
with shock applied directly to the body. Other 
studies using tail shock have been reported 
by Miller et al. (1974) and Miller et al. 
(1977). The subjects in these studies also 
chose the signaled over the unsignaled condi- 
tion and in some cases over successive re- 
versals. 

Findings similar to those for tail shock have 
been reported with back electrodes by Cot- 
sonas (1972). Brass safety pins were inserted 
into the lower back in the first part of the 
experiment. Two subjects were tested, and 
both preferred the signaled condition; only 
one later showed a reversal when the condi- 
tions were changed. In the second part of the 
experiment, eight subjects were run. Seven 
of the eight preferred the signaled condition; 
all seven succeeded in three reversal condi- 
tions. The next experiment kept subjects in 
the same condition for 15 days prior to re- 
versing for 15 days. The subjects did not con- 
tinue to remain on the signaled side. The 
latter outcome is possibly due to problems 
arising from chronically implanted electrodes. 
Other investigators using grid shock have not 
found any attenuation of preference over time. 

The generality of the preference for sig- 
naled over unsignaled shock in terms of shock 
delivery systems and in terms of species was 
increased by Griffin et al. (1974) and 
Fisher and Badia (1975). Griffin et al. im- 
planted electrodes in pigeons by attaching 
them to the pubis bone and gave their sub- 
jects a choice between the two shock condi- 
tions. Two pigeons were run using a procedure 
similar to that of Badia and Culbertson 
(1972). Variability tended to be high; never- 
theless, one subject clearly changed to the 
signaled condition, and the other showed a 
similar but less marked trend under the high- 
intensity shock condition. Fisher and Badia 
(1975) used the changeover procedure with 
goldfish in a shuttle box. Water electrodes 
distributed shock evenly throughout the shut- 
tle box. The subjects preferred the signal con- 
dition, and their performance was similar to 
that of rats tested under similar conditions, 
including performance under the three differ- 
ent extinction conditions used by Badia and 
Culbertson (1972). 

Others have used different procedures to 
determine if signaled shock was more modifi- 
able than unsignaled shock. These studies 
monitored the development of skeletal prepa- 
ratory responses to the signal (Badia & Ab- 
bott, in press; Biederman & Furedy, 1973, 
Experiment 1; Biederman & Furedy, 1976b; 
Furedy & Biederman, 1976; Lockard, 1963; 
Marlin, Berk, & Miller, 1978; Abbott & 
Badia, Note 2). Lockard used a pen recorder 
to monitor when subjects avoided unscram- 
bled shock. Only about 1.4% of the shocks 
were avoided; of these, equal numbers were 
avoided by the signaled shock group and ran- 
dom signal group. Lockard also failed to de- 
tect any systematic differences in the postural 
behavior of the two groups. 

Different results were obtained in a series 
of experiments by Biederman and Furedy in 
which the current flow through the subject 
was monitored during shock. Biederman and 
Furedy (1973, Experiment 1) found that the 
degree of unscrambled shock attenuation cor- 
related weakly (.40) but significantly with 
the degree of preference for the signaled con- 
dition. Subjects receiving scrambled grid 
shock or tail shock showed no attenuation 
and no preference. Furedy and Biederman 
(1976, Experiment 3) obtained similar re- 
sults using Lockard's (1963) shuttle box pro- 
cedure with the training phase omitted. Un- 
scrambled shock was used. Modification of 
shock was again observed, and there was a 
relation between degree of modification and 
degree of preference for the signaled condition. 

The reports of Biederman and Furedy seem 
to provide evidence that modification of shock 
can be an important determinant of prefer- 
ence. However, the studies may be faulted. 
First, the findings provide no direct evidence 
on differential modification of shock between 
the signaled and unsignaled conditions. Sepa- 
rate modification scores were not reported. 
If preference for the signaled condition is re- 
lated to better avoidance or reduction of 
shock in the signaled condition, this needs to 
be demonstrated. Second, even if one is will- 
ing to accept the observed correlation between 
preference and overall modification as sug- 
gestive of differential modification, there is 
another problem. Biederman and Furedy used 
exceptionally long shocks (5 sec), whereas 
most researchers used much briefer shocks 
(e.g., .5 sec; see Table 1) specifically to mini- 
mize the problem of avoidance. Attenuation, 
if it does occur, is more likely with long 
rather than short shocks. It may be that 
shock attenuation with long durations is a 
sufficient but not a necessary condition for 
preference to emerge. Another problem re- 
lates to training, that is, the exposure of sub- 
jects to the signaled and unsignaled conditions 
prior to their being given a choice. This phase 
provides subjects the opportunity to asso- 
ciate each condition with its correlated stimu- 
lus. Biederman and Furedy obtained their 
results only after training was eliminated. 
Other researchers have found that consider- 
able experience with the signaled and un- 
signaled conditions is necessary before sub- 
jects acquire a preference for the signaled 
condition (e.g., Badia & Culbertson, 1972). 
The parameters chosen by Biederman and 
Furedy would appear to minimize a prefer- 
ence for signaled shock based on associative 
factors and maximize the potential for dif- 
ferential modification of signaled and unsig- 
naled shocks. Therefore, Biederman and 
Furedy may be studying a different phenome- 
non. A chronological listing of studies inves- 
tigating choice under signaled and unsignaled 
conditions is located in Table 1. The listing 
includes procedures used, parameters varied, 
and results obtained. 

Evidence suggesting that differential modi- 
fication of shock may occur was reported by 
Marlin et al. (1978) and was based on ob- 
servation of subject postures during signals 
and during the intertrial interval but not 
during shock. Following preference testing in 
which the subjects chose the 100% signal 
condition over one in which signals preceded 
shock only 80% of the time, each subject 
was observed across 20 shocks. The subjects 
were found to be rearing at shock onset 64% 
of the time and appeared to avoid 6.5% of 
the shocks as indicated by the lack of re- 
sponse. The incidence of full rearing (front 
paws off the grid, body axis more than 45° 
from horizontal) was only marginally greater 
during signals than in their absence (30% 
and 29%, respectively), but partial rearing 
(body axis less than 45° from horizontal) 
occurred far more frequently during signals 
than during signal absence (59% and 3%, 
respectively). Asserting that rearing responses 
modify shock, Marlin et al. concluded that 
scrambled-grid shock allows shock modifica- 
tion and should be replaced with a fixed-elec- 
trode preparation to exclude its possibility. 
Their own fixed-electrode studies (e.g., Miller 
et al., 1977) convinced them, however, that 
the preference for signaled shock is not solely 
determined by modification. 

Although the Marlin et al. (1978) results 
are suggestive, the lack of interrater reliabil- 
ities, the failure to use a blind procedure to 
rate modification to signaled and unsignaled 
shocks, and the failure to document a rela- 
tion between observed postures on the grid 
and actual modification of shock reduce the 
value of these data. A study that is not open 
to these criticisms was conducted by Badia 
and Abbott (in press). The changeover pro- 
cedure was used to assess preference for sig- 
naled over unsignaled shock, and all param- 
eters were similar to previous studies (e.g., 
Badia & Culbertson, 1972). Additional equip- 
ment permitted the measurement of current 
flow through the subject. The study revealed 
that whether shock was signaled or unsig- 
naled made no difference in the duration of 
contact with the grid bars. If anything, there 
was a suggestion that 10 of the 12 subjects 
actually received slightly longer shock dura- 
tions when shock was signaled. Also of inter- 
est were oscilloscope tracings. These tracings 
indicated that the subjects were rapidly mak- 
ing and breaking grid contact during shock 
primarily because of running. Such activity 
permits reduction in grid-contact time for 
both signaled and unsignaled conditions, but 
it does not permit the kind of response that 
would allow precise control of current flow. 
We should also note that the Badia and Ab- 
bott findings confirm those reported earlier 
by Lockard (1963). 


Conclusion 


Based on this portion of the literature re- 
view, a number of firm conclusions are war- 
ranted. Clearly, organisms prefer signaled 
over unsignaled shock conditions, whether 
shock is avoidable, escapable, or inescapable. 
Preference for the signaled condition has oc- 
curred in rats, in pigeons, and in fish. When 
the conditions of the experiment are reversed, 
preference has also reversed, although an oc- 
casional failure to reverse with reversed con- 
ditions has been noted. 

With rats, a preference for the signaled con- 
dition has been found with both males and 
females, and with hooded and albino animals. 
The same results have been reported with dif- 
ferent apparatus and different procedures. A 
preference for the signaled condition develops 
whether shock is scrambled or unscrambled 
and with various shock delivery systems or 
surface electrodes, such as electrodes attached 
to the tail, to the back, to the ears, or to the 
pubis bone, or with water electrodes. In ad- 
dition, studies measuring the duration of 
scrambled shock received by subjects under 
signaled and unsignaled conditions suggest 
that differential shock modification, shock 
avoidance, and decreased shock duration are 
not necessary conditions for preference. Fi- 
nally, we can conclude that the preference 
is robust in that subjects prefer longer, 
stronger, and more dense signaled shock over 
shorter, weaker, and less dense unsignaled 
shock. 


Factors Affecting Preference 


There seems to be little question that sub- 
jects prefer signaled shock conditions and for 
reasons other than those related to shock 
modification. We now discuss the factors that 
strengthen or weaken preferences. 

One factor affecting the strength of pref- 
erence is shock intensity. Harsh and Badia 
(1975) used constant-current scrambled shock 
ranging in intensity from .15 mA to 1.0 mA, 
in steps of either .15 mA or .20 mA. They 
found that the amount of time spent in the 
signaled condition varied systematically with 
shock intensity over the lower and middle 
range of intensities used. Subjects did not 
choose the signaled condition at low shock 
intensities. 

Another factor demonstrated to affect choice 
is the average intershock interval (Harsh & 
Badia, 1976). These investigators used a con- 
stant-wattage scrambled shock of .5 sec over 
six variable-time intershock intervals and a 
constant 30-sec signal. The intershock inter- 
vals averaged 510 sec, 270 sec, 150 sec, 90 
sec, 60 sec, and 45 sec. Choice of the signaled 
condition was directly related to the average 
intershock interval for six of the eight sub- 
jects in that short intershock intervals weak- 
ened preference, and long intershock intervals 
strengthened it. However, since only one sig- 
nal duration was used, it cannot be deter- 
mined from this study whether it was the 
absolute shock-free period or the ratio of the 
shock period (signal present) to the shock- 
free period (signal absent under the signaled 
condition) that controlled choosing the sig- 
naled condition. 
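
The confound can be made concrete with simple arithmetic. The sketch below assumes, for illustration only, that the 30-sec signal occupies the final 30 sec of each intershock interval, that mean intervals can stand in for the individual variable-time values, and that the .5-sec shock itself can be ignored. The function and variable names are hypothetical.

```python
SIGNAL_SEC = 30                                   # constant signal duration
MEAN_INTERVALS_SEC = [510, 270, 150, 90, 60, 45]  # mean intershock intervals

def confound_table(intervals=MEAN_INTERVALS_SEC, signal=SIGNAL_SEC):
    """Return (interval, shock-free time, shock-period/shock-free ratio) rows.

    Assumption: the signal fills the last `signal` seconds of each interval,
    so the shock-free (signal-absent) time is interval - signal.
    """
    rows = []
    for interval in intervals:
        shock_free = interval - signal
        rows.append((interval, shock_free, signal / shock_free))
    return rows

for interval, shock_free, ratio in confound_table():
    print(f"{interval:4d} sec interval: {shock_free:4d} sec shock-free, "
          f"ratio {ratio:.4f}")
```

Under these assumptions the shock-free time falls from 480 sec to 15 sec while the ratio rises from .0625 to 2.0, so the two quantities change together and cannot be separated when only one signal duration is used.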

The question of whether preference for the 
signaled condition is controlled by stimuli 
identifying the shock period (signal) or the 
shock-free period (signal absence) is impor- 
tant empirically and theoretically. One way 
of evaluating the role that these stimuli as- 
sume is by varying their dependability. In a 
study by Badia, Harsh, Coker, and Abbott 
(1976), the dependability of the signal in 
identifying a shock period was varied by 
holding the total number of signals constant 
at 180 and varying parametrically the num- 
ber of shocks from 180 to 3. Under these 
conditions, the probability of shock in the 
event of a signal varied, that is, $p(\mathrm{US} \mid \mathrm{CS}) \le 1.0$, 
but the probability of shock in the 
absence of the signal always remained con- 
stant, that is, $p(\mathrm{US} \mid \overline{\mathrm{CS}}) = 0$. Badia et al. 
found that subjects changed to the signaled 
condition when the value of $p(\mathrm{US} \mid \mathrm{CS}) = 1.0$ 
and also when the probability was sys- 
tematically reduced to less than 1.0. Appar- 
ently, the dependability of the signal that 
identified a shock period was not important 
as long as the probability of safety was not 
degraded, that is, as long as $p(\mathrm{US} \mid \overline{\mathrm{CS}})$ re- 
mained at 0. A variety of additional condi- 
tions, including controls for the intershock 
interval and sensory stimulation, were run. 
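
The manipulation reduces to simple arithmetic. Assuming that every shock is preceded by a signal (so that no shock occurs in signal absence), the probability of shock given a signal is the number of shocks divided by the 180 signals. The sketch below computes only the two endpoints stated above; the function name is hypothetical, and no intermediate shock counts are assumed.

```python
from fractions import Fraction

def p_shock_given_signal(n_shocks, n_signals=180):
    """Probability that a signal is followed by shock.

    Assumption: each shock follows a signal, so the conditional
    probability is simply shocks / signals.
    """
    if not 0 <= n_shocks <= n_signals:
        raise ValueError("n_shocks must lie between 0 and n_signals")
    return Fraction(n_shocks, n_signals)

for n_shocks in (180, 3):   # the two endpoints of the range
    p = p_shock_given_signal(n_shocks)
    print(f"{n_shocks:3d} shocks / 180 signals: p = {p} (about {float(p):.3f})")
```

On this reading, the signal's dependability ranged from 1.0 down to 1/60 (about .017) while the probability of shock in signal absence stayed at 0.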

In the second experiment, Badia et al. 
(1976) degraded the dependability of the 
stimulus identifying the shock-free period 
(safety), that is, $p(\mathrm{US} \mid \overline{\mathrm{CS}}) \ge 0$, while keep- 
ing constant the dependability of the stimu- 
lus identifying the shock period, that is, 
$p(\mathrm{US} \mid \mathrm{CS}) = 1.0$. The results showed that as 
the dependability of safety varied, preference 
for the signal condition also varied. When 
safety was dependably identified, preference 
for the signaled condition was strong, but 
when safety was undependably identified, 
preference for the signaled condition weak- 
ened. These results suggest that stimuli iden- 
tifying shock-free periods are more important 
than stimuli identifying shock periods. The 
results also raise some interesting questions 
concerning the relative value of safety. Since 
subjects chose the signaled condition when 
safety was a nonzero value, that is, $p(\mathrm{US} \mid \overline{\mathrm{CS}}) > 0$, 
it is obvious that safety need not be 
absolute. Results similar to those of Badia 
et al. have been reported by Safarjan and 
D'Amato (1978). These investigators also 
concluded that preference for the signal con- 
dition was strongly related to the safety func- 
tion of signal absence but not to the warning 
function of signal presence. 

We have already noted the importance of 
signal duration in the early studies of pref- 
erence (Perkins et al., 1966). Similar find- 
ings were reported by French, Palestino, and 
Leeb (1972). These early findings have been 
confirmed and extended by other work 
(Abbott & Badia, in press) using short signal 
durations with a different procedure and in 
a different apparatus. It was found that sub- 
jects generally did not prefer the signaled 
condition when the signal duration was less 
than 1 sec. Signal durations of 1.5 sec or 
longer, however, resulted in a strong prefer- 
ence. As noted, Perkins et al. found that pref- 
erence was strongest with signal durations of 
18 sec; French et al. found preference strong- 
est with 30-sec durations. The theoretical im- 
plication of these findings is discussed later. 

Other studies are relevant to an analysis 
of the factors affecting preference, and sev- 
eral investigators have assessed the attractive 
or aversive properties of stimuli that serve 
warning and safety functions (Arabian & 
Desiderato, 1975; Collier, 1977; Harsh & 
Badia, 1974). Harsh and Badia gave rats a 
choice between signaled and unsignaled shock 
while they were responding on a variable- 
interval food schedule. All the subjects chose 
the signaled condition, even though respond- 
ing for food in the presence of the signal was 
suppressed. The rate of responding for food 
was lowest in the presence of the signal and 
highest in its absence. An intermediate rate 
of responding occurred under the unsignaled 
shock condition. Although it is clear that 
response suppression cannot always be used 
as an index of aversiveness (cf. Rachlin & 
Herrnstein, 1969), under these conditions we 
believe that the results suggest that the pres- 
ence of the signal is the most aversive, that 
stimuli associated with the unsignaled condi- 
tion are less aversive, and that the absence of 
the signal under the signaled condition is the 
least aversive. 

The studies by Arabian and Desiderato 
(1975) and Collier (1977) assessed the char- 
acteristics of stimuli associated with shock 
periods (danger), stimuli associated with 
shock-free periods (safety), and signals pre- 
ceding shock (warning stimuli). Three groups 
of rats were tested in each of the studies. One 
group was exposed to a situation in which 
stimuli identified shock and shock-free pe- 
riods. In addition, during the shock periods, 
this group had a signal preceding each shock. 
Therefore, this group had stimuli identifying 
safe periods and danger periods, and also 
warning signals during the danger periods. A 
second group was essentially the same as the 
first, but instead of a signal preceding each 
shock during the shock periods, a random sig- 
nal was used. For the second group, then, 
safe and danger periods were identified, but 
the warning signals were absent. The third 
group did not have stimuli that identified 
safe, danger, or warning periods. After train- 
ing under the above conditions, the subjects 
were given a choice. Both studies reported 
that their subjects preferred the condition in 
which shock and shock-free periods were iden- 
tified over the condition in which they were 
not. The subjects also preferred the condition 
of discriminable shock and shock-free periods 
when warning signals were added over the 
condition without discriminable shock and 
shock-free periods. Different findings were re- 
ported, however, when the subjects were given 
a choice between a condition in which stimuli 
identified shock and shock-free periods and 
one in which discriminable periods plus warn- 
ing signals were present. Collier found that 
the subjects preferred the condition contain- 
ing warning signals, whereas Arabian and 
Desiderato did not. 


Conclusion 


Our review of this portion of the literature 
dealing with the factors affecting choice per- 
mits a number of conclusions. The literature 
suggests that choice of the signaled condi- 
tion does not occur at low levels of shock in- 
tensity. Generally, as shock intensity in- 
creases, so does choice of the signaled con- 
dition. The relation found between choice and 
shock intensity is similar to that found be- 
tween performance and shock intensity in 
avoidable and escapable situations. The de- 
pendability of stimuli identifying a shock 
period appears to be relatively unimportant in 
terms of choice. Subjects chose the signaled 
condition even though the dependability 
varied markedly. On the other hand, the de- 
pendability of a stimulus identifying a shock- 
free or safe period is important. When this 
stimulus is made relatively undependable, 
choice of the signaled condition decreases. 
Long signal durations tend to be more effec- 
tive than short signal durations. Signal dura- 
tions of less than 1.5 sec generally do not 
result in a preference for signaled shock. 
Finally, very short (45 sec) or very long (120 
min) intershock intervals attenuate prefer- 
ence for the signaled condition. 


Failures to Replicate 


Several studies have obtained preference 
for a signaled shock condition only when 
the shock was unscrambled (Biederman & 
Furedy, 1973, 1976a, 1976c; Furedy & Bie- 
derman, 1976). These data suggest that pref- 
erence for a signaled condition emerges only 
when subjects can overtly modify the shock 
through skeletal-muscular responses. This 
conclusion clearly disagrees with the findings 
reviewed earlier showing that subjects prefer 
a signaled condition with scrambled grid 
shock, with tail shock, and with water elec- 
trodes (e.g., Badia & Culbertson, 1972; Fisher 
& Badia, 1975; Miller et al., 1974). 

Other studies by Biederman and Furedy 
(1973) have also failed to obtain a prefer- 
ence for signaled shock when it was scrambled. 
Yet, other investigators using a similar pro- 
cedure have not had difficulty (Abbott & 
Badia, Note 2). Various possible reasons for 
the unusual findings of Furedy and Bieder- 
man are described in some detail in Badia 
and Harsh (1977a, 1977b). In part, they re- 
late to the parameter values chosen by Furedy 
and Biederman, such as signal duration (3 
sec), amount of training (none), length of 
testing phase (two 3-hour sessions), and in- 
tershock interval (45 sec). Some of these 
values have been shown to be less than opti- 
mal for demonstrating preference. In addition, 
the uncommonly long shocks used (5 sec) 
may have provided unusual opportunities for 
the unscrambled-shock subjects to learn com- 
peting responses. 

We are aware of only one other study that 
failed to obtain a preference for a signaled 
shock condition over an unsignaled one (Crab- 
tree & Kruger, 1975). In that study, how- 
ever, several problems made finding a pref- 
erence unlikely. The investigators used only 
one 13-hour testing session, an N of 1 in each 
cell for a 3 × 5 design, an intershock inter- 
val that was partially predictable, and a 3- 
hour pretest session with only the signal pres- 
ent. The latter procedure may have rendered 
the signal ineffective through the process of 
latent inhibition. 


Related Findings 


Studies using a choice as a dependent mea- 
sure represent only one portion of a considera- 
bly larger amount of literature dealing with 
the role of predictability in aversive situations. 
This literature is much too extensive to be 
reviewed in any detail here; however, the 
general findings of studies involving measures 
other than choice are discussed insofar as 
they relate to the choice literature. (For a 
recent review dealing with response suppres- 
sion under signaled and unsignaled conditions, 
see Hymowitz, 1979.) 


Avoidance, Escape, and Punishment 


The behavioral significance of signaling 
aversive events is clearly revealed in studies 
of avoidance, escape, and punishment. Sid- 
man (1955) published one of the first studies 
comparing signaled and unsignaled avoidance 
schedules. Both shocks and signals were under 
the subject's control in this study. Responses 
in the presence of the signal postponed the 
next shock, and responses in its absence post- 
poned both the signal and shock. The sub- 
jects allowed the signal to appear rather than 
postpone it, and overall avoidance responding 
dropped. Similar findings have been obtained 
by other investigators (e.g., Badia, Culbertson, 
& Lewis, 1971; Hyman, 1969; Keehn, 1959; 
Ulrich, Holz, & Azrin, 1964). One explanation 
of the differences in responding may relate 
to differences in shock density under signaled 
and unsignaled schedules. Signals may allow 
a more effective avoidance strategy (i.e., re- 
sult in fewer shocks). However, although 
some investigators have found shock density 
differences (e.g., Badia, Culbertson, & Lewis, 
1971; Ulrich et al., 1964), others have not 
(e.g., Ayers, Benedict, Glackenmeyer, & Mat- 
thews, 1974; Logan & Boice, 1968; Powell, 
1976). An explanation of the signaling effect 
based solely on an increase in response effec- 
tiveness is not supported. 

Signaling also has a marked effect on be- 
havior under schedules of escapable shock. 
Badia and Culbertson (1970) compared the 
behavior of rats under signaled and unsignaled 
escapable shock schedules and did not find 
differences in escape latencies. This outcome 
also suggests that behavioral differences re- 
sulting from signaled and unsignaled sched- 
ules are not due to response effectiveness. On 
the other hand, Badia and Culbertson did find 
clear differences in lever-holding time and in 
exploration. Holding was less frequent and 
exploration more frequent under the signaled 
condition. 

Comparisons of signaled and unsignaled 
punishment schedules have yielded findings 
similar to those obtained with escape and 
avoidance (e.g., Church, 1969). Church found 
that immediate shock resulted in more sup- 
pression than did delayed shock. More inter- 
esting, however, a signaled punishment group 
showed less overall suppression than did an 
unsignaled punishment group, and the dis- 
tribution of responding was different. Re- 
sponding in the absence of the signal was 
higher than in the presence of the signal. 

The studies reviewed thus far indicate that 
signaled and unsignaled avoidance, escape, 


they tend to gain control over behavior re- 
lated to the postponement, termination, and/ 
or prevention of shock. When signals are ab- 
sent, the prevailing stimuli appear to set the 
occasion for behaviors not related to the con- 
trol of shock (e.g., general activity, respond- 
ing for reinforcement). When signals are un- 
available, shock-related behaviors tend to pre- 
dominate at all times. 


Response-Independent Shock and 
Behavioral Suppression 


In contrast to response-dependent shock 
(Church, 1969), other studies have assessed 
the effects of response-independent shock on 
behavior maintained by reinforcement, that 
is, conditioned suppression (see Blackman, 1977; 
Davis, 1968). Some researchers have com- 
pared shock schedules with signals to shock 
schedules without signals (e.g., Brimer & 
Kamin, 1963; Davis & McIntire, 1969; Davis, 
Memmott, & Hurwitz, 1976; Holmes, Jack- 
son, & Byrum, 1971; Seligman, 1968; Selig- 
man & Meyer, 1970; Shimoff, Schoenfeld, & 
Snapper, 1969; Weiss & Strongman, 1969). 
These studies have shown that when shocks 
are signaled, the base rate of responding in- 
itially drops but gradually recovers over time. 
When shocks are not signaled, however, the 
base rate of responding drops to a low level 
and shows little recovery. 

An example of correspondence between vari- 
ables influencing choice and conditioned sup- 
pression concerns the dependability of pre- 
dictors of shock and shock-free periods. In 
a study described in more detail earlier, 
Badia et al. (1976) reported that systematic 
reduction of the probability with which sig- 
nals were followed by shock had little effect 
on preference, whereas degrading the de- 
pendability of a stimulus identifying a shock- 
free period had a marked effect. A similar 
finding using a conditioned suppression pro- 
cedure was reported by Nageishi and Imada 
(1974). They studied the effects of varying 
the dependability of the shock-free periods 
on rats’ licking behavior. They found that 
as the dependability decreased, the basal rate 
of licking also decreased. 


Somatic Reactions to Shock 


There has been a variety of studies com-
paring the somatic reactions to signaled and
unsignaled shock, and many of them
indicate that the effects of signaled situations 
are less severe than the effects of unsignaled 
ones (Gliner, 1972; Mezinskis, Gliner, & 
Shemberg, 1971; Price, 1972; Seligman, 1968; 
Seligman & Meyer, 1970; Simpson, Wilson, 
DiCara, Jarrett, & Carroll, 1975; Weiss, 1970, 
1971a, 1971b, 1971c). The findings by Weiss 
(1970) are representative. Weiss found
marked differences in the stress responses of 
subjects receiving signaled shocks and of sub- 
jects receiving unsignaled shocks. Subjects 
receiving unsignaled shocks developed more
ulcers, lost more weight, and showed higher
plasma corticosterone concentrations and 
higher body temperatures than did subjects 
receiving signaled shock. 

The Weiss (1970) study and others clearly
suggest that the somatic consequences of shock
schedules are less severe when shocks are sig-
naled. However, results involving somatic
measures are not entirely consistent, and ap-
parently contradictory findings have been ob-
tained. Compared to unsignaled schedules,
signaled schedules have been associated with 
heightened adrenal functioning (Bassett, 
Cairncross, & King, 1973; Paré, 1964), 
greater weight loss, and higher mortality rates 
(Brady, Thornton, & DeFisher, 1962; Fried- 
man & Ader, 1965). 

An explanation of the conflicting findings 
related to somatic reactions is not available. 
Weiss (1977) suggests that an examination 
of procedural variables reveals some consis- 
tencies. That is, somatic reactions to signaled 
schedules are less severe relative to reactions 
to unsignaled schedules when a direct-shock 
delivery system (e.g., tail shock) is used and 
are usually more severe when grid shock is 
used. Weiss (1977) related this pattern of 
findings to rats’ coping behavior with different 
shock delivery systems. According to his view, 
the inefficient coping attempts associated with 
grid shock but not with direct shock may lead 
to more pathological changes. Not all findings
are consistent with this notion, however
(Gliner, 1972; Seligman, 1968; Seligman &
Meyer, 1970). It is apparent that additional
data are needed to clarify this issue.


Theoretical Views 


There are several ways of organizing the 
literature on predictability and behavior, par- 
ticularly choice behavior. The findings could 
be organized along conditioned reinforcement 
lines. In its most simple form, this view states 
that neutral stimuli paired with primary re-
inforcers (e.g., food or shock) acquire rein-
forcing properties similar to those of the re-
inforcer. A conditioned reinforcement view
can account for the findings obtained in ex- 
periments using food that subjects choose 
situations in which a signal precedes a posi-
tive reinforcer over situations in which no
signal is given (e.g., Lutz & Perkins, 1960; 
Prokasy, 1956). To account for the prefer- 
ence for signaled over unsignaled food, it is 
assumed that the total amount of reinforce- 
ment is greater with the signal than without 
it, that is, the summation of the conditioned 
reinforcement occurring to the signal through 
pairings with food and the reinforcement of 
the food itself is greater than food reinforce- 
ment alone in the unsignaled condition. Gen- 
eralizing this logic to the aversive situation, 
analogous reasoning would predict the oppo- 
site results, that is, shock plus the acquired 
aversiveness of the signal should be more 
aversive than the condition with unsignaled 
shock alone. As our review indicates, however,
subjects under aversive stimulation clearly 
prefer signaled over unsignaled aversive situa- 
tions. Obviously, conditioned-reinforcement 
theory alone cannot account for the literature 
showing that subjects prefer the signaled- 
shock condition. Nor can it account for ap- 
petitive findings showing that in some cases, 
subjects prefer unsignaled over signaled ap- 
petitive reinforcement (Hershiser & Trapold, 
1971).

Another way of organizing the literature on
predictability could follow an information- 
theory and uncertainty-reduction analysis 
(e.g., Berlyne, 1960). Berlyne's theory is
based on the proposition that drive induces 
uncertainty and conflicts, and on the rein- 
forcing effect of their reduction. All informa- 
tion is considered desirable, and the theory 
does not provide a basis for selecting informa- 
tion. Rather damaging to the uncertainty re- 
duction view is the fact that it does not pre- 
dict the findings of Defran (1972), Dinsmoor, 
Flint, Smith, and Viemeister (1969), Kendall 
(1973), and Wilton and Clements (1971). 
The Dinsmoor et al. and Defran studies are 
most illustrative. These investigators used an 
observing-response procedure and found that 
subjects would respond for stimuli identifying 
food periods (Dinsmoor et al.) or shock-free 
periods (Defran), but that they would not 
respond for stimuli identifying shock periods. 
Also, many investigators have shown that ani- 
mals prefer information concerning reward 
over information concerning nonreward (Dins-
moor, Browne, & Lawrence, 1972), even when
the information content is equal (e.g., Jenkins 
& Boakes, 1973; Peterson, Ackil, Frommer, 
& Hearst, 1972). Other findings difficult for 
the information-uncertainty-reduction hypoth- 
esis to address are those of studies showing 
that preference is a function of signal dura-
tion (e.g., Perkins et al., 1966), of reward
magnitude (Mitchell, Perkins, & Perkins,
1965), or of stimulus dependability (Badia
et al., 1976). It is apparent that the informa-
tion hypothesis is not sufficiently developed
to apply to much of the current literature, 
especially the literature on choice. 

The two major analyses that have been 
applied to the choice literature are the prep-
aration hypothesis and the safety hypothesis.


Preparatory Response Hypothesis 


According to the preparation hypothesis
(Perkins, 1955, 1968), stimuli that precede
biologically important events allow subjects 
to prepare to receive these events. In turn, 
preparation is thought to minimize the pain- 
fulness of aversive stimulation or maximize 
the attractiveness of appetitive events. These 
preparatory responses are considered to be 
classically conditioned responses acquired 
through the law of effect.

The conditioned reinforcement hypothesis
and the preparation hypothesis predict simi-
lar outcomes in appetitive situations but not
in aversive situations. Under aversive stimu-
lation, the preparation hypothesis predicts
that subjects will prefer signaled over unsig-
naled shock conditions. According to the
earlier theorizing of Perkins (1955), signals
preceding shock allow the subject to make
preparatory responses (either internal or ex-
ternal) to shock, which reduces its aversive-
ness. In a later version of the theory, Perkins
(1971) generalized preparatory responding to
include the entire stimulus situation of shock
and shock-free periods. This conception is in
sharp contrast to the earlier view (Perkins,
1955), in which the emphasis was placed on
specific responses. It is the earlier version of
preparation that is most frequently tested and
that is most testable. 

Advantages of the preparation hypothesis.
One advantage of this view is its parsimony.
The preparation hypothesis presents a one-
factor view of conditioning in that both in- 
strumental conditioning and classical condi- 
tioning are explained in terms of the law of 
effect. In addition, this view fits nicely with 
our intuitive notions of the kinds of behavior 
that should occur in response to signals that
predict biologically important events. For ex-
ample, given the opportunity, subjects should
respond in ways that increase the attractive-
ness or decrease the aversiveness of environ-
mental events. Numerous experiments dealing 
with escape and avoidance bear out this ex- 
pectation. Preparatory responses clearly do 
occur under certain conditions. Another ad- 
vantage of the preparation view is that it 
emphasizes the importance of such signal 
parameters as duration, variability, and de- 
pendability in determining choice. As noted, 
much of the evidence concerning choice under
different signal conditions is consistent with
preparation. It has been argued that in some
situations longer signals allow more adequate
preparation than do shorter signals, an out-
come that has been found in several studies
(Abbott & Badia, in press; French et al.,
1972; Perkins et al., 1966). Particularly
relevant are the findings of Abbott and Badia 
showing that signal durations of .5 or 1.0 
sec would not support a preference for the 
signaled shock conditions. According to the 
preparation hypothesis, these short signal 
durations simply do not provide adequate 
preparation time. Signal variability could also 
be considered important for preparation. To 
be maximally effective, preparatory responses 
must be precisely timed, and conditions that 
allow this would be preferred to conditions 
that do not. Evidence supporting this view 
has been found by Safarjan and D'Amato
(1977). This study reports that subjects pre-
fer fixed over variable signal durations. The
assumption of precisely timed preparatory re-
sponses also provides the rationale for pre-
dicting preference for immediate over delayed 
shock (e.g., Knapp et al., 1959). 

The data on dependability of the signal re- 
ported by Badia et al. (1976) appear incom- 
patible with a preparatory view. However, it
may be possible to interpret the data of Badia
et al. within a preparatory-response frame-
work by (a) assuming a positive relation be-
tween preference and the proportion of shocks 
to which preparation occurred (Experiment 
1) or (b) assuming preparation required little 
effort so that inappropriate preparation did 
not affect preference (Experiment 2).

Difficulties with the preparatory hypothesis.
The preparatory view can account for a sub- 
stantial portion of the literature on preference 
for predictable events. However, the strength 
of this view is also its weakness in that each 
successful account requires making a specific 
assumption about the nature of preparatory 
responses. Often these assumptions are de- 
duced from the experimental outcomes that 
they allegedly predict. Since assumptions ap- 
propriate to any given outcome can be postu- 
lated, it is unlikely that a definitive test of 
the preparation hypothesis can be made. An 
important question, therefore, is whether the 
various assumptions made across experiments 
represent a unified view of the preparation 
hypothesis. Unfortunately, when the assump- 
tions are viewed in this larger context, they 
often conflict. For example, to account for 
the choice of immediate over delayed shock 
conditions and of fixed over variable signal 
durations, it is assumed that preparatory re- 
sponses must be precisely timed to coincide 
with shock (e.g., Knapp et al., 1959; Safarjan 
& D'Amato, 1977). Presumably, longer delays 
to shock make precise timing difficult. Yet to 
account for the stronger preference obtained 
with long over short signal durations, the op- 
posite assumption is made, namely, that longer 
signals allow more effective preparation. Ob- 
viously, the precise timing and better prepara- 
tion assumptions make opposite predictions 
in the same situation. The only criterion for 
choosing one assumption over the other ap- 
pears to be the specific experimental outcome. 
A similar conflict of assumptions occurs when 
the rationale for choice is examined for stud- 
ies involving the dependability of shock given 
the presence of a signal, or the dependability 
of no shock given the absence of a signal 
(Badia et al., 1976). In this case, it must be 
assumed that preparation is either effortful 
or effortless, or that there either is or is not 
a relation between preference and the pro-
portion of shocks to which preparation is
made. In brief, when an internally consistent 
set of assumptions is adopted, coverage of the 
data is substantially restricted for the prep- 
aration hypothesis. 

In addition to logical difficulties, there are 
also empirical difficulties for the preparation 
hypothesis (Badia, Coker, & Harsh, 1973; 
Badia, Culbertson, & Harsh, 1973). As noted 
earlier, subjects in the experiments of Badia 
and his colleagues chose signaled over unsig- 
naled shock even though signaled shock was 
two to nine times longer, two to three times 
more intense, or four to eight times more 
dense. It seems unlikely that preparation 
would have reduced the aversiveness of sig- 
naled shock to that of unsignaled shock under 
all of these conditions. It is difficult to im- 
agine a preparatory response so effective as 
to lower the aversiveness of the longer, 
stronger, or more dense signaled shock below 
that of the shorter, weaker, or less dense un- 
signaled shock. The results of the Harsh and 
Badia (1975) study showing that preference 
for the signaled condition increases as shock 
intensity increases are also difficult to recon- 
cile with the preparation hypothesis. Presum- 
ably, preparation should occur at all intensity 
values. Other data argue against the prepara- 
tion hypothesis as it relates to specific skeletal 
responses. The results of Miller et al. (1974), 
of Fisher and Badia (1975), and of others 
using surface electrodes to deliver shock rule 
out skeletal preparation that would have re- 
sulted in the subjects receiving different 
amounts of shock. Similarly, the study by 
Badia and Abbott (in press) monitoring shock 
duration found no differences between sig- 
naled and unsignaled conditions. 

Some investigators have measured the re- 
sponse to an aversive event presented alone 
or preceded by a signal. Furedy and Doob 
(1972) summarized a number of experiments 
involving 150 human subjects and concluded 
that signaled shock failed to be rated less 
aversive than unsignaled shock. Similar re- 
sults were also reported by Furedy and Gins- 
burg (1973) and Furedy and Klajner (1972). 

An important series of studies reported by 
Gormezano and Coleman (1973) also relates 
directly to the question of preparation. Con-
trary to the preparation view, their findings
suggest that the principle of reinforcement
does not apply to classical conditioning. Gor-
mezano and Coleman varied the effectiveness
of preparation by attenuating the intensity of
the US on trials in which a conditioned re-
sponse was elicited. According to the prepara-
tion view, the frequency of conditioned re-
sponses should have increased; instead, this
manipulation resulted either in no change or
in a reduction in the frequency of conditioned
responses. 

Although a substantial literature indicates
that a signal preceding shock attenuates an
animal's distress vocalizations to that shock
(e.g., Badia, Culbertson, Defran, & Lewis,
1971), the attenuation occurs the first time
that the signal and shock are paired. Finding
differences such as these on the first trial sug-
gests that nonassociative factors are involved.
Therefore the findings of Badia et al. cannot
be used to support the preparation hypoth-
esis. There are also studies with human sub-
jects showing that the galvanic skin response
to shock is smaller when shock is signaled (e.g.,
Baxter, 1966; Kimmel, 1967). This diminu-
tion in the galvanic skin response has also
been found in the rat and has been interpreted
as “preception” by Lykken (1962). Again,
however, Badia and Defran (1970) have
shown that the larger galvanic skin responses
occurring to unsignaled shock resulted from
orienting responses (Sokolov, 1963) occur-
ring to the omission of the signal. 


Safety Hypothesis 


The first version of the safety hypothesis 
was offered by Mowrer (1960) to account for 
rats choosing signaled over unsignaled avoid-
able or escapable shock. Subsequently, the
hypothesis was also used by Lockard (1963)
and Seligman (1968). Seligman, Maier, and
Solomon (1971) were the first to describe the
analysis as the safety hypothesis, and they
were primarily responsible for its develop-
ment. Badia, Culbertson, and Lewis (1971)
began to systematically apply the hypothesis
to a wide range of preference findings, and
the safety hypothesis soon challenged the
preparation hypothesis as an interpretive
model of preference in aversive situations.

The safety hypothesis emphasizes that sit-
uations with aversive stimuli can be divided
into discriminably different components that 
vary in their degree of aversiveness. In its 
most simple form, emphasis is placed on dis- 
criminable shock and shock-free periods, 
whether shock stimulation is avoidable, escap- 
able, or inescapable. These discriminable 
periods are orthogonal. For example, when 
subjects are given a choice between signaled
and unsignaled shock, three distinct stimulus
conditions exist: (a) the presence of the sig-
nal (CS) in the signaled condition, (b) the
absence of the signal ($\overline{\mathrm{CS}}$) in the signaled
condition, and (c) the unsignaled condition.
Although the same shock distribution is used
for both shock conditions and overall shock
rates are identical, local shock rates may vary.
In the signaled condition, shock always oc-
curs in the presence of the signal, $p(\mathrm{US} \mid \mathrm{CS}) = 1.0$,
and never in the absence of the signal,
$p(\mathrm{US} \mid \overline{\mathrm{CS}}) = 0$. Under the signaled condition,
therefore, shock (unsafe) periods and shock- 
free (safe) periods are perfectly identified, 
even when shock is randomly programmed. In 
contrast to the signaled condition, neither 
safe nor unsafe periods can be identified in 
the unsignaled condition, and with random 
shock, the entire intershock interval may ac- 
quire properties of an unsafe period. Thus, 
when subjects are in the signaled condition, 
the shock period is identifiable and usually 
brief—at most lasting only as long as the sig- 
nal duration. Further, the safe period is also 
identifiable and considerably longer, since it 
consists of the total intershock times minus 
the signal duration. Presumably, subjects 
choose the signaled shock condition over the 
unsignaled one on this basis; that is, the safe 
periods are identifiable and are considerably 
longer than the unsafe periods. 
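
The arithmetic behind this account can be sketched in a few lines of Python. The sketch is illustrative only: the function names and the numerical values (a 120-sec mean intershock interval and a 5-sec signal) are assumptions chosen for the example and are not taken from any of the studies reviewed. It assumes one shock per intershock interval, with each shock preceded by a signal of fixed duration.

```python
def signaled_schedule_periods(mean_intershock_s, signal_s):
    """Split one average intershock interval into unsafe and safe time.

    Assumes one shock per interval, delivered in the presence of a signal
    lasting `signal_s` seconds, so that p(US|CS) = 1 and p(US|no CS) = 0.
    """
    if not 0 < signal_s < mean_intershock_s:
        raise ValueError("signal must be shorter than the intershock interval")
    safe_s = mean_intershock_s - signal_s
    return {
        "unsafe_s": signal_s,
        "safe_s": safe_s,
        "safe_fraction": safe_s / mean_intershock_s,
    }


def local_shock_rates(mean_intershock_s, signal_s):
    """Shocks per second within each of the three stimulus conditions."""
    periods = signaled_schedule_periods(mean_intershock_s, signal_s)
    return {
        "signal_present": 1.0 / periods["unsafe_s"],  # all shocks fall here
        "signal_absent": 0.0,                         # perfectly safe period
        "unsignaled": 1.0 / mean_intershock_s,        # shock possible at any time
    }


# Hypothetical values, for illustration only.
periods = signaled_schedule_periods(mean_intershock_s=120.0, signal_s=5.0)
rates = local_shock_rates(mean_intershock_s=120.0, signal_s=5.0)
print(periods)
print(rates)
```

With these assumed values the overall shock rate is the same in the signaled and unsignaled conditions, yet the signal-absent component has a local rate of zero and occupies most of the session, which is the asymmetry that the safety hypothesis emphasizes.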

Advantages of the safety hypothesis. One 
advantage of the safety analysis is that it 
can reconcile such findings as the acquired 
aversiveness of preshock stimuli with the find- 
ings that subjects prefer situations that in- 
clude these stimuli. The safety hypothesis 
permits the generalization that stimuli paired
with a reinforcer, positive or negative, acquire
the properties of that reinforcer. Findings
dealing with conditioned fear (e.g., McAllister
& McAllister, 1971), with conditioned sup-
pression (e.g., see review by Davis, 1968),
and with inhibition and facilitation of avoid-
ance (e.g., Rescorla & LoLordo, 1965) are
compatible with this view. Support for the
safety analysis can also be inferred from stud-
ies showing that greater physiological deteri-
oration usually occurs when shock is unpre-
dictable. Other data also support the safety
analysis.

If safety is important, then its duration 
should be a factor. Stimuli associated with in- 
creased durations of shock-free time should 
acquire differential control over responses that 
produce these durations. The latter point is 
important because it demonstrates that fac- 
tors other than the parameters of the aver-
sive stimulus are controlling behavior. An
example of one such effect involves the inter-
trial interval in avoidance learning, a factor
known to be important in avoidance learning
(e.g., Weisman & Litner, 1969a, 1969b). A
more compelling set of data has also been 
provided by these investigators (Weisman & 
Litner, 1971). They used a procedure intro- 
duced by Rescorla and LoLordo (1965) in 
which a stimulus paired with shock or a dif- 
ferent stimulus paired with no shock was pre- 
sented. They systematically varied the dura- 
tion of the no-shock period across groups and 
then imposed the stimuli identifying shock 
and no-shock periods on an avoidance task. 
When the stimulus identifying safety was im- 
posed on avoidance, responding decreased. 
More important, the greater the duration of 
safety during the initial training, the greater 
the decrement in avoidance responding during 
testing. Other studies have also shown the 
duration of the shock-free period to be im- 
portant (e.g., Azrin, Hake, Holz, & Hutchin- 
son, 1965). As described earlier, Harsh and 
Badia (1976) demonstrated that the longer 
the duration of the shock-free period, the 
stronger the preference for the signaled over 
the unsignaled condition. 

The results of Badia, Culbertson, and Lewis 
(1971) and of Badia and Culbertson (1972) 
also support a safety analysis. These investi- 
gators analyzed the stimuli within a prefer- 
ence task that controlled choosing the signaled 
condition. They demonstrated through a series
of extinction trials that the stimulus that was
correlated with the shock-free period con- 
trolled changing from the unsignaled to the 
signaled condition. They also demonstrated 
that the stimulus (signal) that was corre- 
lated with the shock period, to which prepara- 
tion could be made, did not maintain chang- 
ing to the signaled condition. Other data also 
support the safety analysis. Harsh and Badia 
(1975) found that preference for the signaled 
condition varied with the intensity of shock. 
In another study (Harsh & Badia, 1974), 
they found that although subjects preferred 
the signaled shock condition, responding for 
food was most suppressed in the presence of 
the signal and least suppressed in its absence.

Difficulties with the safety hypothesis. Sev-
eral findings are difficult for the safety anal- 
ysis to accommodate. One of these findings 
deals with signal duration. Abbott and Badia 
(in press) found that animals did not prefer 
the signaled condition with signal durations of 
.5 or 1.0 sec but that with longer durations
they did. Signal durations of the shorter length 
are clearly discriminable, thus allowing shock 
and shock-free periods to be identified; yet 
animals did not change to the signaled condi- 
tion. Also, the finding that longer rather than 
shorter signal durations resulted in stronger 
preference for the signaled condition is in- 
compatible with safety (French et al., 1972; 
Perkins et al., 1966). The recent work of
D'Amato and Safarjan (1979) is also rele- 
vant. These investigators found that rats pre- 
ferred information about the duration of 
shock that they were to receive, even though 
this information was unrelated to shock and 
shock-free periods. However, Freeman and 
Badia (1975), in a study similar to D’Amato 
and Safarjan, found that information about 
shock intensity did not result in a preference. 
Other results indicate that safety is not a 
necessary condition for preference. In a study 
by Badia et al. (1976), the dependability of 
the stimuli identifying shock and shock-free 
periods was varied. Preference for the signaled 
condition was maintained even when a number 
of unpredictable shocks were delivered dur- 
ing the formerly shock-free (safe) period. 
Only when the dependability of safety was 
reduced to a relatively low level was the pref-
erence eliminated. It is difficult for the safety
analysis to deal with these data because the
shock-free period was only relatively safe,
not totally safe, under this latter condition.

Clearly, evidence for and against both the
preparation analysis and the safety analysis
exists, and perhaps both views have their
merits under specific experimental conditions.
However, preference for signaled shock may
be determined by a number of factors, and
it may not be possible to incorporate all rele-
vant data under a single principle.


Prospectus 


Our review of the literature has firmly es-
tablished the reliability of preference for pre-
dictable shock situations under a variety of
conditions. The review rules out explanations
of the phenomenon based on methodological
considerations, such as differential shock
avoidance or shock attenuation. It also rules
out theoretical explanations of preference
based on conditioned reinforcement or on the
inherent value of information. Evidence favor-
ing the remaining theoretical views of prep-
aration and safety has been noted. We have
also noted evidence incompatible with each
of these latter views. For preparation, as-
sumptions about the properties of preparatory
responses across various conditions often are
in conflict when viewed together. In addition,
preparation theory fails to adequately account
for the evidence showing that signals acquire
aversive properties. The preparation view
also has difficulty accounting for the prefer-
ence for stronger, longer, or more dense sig-
naled shock, for the controlling influence of
stimuli correlated with safe periods, and for
the failure of explicitly reinforced preparatory
responses to maintain or strengthen condi-
tioned responses. Similarly, the safety hypoth- 


safety hypothesis alone is sufficient to account 
for the available data. Predictable and unpre 


suggest. There may be other factors within 
these situations that are important.

One factor that may be important is local 
reduction in shock frequency or probability. 
Gibbon (1972, 1977), Herrnstein and Hine- 
line (1966), Hineline (1970), and others have 
shown that reduction in shock density, both 
overall and local, is sufficient to maintain 
operant responding. Reduction in shock den- 
sity also maintains choice behavior (e.g., 
Badia, Coker, & Harsh, 1973). Our view of 
choice based on shock-density reduction
suggests that subjects change to a signaled 
condition because doing so frequently transfers 
them from a relatively high-density shock con- 
dition (unsignaled condition) to a relatively 
low-density shock condition (signal-absent 
component of signaled schedule), even though 
overall shock density is the same for both con- 
ditions. This view is compatible with a safety 
analysis based on discriminable periods rela- 
tively free of shock. When stated in relative 
rather than absolute terms, the safety hy- 
pothesis can account for choice performance 
under different dependabilities of safety. It 
is obvious that relative safety and local re- 
duction in shock density are simply different 
ways of referring to the same controlling 
variable. 
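
The equivalence of relative safety and local reduction in shock density can be made concrete with a short Python sketch. Everything in it is an illustrative assumption rather than a procedure from the studies cited: the function names, the parameter `unsignaled_fraction` (the proportion of shocks delivered without a signal during the nominally safe component), and the numerical values are hypothetical. The sketch assumes one shock per intershock interval on average and a fixed signal duration for each signaled shock.

```python
def signal_absent_density(mean_intershock_s, signal_s, unsignaled_fraction):
    """Local shock density (shocks/sec) in the signal-absent component.

    `unsignaled_fraction` is the assumed proportion of shocks that occur
    without a signal; 0.0 means the signal-absent period is perfectly safe.
    """
    if not 0.0 <= unsignaled_fraction <= 1.0:
        raise ValueError("unsignaled_fraction must lie between 0 and 1")
    if not 0 < signal_s < mean_intershock_s:
        raise ValueError("signal must be shorter than the intershock interval")
    # Time per interval spent with the signal on (signaled shocks only).
    signal_time = (1.0 - unsignaled_fraction) * signal_s
    return unsignaled_fraction / (mean_intershock_s - signal_time)


def relative_density_reduction(mean_intershock_s, signal_s, unsignaled_fraction):
    """Proportional drop in local density on moving from the unsignaled
    condition to the signal-absent component of the signaled condition."""
    unsignaled_density = 1.0 / mean_intershock_s
    absent = signal_absent_density(mean_intershock_s, signal_s, unsignaled_fraction)
    return 1.0 - absent / unsignaled_density


# Hypothetical schedule: 120-sec mean intershock interval, 5-sec signal.
reductions = [
    relative_density_reduction(120.0, 5.0, fraction)
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0)
]
print(reductions)
```

Under these assumptions the reduction is complete when safety is perfectly dependable and shrinks steadily as more shocks intrude into the signal-absent period, vanishing only when no shock is signaled. Stated this way, a shock-free period that is only relatively safe still offers a local reduction in shock density.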

A second factor that may be important is
the role played by classical conditioning. Even
though this factor is explicitly recognized by
the preparation hypothesis, it is treated only 
as a manifestation of the law of effect. Classi- 
cal conditioning may have implications for 
behavior and physiology that differ from those 
stated by the preparation hypothesis. For 
example, whereas the preparation view sug- 
gests that conditioning should have beneficial 
effects due to a reduction in the aversiveness 
of the shock, the physiological evidence sug-
gests that predictability can be more debilitat- 
ing under certain conditions and less so under 
other conditions (Brady et al., 1962). More 
attention needs to be given to the role of 
classical conditioning and its interaction with 
operant choice behavior.

A third factor that needs attention is con- 
trast. The signaled schedule may be compared 
to a multiple schedule in which the two com-
ponents are identified by the signal's presence
or absence. When components offering differ-
ent reinforcement parameters are present in 
alternation, contrast effects often become evi- 
dent. Indeed, the reinforcing value of safe 
periods emerges only as such periods are con- 
trasted with periods of danger. The literature 
on contrast is extensive, but it is not yet clear 
how contrast alters the value and the physio- 
logical effects of signaled schedules. 

Whatever the eventual theory, it is evident 
that signaled shock schedules are more com- 
plex than previously thought. Classical con- 
ditioning, successive contrast effects, local re- 
inforcement, in addition to other factors, may 
all affect preference in signaled shock situa- 
tions. The solution to the puzzle is not yet at 
hand, but progress toward such a solution is 
clearly being made. 


Reference Notes


The Alpha Experience Revisited: Biofeedback
in the Transformation of Psychological State 


William B. Plotkin 
State University of New York at Albany 


Presented is a review of empirical research and conceptual perspectives on the 
development of unusual experiential states during electroencephalographic 
(EEG) alpha-biofeedback training. It is concluded that the occurrence of the 
"alpha experience" is relatively independent of the strength or density of EEG 
alpha activity, and that the transformation in experience during feedback train- 
ing can be accounted for by eight categories of complexly interrelated factors: 
(a) sensory deprivation, (b) sustained alertness, (c) concentration/meditation, 
(d) introspective sensitization, (e) expectation, (f) perceived success at the 
feedback task, (g) attribution processes, and (h) individual differences. Con- 
ceptual and empirical implications for biofeedback and for the study of physio- 
logical-experiential relationships are discussed. 


During electroencephalographic (EEG)
alpha-feedback training, trainees are pre- 
sented with immediate moment-to-moment in- 
formation on the strength or density of their 
EEG alpha rhythms, which provides them the 
opportunity, in principle, to learn to increase, 
maintain, or decrease the strength of this 
brain rhythm. Of special significance is the 
observation by some researchers (Brown, 
1970; Hardt & Kamiya, 1976a; Hart, 1968; 
Kamiya, 1968, 1969; Nowlis & Kamiya, 
1970) that many persons report entering a 
quasi-meditational state of consciousness dur- 
ing alpha-enhancement feedback training. 
This state of consciousness, often called the 
“alpha experience,” is usually identified as a 
pleasant, relaxed, and serene state, charac- 
terized by a loss of body and time awareness, 
an absence or diminution of thought, and a 
feeling of egolessness (Brown, 1970; Hart, 
1968; Kamiya, 1968, 1969; Nideffer, 1973; 
Nowlis & Kamiya, 1970; Plotkin, 1976a, 
1977; Plotkin & Cohen, 1976; Walsh, 1974). 

Initially, researchers claimed that the alpha 
experience was intrinsically and directly asso- 
ciated with enhanced alpha levels, and that 
the alpha experience was, in fact, caused by 
enhanced alpha levels (hence, the name, alpha 
experience). Frequently cited in support of 
this view (in addition to the early alpha-bio- 
feedback studies) was the observation that 
the EEGs of meditators often show increased 
alpha strength during meditation (Anand, 
Chhina, & Singh, 1961; Kasamatsu & Hirai, 
1969; Wallace, 1970). The possibility of di- 
rectly influencing experience through volun- 
tary control of the electrical activity of the 
brain was a rather provocative notion. Indeed, 
the great popular interest in biofeedback may 
have been primarily generated by the idea 
that brain-wave feedback training had the 
potential for being a more efficacious method 
than the traditional meditative disciplines for 
effecting meditative states, or that at least 
it was a method better suited to the “modern 
Western temperament.” 

More recently, however, considerable doubt 
has arisen that changes in a person's EEG 
alpha level have any direct or simple relation- 
ship to the achievement of the alpha experi- 
ence. Several studies have failed to find a 
significant occurrence of alpha experiences 
during alpha-enhancement feedback, espe- 
cially when the research participants were 
not led to expect such an experience (Beatty, 
1972; Lynch, Paskewitz, & Orne, 1974; Orne 
& Paskewitz, 1974; Peper, 1971; Plotkin, 
1976a; Plotkin & Cohen, 1976; Regestein, 
Pegram, Cook, & Bradley, 1974; Travis, 
Kondo, & Knott, 1975). Plotkin (1976a), for 
example, found that one of his groups of re- 
search participants, who did not know which 
brain waves were being studied and who were 
not told what kind of experiences to expect, 
described experiences that showed “no con- 
sistent similarities with the experiences that 
have been widely associated with high and 
low alpha states” (p. 89). Travis et al. sum- 
marized the experiential reports of 140 persons 
who participated in four studies that examined 
the alpha-enhancement phenomenon. They 
concluded that the alpha-enhancement task 
is not as overwhelmingly pleasant as had been 
suggested by Nowlis and Kamiya (1970) and 
by Brown (1970). 
On the other hand, there are several stud- 
ies that have replicated the finding of alpha 
experiences during alpha training. However, 
when the relationship between EEG alpha 
and experience has been examined, these stud- 
ies have uniformly failed to find significant 
correlations between the degree of alpha en- 
hancement and the intensity or likelihood of 
alpha experiences (Beatty, 1972; Lynch et 
al., 1974; Plotkin, 1977; Plotkin, Mazer, & 
Loewy, 1976; Sacks, Fenwick, Marks, Fen- 
ton, & Hebden, 1972). Lynch et al., for ex- 
ample, concluded that their research partici- 
pants’ “largely positive reactions to the feed- 
back procedure were not the result of large 
increases in alpha activity and are certainly 
not likely to have been a function of alpha 
activity levels alone” (p. 409). Using more 
formalized correlation techniques, Plotkin et 
al. found no correlation between the degree 
of alpha enhancement and the likelihood of 
an alpha experience. 

Even more damaging to the thesis that the 
alpha experience is the result of alpha en- 
hancement is the recently uncovered fact that 
there is absolutely no published evidence that 
alpha training has ever resulted in an un- 
equivocal case of true alpha enhancement 
(Johnson, 1977; Paskewitz, 1977; Plotkin, 
1978). That is, alpha levels have never been 
shown to rise above prefeedback eyes-closed 
resting baseline levels. The subbaseline in- 
creases in alpha production that have often 
been reported during alpha training have been 
shown to be the result of the gradual dissipa- 
tion or neutralization of alpha-inhibitory in- 
fluences, which is a case of disinhibition or 
habituation, not enhancement (Lynch & 
Paskewitz, 1971; Paskewitz, 1977; Paskewitz, 
Lynch, Orne, & Costello, 1970; Plotkin, 1978; 
Plotkin, Note 1). 

Cognizant of this problem, Hardt and 
Kamiya (1976a) have argued that the failure 
to find reliable and significant alpha enhance- 
ment is due to the use of “deficient method- 
ologies,” such as insufficient training time or 
a percentage-of-time measure of alpha rather 
than an amplitude-integration measure (Hardt 
& Kamiya, 1976b). However, Plotkin (1976b) 
has pointed out that most of these “suspect” 
studies used methodologies that were similar 
to, or nearly identical with, those of the origi- 
nal studies of alpha-feedback training—those 
by Brown (1970), Kamiya (1968, 1969), and 
Nowlis and Kamiya (1970). Moreover, a re- 
cent study (Plotkin, 1978) that employed the 
precise methodology recommended by Hardt 
and Kamiya (1976a), including almost 9 
hours of total training time, found no evi- 
dence for the learned enhancement of alpha 
strength significantly above optimal eyes- 
closed baseline levels, although in some cases 
alpha-enhancement training did result in the 
maintenance of optimal alpha levels. 
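
The two alpha measures contrasted above can be illustrated with a minimal sketch. It assumes an alpha-band amplitude envelope that has already been extracted and sampled at a fixed interval; the function names, the 20-microvolt criterion, and the sample values are hypothetical and are not taken from any of the studies cited.

```python
def percent_time_alpha(envelope_uv, threshold_uv):
    """Percentage of samples whose alpha amplitude meets the criterion."""
    if not envelope_uv:
        raise ValueError("envelope must contain at least one sample")
    above = sum(1 for amplitude in envelope_uv if amplitude >= threshold_uv)
    return 100.0 * above / len(envelope_uv)


def integrated_amplitude(envelope_uv, sample_interval_s):
    """Amplitude summed over time (microvolt-seconds), rectangle rule."""
    return sum(envelope_uv) * sample_interval_s


# Hypothetical envelopes: both exceed the criterion on half of the samples,
# but the second does so with twice the amplitude.
weak = [10.0, 25.0, 10.0, 25.0]
strong = [10.0, 50.0, 10.0, 50.0]

for name, envelope in (("weak", weak), ("strong", strong)):
    print(name,
          percent_time_alpha(envelope, threshold_uv=20.0),
          integrated_amplitude(envelope, sample_interval_s=0.5))
```

In this sketch the two records are indistinguishable on the percentage-of-time measure yet differ on the integrated measure, which is the kind of discrepancy at issue in the methodological dispute.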

In summary, there is now solid support for 
the conclusion that alpha-enhancement train- 
ing per se is neither necessary for, nor espe- 
cially facilitative of, the achievement of the 
alpha experience. However, it is important 
to note that the phenomenological authenticity 
of the alpha experience is not being called 
into question here. The point is that alpha 
enhancement per se has not been instrumental 
in—or intrinsic to—the achievement of this 
experience. Although there is always the prob- 
lem of bias and compliance in the report of 
experiential states, most alpha researchers 
have learned that there is simply no doubt 
that many of their trainees have experienced 
highly unusual, meaningful, and occasionally 
profound alterations in consciousness during 
feedback training. That this is so is perhaps 
Table 1 
Factors Involved in the Development of 
Unusual Experiential States During 
Electroencephalographic Alpha-Biofeedback 
Training 

1. Sensory deprivation due to 
   (a) The biofeedback setting 
   (b) Alpha-feedback-augmented sensory limitation 
2. Sustained alertness (during sensory deprivation) 
3. Concentration/meditation 
4. Introspective sensitization 
5. Suggestion and expectation due to 
   (a) Preexperimental expectancies 
   (b) Implicit suggestion 
   (c) Explicit suggestion 
6. Perceived success at the feedback task 
7. Dual attribution of responsibility inherent in biofeedback training 
8. Individual differences 


best demonstrated by the extraordinary eager- 
ness of many alpha trainees to repeat the ex- 
perience, to learn all they can about it, and 
to spend considerable sums of money to pur- 
chase or rent the equipment that is seen as 
necessary for the generation of the experience 
(Lawrence, 1972). A recent study (Plotkin, 
in press) that employed a strong demand for 
honesty on experiential reports also supports 
this view. The question at this point is not 
whether these experiential reports are dismis- 
sable as artifacts, but rather, given that the 
attainment of the alpha experience during 
alpha training is not related to any unusual 
change in EEG alpha, how then do we explain 
the occurrence of these experiences? 

We now appear to be in a position to offer 
an adequate answer to this question. Over the 
past few years there has accumulated a sub- 
stantial body of evidence that demonstrates 
that there are at least eight categories of 
complexly interrelated factors (variables) that 
account for the occurrence of alpha experi- 
ences and similar states during alpha-biofeed- 
back training. Table 1 presents an outline of 
these eight categories. Note that alpha en- 
hancement is not among them. 

Besides elucidating the development of 
alpha experiences, a review and discussion 
of these eight variables will highlight and 
illustrate several of the subtle, albeit critical, 
difficulties inherent in the attempt to estab- 
lish direct or intrinsic relations between physi- 
ological states or processes and experiential 
or behavioral phenomena. There are complex 
methodological problems involved in deter- 
mining how a particular physiological state is 
related, if at all, to a particular psychological 
state or behavior, and in deciding whether 
biofeedback or other means of altering par- 
ticular physiological activities are critical to— 
or incidental to—the observed changes in psy- 
chological state (Shapiro, 1977). An under- 
standing of how unusual experiential states 
are generated in the biofeedback setting will 
also enhance our knowledge of biofeedback 
training in its larger context of social 
therapeutic influence—as opposed to its nar- 
rower definition as a method of facilitating 
physiological self-control. 

Implicit in the reconstruction I shall offer 
of the development of these experiences will 
be a rejection of the reductionist and mecha- 
nist position that holds that psychological 
states are simply the consequence of efficient 
causes such as physiological processes or 
reinforcement histories. Instead, I shall pro- 
ceed from a contextual human-action per- 
spective, which recognizes that psychological 
state is one parameter or strand in a complex 
behavioral process that includes cognitions, 
motivations, and social significances as other 
parameters, as well as physiological processes 
and learning histories (Ossorio, 1973, 1978; 
Sarbin, 1977). 


Sensory Deprivation 


One hypothesis with respect to the develop- 
ment of unusual experiential states during 
alpha training is that the alpha-feedback set- 
ting happens to be conducive to the develop- 
ment of sensory deprivation and the associated 
alterations in consciousness (Zubeck, 1969). 
There appear to be, in fact, two independent 
aspects of alpha-training procedures that fa- 
cilitate sensory deprivation: the attributes of 
the general biofeedback setting and the ef- 
fects of alpha-enhancement training per se. 


The Biofeedback Setting 


There are several ways in which the typical 
alpha-biofeedback setting resembles those that 
are employed in sensory-deprivation experi- 
mentation. Trainees are usually asked to sit 
in a comfortable chair or to lie on a bed, 
which is typically situated in a small sound- 
proof or sound-attenuated room with low 
lighting or none at all. In addition, trainees 
are commonly asked to keep their eyes closed, 
to relax, and not to move around once they 
have become comfortable, in order not to 
disturb the EEG electrodes, which are sen- 
sitive to electromyographic (EMG) artifacts. 
Moreover, the standard feedback signal is a 
monotonous tone, usually appearing over a 
headphone set, which the trainees are con- 
stantly monitoring in order to track their 
changing alpha levels. Given these aspects of 
the typical alpha-feedback setting, it is not 
surprising that trainees often report becoming 
relaxed, with a loss of body awareness and 
with the associated sensory-deprivation feel- 
ings of lightness, floating, flying, or losing 
awareness of the “external” environment. 
However, as with sensory deprivation, some 
persons may react to this setting by falling 
asleep or with boredom, and some with anx- 
iety or panic. The reason these latter responses 
are relatively rare during alpha training in- 
volves other factors discussed later, especially 
Factors 2 and 5 (see Table 1). 

There has been only one piece of research 
(Plotkin, 1978) that has explicitly tested the 
hypothesis that the occurrence of the alpha 
experience is related to the sensory-depriva- 
tion aspects of the feedback setting. In this 
study, I found that persons who engaged in 
10 52-min sessions of eyes-closed alpha-en- 
hancement training without intrasession rest 
periods rated their experiences to be signifi- 
cantly more enjoyable and intense than did 
persons who engaged in precisely the same 
training with 20-sec eyes-open (and lights-on) 
rest periods interspersed every 4 min. The 
only significant difference in alpha levels be- 
tween the two groups was a greater mean 
amplitude, on Session 1 only, for the group 
that did have the rest periods. Nevertheless, 
persons in the no-rest (high sensory-depriva- 
tion) group reported experiencing less body 
weight, greater personal involvement, faster 
speed of time, greater happiness, more emo- 
tional activation, greater personal relevance, 
more thought, and a “higher” state of con- 
sciousness. Moreover, relative to what might 
be thought to be more common procedures, 
the procedure of interspersing rest periods does 
not decrease the occurrence of alpha experi- 
ences: most of the studies that have reported 
the occurrence of these experiences have em- 
ployed similar interspersed rest periods 
(Brown, 1970; Kamiya, 1968, 1969; Nowlis 
& Kamiya, 1970; Plotkin, 1976a, 1977; 
Walsh, 1974). 


Alpha-Feedback-Augmented Sensory 
Limitation 


Peper (1971) has noted another way in 
which the alpha experience may be related 
to a sensory-deprivation state. The research 
of Mulholland, his associates, and others 
(Mulholland, 1968, 1972, 1973; Mulholland 
& Peper, 1971; Wertheim, 1974) has demon- 
strated that the absence of occipital EEG 
alpha blocking reflects the absence of cortical 
oculomotor processing (in essence, abundant 
alpha occurs when a person is awake and "not 
looking"). Several other research reports 
(Chatrian, Magnus, Petersen, & Lazarte, 
1959; Galin & Ornstein, 1972; Jasper & Pen- 
field, 1949; Klass & Bickford, 1957; Kreit- 
man & Shaw, 1965; Morgan, MacDonald, & 
Hilgard, 1974; Schwartz, Davidson, & Pugash, 
1976) have suggested that the occurrence of 
alpha blocking at cortical locations other than 
the occipital lobe is also due to neural pro- 
cessing at the cortical location in question, 
with the concomitant activation of the behav- 
ioral processes associated with that location. 
In short, abundant alpha is known to accom- 
pany (or to be a sign of) sensory, motor, or 
cognitive quiescence at the cortical level. 
Thus, the maintenance of one’s optimal occip- 
ital alpha level (which is facilitated through 
alpha-enhancement training; Plotkin, 1978; 
Note 1) would be expected to be accompanied 
by an absence of visual control processes. 
Moreover, because of the dominance of vision 
in the human being, alpha maintenance in 
the occipital lobe alone would be expected 
to have a generalized sensory limitation ef- 
fect; that is, we would expect that the easiest 
way for an awake human being to minimize 
oculomotor activity—and thereby optimize 
occipital alpha levels—would be for him to 
focus his attention on cognitive activity, and 
thus away from all sensory modalities, inas- 
much as oculomotor activity is a concomitant 
of all sensory orientations. 

In summary, it appears that occipital alpha- 
enhancement training results, to some degree, 
in a self-imposed sensory-deprivation state. 
Such a state would be expected to be char- 
acterized by increased nonsensory activity or 
awareness. However, the particular nature of 
this wakeful nonsensory state (e.g., introspec- 
tion, daydreaming, boredom, hallucination, 
and some forms of meditation or contempla- 
tion) will depend on factors other than EEG 
alpha. Plotkin (1976a) and Plotkin and Cohen 
(1976) have demonstrated that there is a 
wide range of experiences that occur during 
alpha-enhancement training when the trainees 
are not led to expect any particular experi- 
ences. However, all of them are instances of 
nonsensory—and in particular, nonvisual— 
states. 


Sustained Alertness During Sensory 
Deprivation 


The quality of the experiences that occur 
during sensory deprivation would be expected 
to be very much influenced by the concurrent 
degree of alertness or drowsiness. In consider- 
ing the explicit and implicit demands for re- 
laxation in conjunction with the physical 
attributes of the feedback setting, one would 
expect drowsiness to regularly accompany 
alpha-feedback training. This development 
would be a problem for the researcher who 
is interested in evoking the alpha experience, 
since it cannot be experienced if the trainee 
is asleep; the alpha experience is an alert (al- 
though relaxed) state. 

It is fortunate, therefore, that alpha-en- 
hancement training facilitates the maintenance 
of alert wakefulness by facilitating the main- 
tenance of naturally occurring eyes-closed 
alpha amplitudes. This feature of alpha train- 
ing stems from one of the oldest EEG find- 
ings: Alpha activity decreases in amplitude 
and frequency, and essentially disappears, as 
a person becomes drowsy and approaches sleep 
(Adrian & Mathews, 1934; Berger, 1930; 
Lindsley, 1960). Thus, in order to keep the 
feedback signal on, the alpha trainee must 
learn to stay alert under sensory-deprivation 
conditions, a nontrivial task at which most 
trainees are nevertheless able to succeed. How- 
ever, any task that would facilitate alertness 
and be compatible with sensory deprivation 
would do as well as alpha training in this 
regard. Yet we should note that alpha train- 
ing is especially well suited for this purpose 
because it can be an absorbing task despite 
the fact that it involves only monotonous sen- 
sory stimulation (which renders it compatible 
with sensory deprivation). 

Relaxed alertness may also be facilitated in 
the biofeedback setting by an upright posture 
(as in the traditional meditation position), 
by high levels of motivation or expectancy 
(to be discussed later), and, of course, by 
normal amounts of prior sleep. 

The role of alpha training in facilitating 
relaxed alertness may help to explain why re- 
search participants in a noncontingent-feed- 
back or a no-feedback group might not be as 
likely to report alpha experiences as those in 
a contingent-feedback group: The noncontin- 
gent and no-feedback participants are more 
likely to drowse off. Thus, it is not the case 
that enhanced alpha causes the alpha experi- 
ence, or even that maintained optimal alpha 
is uniquely, intrinsically, or directly associated 
with the alpha experience; rather, drowsiness 
or sleep (which is accompanied by reduced 
alpha levels) is incompatible with the alpha 
experience. Maintained optimal alpha per se 
is as closely associated with alert daydream- 
ing, mind-wandering, and boredom as it is 
with meditative experiences. Therefore, al- 
though alpha training per se may contribute 
to the occurrence of the alpha experience, it 
is not especially facilitative of it. The other 
factors discussed above and those to be ex- 
amined below have been shown to be much 
more critical and influential in effecting the 
experiential state. 


Concentration/Meditation 


In addition to its sensory-deprivation qual- 
ities, the alpha-training procedure has some 
other important similarities to many medita- 
tion exercises, for example, immobility and 
concentration on, or sustained attention to, a 
monotonous stimulus. The alpha trainees’ task 
is to keep the alpha tone on as long (and/ 
or as loud) as possible. To accomplish this, 
they must intently focus their attention on 
the feedback tone, its variation, and the rela- 
tion between the tone and their behavior and 
experience. This prolonged concentration on 
the feedback tone is formally equivalent to 
the meditator's sustained attention to breath- 
ing, to a mantra, to chanting or prayer, to a 
mandala, or to any other invariant or regu- 
lar form (i.e., meditation object). As Naranjo 
and Ornstein (1971) have pointed out, this 
form of meditation exercise, which they call 
concentrative meditation, eventually results 
in a temporary suspension of ordinary thought, 
which is a central feature of the meditative 
(and alpha) experience, also reported by 
Deikman (1963) in his study of experimental 
meditation. 
However, since the significance of the feed- 
back signal, in its role as a meditation object, 
derives from its monotony, neutrality, and 
simplicity and not from its EEG contingency, 
it follows that a noncontingent tone would 
serve as well, in this regard, as the alpha tone 
in facilitating the generation of the alpha ex- 
perience. 


Introspective Sensitization 


Recently, in a highly intriguing study, Hunt 
and Chefurka (1976) demonstrated that short 
periods of simply paying direct attention to 
one's “immediate subjective experience” 
elicited “anomalous subjective reports” and 
“altered-state effects” (p. 867). By “imme- 
diate subjective experience,” Hunt and Chef- 
urka mean “the bare features of momentary 
awareness without any reference to the con- 
sensual world of objects, persons, and mean- 
ings” (p. 868)—the “stimulus qualities” of 
sensations devoid of their significance as ob- 
servations of everyday objects. Research par- 
ticipants who were requested simply to pay 
attention in this fashion for 10 min, without 
any explicit suggestions as to what to expect, 
generally reported “visual anomalies, uncanny 
emotion, . . . cognitive disorientation, and 
. . . feelings of interpersonal detachment and 
loneliness” (p. 872). The authors state that 
such data “suggest that altered-state effects 
can be tapped in very short time periods in 
any situation involving lack of movement, iso- 
lation, and at least implicitly, some attention 
to subjective experience” (p. 869). 

The alpha-training situation certainly in- 
cludes all of the latter three features. As we 
have seen, isolation and lack of movement 
are components of the sensory-deprivation 
qualities of the biofeedback setting. In addi- 
tion, nearly all alpha-training studies include, 
at the very least, the implication that the 
training will result in mild to profound 
changes in experiential state. These experiential 
changes, which include changes in body aware- 
ness, are often explicitly outlined for the 
research participant before the onset of train- 
ing (as will be discussed later). Thus we can 
conclude that the simple act of paying direct 
attention to one’s sensations as sensations, 
which is a feature of the alpha-training con- 
text, can be expected to result to some de- 
gree in unusual experiential reports, indepen- 
dent of explicit suggestion, the degree of 
alertness, EEG alpha amplitudes, the presence 
of tones, or EEG-tone contingencies. “Intro- 
spective sensitization,” as Hunt and Chefurka 
(1976) have termed it, is as much a feature 
of the alpha-training setting as it is of sen- 
sory deprivation, meditation, and hypnosis. 
Erickson, Rossi, and Rossi (1976), for ex- 
ample, have pointed out that 
the essential identity between periods of introspec- 
tion and trance was demonstrated by Erickson . . . 
when he found that groups of subjects asked to per- 
form a task in introspection underwent behavioral 
and subjective experiences that were similar to those 
they had when they went through a classical hyp- 
notic induction. (p. 196) 


Hunt and Chefurka (1976) also found that 
the experimental protocols of the classical 
introspectionists (e.g., Titchener, 1912; 
James, 1950; and Spearman, 1923) "revealed 
subjective anomalies similar to those found 
in drug and meditational states" (p. 867). 


Suggestion and Expectation 


The most widely endorsed hypotheses ad- 
vanced to explain why unusual experiential 
states occur during alpha training have invoked 
such social psychological factors as sugges- 
tion, expectation, and the demand characteris- 
tics of the experimental setting (Beatty, 
1972; DeGood, Elkin, Lessin, & Valle, 1977; 
Lynch & Paskewitz, 1971; Lynch et al., 1974; 
Peper, 1971; Plotkin, 1976a, 1976b, 1977, 
1978; Plotkin & Cohen, 1976; Plotkin et al., 
1976; Valle & Levine, 1975; Walsh, 1974; 
Glaros, Note 2). The central phenomenon 
here, of course, is the research participants' 
expectations about what sort of experiential 
changes will take place during training. These 
expectations can come about through (a) pre- 
experimental knowledge of alpha waves and/ 
or alpha training, (b) explicit suggestion from 
the experimenter or from confederates, or (c) 
implicit suggestion (i.e., other demand char- 
acteristics). 

It is certainly not surprising that expecta- 
tion would play an important role in the 
development of alpha experiences during 
alpha-feedback training. After all, it has long 
been known that expectation has a very 
powerful influence on the often unusual ex- 
periences associated with hypnosis, relaxation 
procedures, meditation, and psychoactive 
drugs, as well as on all sorts of more common 
experiences. In reference to alpha training, 
Lynch and Paskewitz (1971), for instance, 
have made the following observation: 
Subjective reports are frequently influenced by the 
experimental setting and the course of the experiment 
itself. It is certainly possible that some of the re- 
ports of Ss in the feedback situation are influenced 
by what Orne (1962) has called the “demand char- 
acteristics” of the situation, that is, Ss enter the 
experiment expecting to experience alterations in 
mood, expecting the session to be pleasant, perhaps 
a “high,” or if they don’t feel this way initially, the 
experimenter may reinforce such feelings, both in 
the pre-experimental interview and in the actual in- 
structions given during the experiment. (p. 212) 


Preexperimental Expectancies 


There has been only one published study 
(DeGood et al., 1977) that has explicitly as- 
sessed the effect of preexperimental expec- 
tancies on experiential reports of alpha train- 
ing. DeGood et al. gave each of their research 
participants two 30-min feedback sessions: 
an alpha-enhancement session and an alpha- 
suppression session. Half of the participants 
had indicated on a screening questionnaire 
that they had some knowledge of alpha train- 
ing, whereas the other half were “unknowl- 
edgeable” persons. The postexperimental ex- 
periential questionnaires indicated that only 
the knowledgeable persons experienced differ- 
ent subjective states during the enhancement 
and suppression sessions, with their reports 
of enhancement training being significantly 
more like the alpha experience than their re- 
ports of suppression training were. 


Implicit Suggestion 


To my knowledge, the effects of implicit 
suggestion in the alpha-feedback setting have 
never been documented independently of pre- 
experimental expectancy. For instance, when 
expectancies are operative, one would natu- 
rally expect that informing a subject that she 
or he is about to begin an “enhancement trial” 
would serve as an implicit suggestion that the 
alpha experience is about to occur. This view 
offers the most plausible interpretation of a 
recent study by Glaros (Note 2), in which 
participants in one of the groups recruited for 
an “alpha-wave experiment” were given non- 
contingent (tape-recorded) feedback during 
both “enhancement” and “suppression” trials. 
These persons reported significantly more 
alpha experiences during the “enhancement” 
trials even though there was no difference in 
EEG alpha density between these two con- 
ditions. 

Implicit suggestion should also be con- 
trasted with inadvertent or informal sugges- 
tion, which occurs when the suggestion is not 
a formal component of the experimental in- 
structions. Presumably, inadvertent or infor- 
mal suggestion was at play in the early alpha 
studies, in which it appeared that there was 
a unique or intrinsic relation between alpha 
enhancement and the alpha experience 
(Brown, 1970; Kamiya, 1969; Nowlis & 
Kamiya, 1970). 


Explicit Suggestion 


The effects of explicit suggestion on re- 
ported experiences during alpha training have 
been an informal or secondary focus of sev- 
eral studies (e.g., Beatty, 1972; Lynch et al., 
1974; Plotkin, 1976a). Beatty compared the 
experiential reports of participants in two 
contingent-feedback groups, one of which 
received no information about the alpha ex- 
perience, while the other was informed of the 
phenomenological attributes of the experience. 
He reported a lack of uniformity in the re- 
ports of the no-information group. However, 
“subjects in the Information Condition, pre- 
sumably because of their initial biases, re- 
ported the typical correlates of brain alpha 
rhythms—relaxation, calmness, inner aware- 
ness, etc.” (p. 154). Lynch et al. reported 
that most of their subjects had positive reac- 
tions to the feedback procedures, but that 
“the most likely explanation for these posi- 
tive reports rests in the fact that Ss were 
told that the experience would be a pleasant 
one” (p. 409). 
There are three published studies that have 
systematically investigated the effects of ex- 
plicit suggestion concerning the nature of the 
alpha experience (Plotkin, 1977; Plotkin et 
al., 1976; Walsh, 1974). Walsh employed a 
bidirectional design in which half of his re- 
search participants received alpha-suppression 
feedback and half received alpha-enhancement 
feedback. In addition, half of the persons in 
each of these groups received a “positive” 
alpha-experience set (explicit suggestion as to 
the nature of the experience), and half re- 
ceived a neutral set (general description of 
several possible experiences). Each research 
participant then received two 20-min sessions 
of alpha training, one with eyes open and 
one with eyes closed. Walsh found the typi- 
cal alpha experience to be reported only when 
persons were given both the alpha-experience 
set and alpha-enhancement feedback. Either 
alone was not sufficient. Walsh interpreted 
these results as not only demonstrating the 
importance of suggestion, but also showing 
that the alpha experience is directly asso- 
ciated with the alpha rhythm, though this 
association may be blocked by “situational 
factors” unless the person is provided with 
“appropriate preparation for the experience, 
including some concepts to use in describing 
it” (p. 433). This latter conclusion, however, 
is not warranted by Walsh's data, since there 
is the following alternative interpretation, 
which has considerable independent support. 

Rather than demonstrating that the alpha 
experience is directly associated with alpha 
activity, Walsh's study may have shown only 
that the relative absence of alpha activity 
(during alpha-suppression feedback) is par- 
tially or wholly incompatible with the alpha 
experience (as well as with numerous other 
sorts of experiences associated with an in- 
hibition of alpha blocking). According to this 
interpretation, the absence of alpha experi- 
ences in the groups that received alpha-sup- 
pression feedback (regardless of whether or 
not they also received the alpha-experience 
set) would be explained by this incompati- 
bility rather than by a special, direct, or one- 
to-one relation between the alpha rhythm and 
the alpha experience. The finding of signifi- 
cantly more alpha experiences in the alpha- 
enhancement group that also received the 
alpha-experience set can be straightforwardly 
interpreted as demonstrating the effects of 
suggestion. Plotkin (1976a, 1977) and Plot- 
kin and Cohen (1976) present data that dem- 
onstrate that alpha suppression, through its 
association with oculomotor activation, is an- 
tagonistic to the occurrence of the alpha 
experience. 

In a related study, Plotkin et al. (1976) 
attempted to demonstrate the effects of sug- 
gestion on the experience of alpha training. 
Before the start of the 30-min eyes-open 
alpha-enhancement training, one group re- 
ceived an alpha-experience set; these research 
participants were explicitly informed about 
the specific experiential changes that were 
associated with increases in the volume of 
the feedback tone. The other group received 
no explicit suggestions whatever regarding 
what sort of experiences, if any, to expect. 
(Note that this no-set group differs from 
Walsh’s neutral-set group. The latter was in- 
formed of a wide range of possible experi- 
ences, which included but did not emphasize 
the alpha experience.) No mention was made 
to any participant in either group that the 
research had anything to do with alpha waves 
(in order not to evoke preexperimental ex- 
pectations). After the session, written experi- 
ential reports were collected and rated by 
blind judges on their similarity to a standard- 
ized description of the alpha experience. 
Somewhat surprisingly, the results showed 
that the likelihood of an alpha experience 
was not significantly different for the two 
groups. However, this finding later came into
focus when it was discovered that many of 
the research participants had spontaneously 
noted, on their postexperimental question- 
naires, that they had felt very frustrated in 
their attempts to increase the tone volume 
(i.e., to enhance alpha). Thus, the alpha-
experience set may have been ineffective at
evoking alpha experiences because (a) the 
experience of frustration was incompatible 
with the alpha experience and (b) many 
trainees saw themselves as having failed at 
the very task that leads, they were told, to 
the alpha experience. This interpretation, 
then, suggests that the degree of perceived 
success at the feedback task should interact 
strongly with expectation. This hypothesis 
was tested in the following study. 


Perceived Success at the Feedback Task 


After the Plotkin et al. (1976) study, I 
was interested in demonstrating two separate 
points: (a) that the intensity of experiences 
reported to occur during alpha training would 
depend on perceived success at the enhance- 
ment task and be independent of actual suc- 
cess (the actual alpha amplitude relative to 
baseline) and (b) that the specific quality 
of the experiences would depend on the ex- 
plicit suggestions given to the participants, 
assuming that preexperimental expectations 
were minimized. These two hypotheses were 
incorporated into a single 2 X 2 factorial de- 
sign, in which the two between-group vari- 
ables were Perceived Success (high or low) 
and Expectation (of one of two different sorts 
of altered states of consciousness). All par-
ticipants received four 30-min sessions of
eyes-closed alpha-enhancement training, al-
though it was ensured that no participant
thought that he or she was participating in 
research that was at all concerned with alpha 
waves. Persons who were randomly assigned 
to the lambda-expectation group were led to 
believe that they were training to enhance 
their "lambda" brainwaves, whereas the par- 
ticipants in the kappa-expectation group were 
told they were going to learn “kappa” en- 
hancement. Both groups were told that they 
would find themselves in an altered state of 
consciousness (the “lambda” or “kappa”
state, depending on the group) if they were
successful at lambda (or kappa) enhance-
ment. To determine the power of the Expecta-
tion variable, the lambda and kappa states
were described as maximally dissimilar within
the constraints of the biofeedback setting and
the necessity of matching the motivation
levels of the two groups. For instance, because
of the sensory-deprivation qualities of the
feedback setting, both groups were informed
that the experience involved a loss of body
awareness. Also, to ensure an equal motiva-
tion to succeed, both states were described as
pleasant, relaxing, and highly unusual and
special. On the other hand, the two states
were defined at opposite poles with respect
[...]
time. The lambda state (which corresponded
to the alpha experience) was situated at the
low end of each of these four dimensions, and
the kappa state, at the high end. The major
distinctive features of the lambda state were
described as follows in the protocols:


The “mind” slows down considerably during the
lambda experience until the point is reached at which
there is absolutely no thought, a condition often
described as “blank mind.” Even in the lighter
[...] thought is very slow and free-flowing. . . .
Eventually, thought stops entirely. However, the
mind is nevertheless alert and awake at all times.
. . . It is a clear and serene state—beyond emo-
tion. . . . It is an “egoless” state, characterized by
little or no awareness of oneself as a separate entity.


In contrast, the kappa state was described
in the following manner:


The “mind” becomes much more efficient in a cer-
tain sense. That is, the kappa experience is a state
of very abundant and highly deliberate thought. We
are able to direct our thought to any topic of in-
terest and to process information at an extremely
rapid rate. . . . It is characterized by abundant
personal thought of a significant and often insight-
ful nature. The kappa experience often involves
thoughts of interpersonal experiences that are emo-
tionally relevant . . . you become very aware of
your personal strengths and of precisely who you
are as a person.

Although all the research participants re-
ceived the same form of contingent alpha
feedback, half of those in each of the two
groups—the “success” subjects—were led
to feel highly successful at the feedback task
(regardless of the actual degree of success)
by reporting to them (every 2 min via an
intercom) a three-digit number that was os-
tensibly proportional to the preceding trial's
average alpha (“lambda” or “kappa”) am-
plitude but was, in fact, the actual score in-
flated at a rate of an additional 2% every
2 min. Thus, while these “success scores”
were still responsive to actual alpha ampli-
tudes, and while the actual feedback tones
were still being used, these trainees were
nevertheless led to believe that they were
improving at a somewhat remarkable, yet
convincing, rate. In addition, persons in the
Success group were given frequent verbal
praise. On the other hand, persons in the
“Failure” groups were given their actual 2-
min scores, although their instructions in-
formed them that the kappa (or lambda) ex-
perience does not even begin to occur until
kappa (or lambda) strength is increased by
at least 100% over initial levels (which never
happens).

The fact that all participants were given
contingent feedback is noteworthy. As many
researchers have informally noted in their
own labs, and as Strayer, Scott, and Bakan
(1973) have formally demonstrated, many
persons receiving noncontingent alpha feed-
back quickly grow discouraged, and often
become drowsy or fall asleep. For this reason,
noncontingent feedback (e.g., tape-recorded
tones of a successful trainee) would not be
effective for the Success group (they would
not be as likely to feel successful) or for
the Failure group (because, if they became
drowsy, they would then manifest lower alpha
amplitudes than the Success group, in which
case perceived success would be confounded
with actual success). In the experiment under
consideration, then, the Failure subjects were
able to feel that they had at least some con-
trol over the tone (which they did, in fact,
have), although they were not able to pro-
duce the degree of enhancement that they
believed was required to experience the al-
teration in consciousness. At any rate, it is
because of the danger of the noncontingency
being recognized that a noncontingent-feed-
back group is often not an adequate control
group (Plotkin, Note 1).
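
The perceived-success manipulation described above can be made concrete
with a short sketch in Python. It is only an illustration under stated
assumptions: the function names, the example scores, and the choice of
additive (rather than compounding) inflation are hypothetical, since the
text says only that the reported score was inflated by an additional 2%
every 2 min and that Failure participants were told the experience
requires at least a 100% increase over initial levels.

```python
def success_score(actual_score: float, trial_index: int, step: float = 0.02) -> int:
    """Score reported to a Success participant after a 2-min trial.

    trial_index counts trials from 1. The inflation grows by `step` (2%)
    with each trial. Additive growth is an assumption; the source does
    not say whether the extra 2% compounds.
    """
    inflation = 1.0 + step * trial_index
    return round(actual_score * inflation)


def failure_score(actual_score: float) -> int:
    """Failure participants were simply given their actual 2-min score."""
    return round(actual_score)


def reaches_criterion(score: float, baseline: float) -> bool:
    """True if the score is at least 100% above the initial level (doubled)."""
    return score >= 2.0 * baseline


if __name__ == "__main__":
    baseline = 200.0                           # hypothetical initial score
    actual = [200.0, 210.0, 205.0, 215.0]      # hypothetical actual scores
    for k, score in enumerate(actual, start=1):
        print(k, failure_score(score), success_score(score, k),
              reaches_criterion(score, baseline))
```

Because the reported number is a multiple of the actual score, it still
rises and falls with real alpha amplitude, which is the property the
text emphasizes: the feedback stays contingent while the apparent rate
of improvement is exaggerated.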

The results from this study were straight- 
forward. In general, and with few exceptions, 
in the written postexperimental questionnaires 
persons in the Kappa-Success group reported 
very powerful and genuine kappa experiences, 
while persons in the Lambda-Success group 
reported authentic lambda experiences. Most 
persons in both of the Failure groups reported 
“nothing unusual.” These differences were 
borne out by statistical comparisons of the 
participants’ ratings of their experiences on 
a series of 1-to-9 scales (Plotkin, 1977). As 
for the EEGs, there were no differences even 
approaching statistical significance between 
any of these groups in the degree of alpha 
enhancement actually achieved. 

These results demonstrated (a) that the 
general intensity, pleasantness, value, and the 
degree of relaxation and sensory-deprivation 
effects that are reported to occur during alpha 
training are, indeed, very strongly influenced 
by the degree of perceived success at the feed- 
back task; (b) that there is a wide range of 
experiences compatible with the biofeedback 
setting and abundant alpha activity—the 
“alpha experience” does not have a unique 
relationship to EEG alpha; (c) that the spe- 
cific qualities of the reported experiences are 
closely related to the participants’ expecta- 
tions; and (d) that the quality and intensity 
of the experiences that occur during alpha 
training are independent of the actual degree 
of success at alpha enhancement. 


The Attribution Process in 
Alpha Training 


It has been established that alpha training 
per se is of little importance in the genera- 
tion of experiential changes during these pro- 
cedures. Nevertheless, it may be the case that 
the alpha experience is more likely to occur 
when the research participant can attribute 
the experience to alpha training or to another 
biofeedback procedure. For instance, would 
we find—or expect to find—equally profound 
changes in consciousness if we simply placed 
a person in a dark room for an hour and told 
him to expect—or to produce—certain ex- 
periential changes? What is it about the fact 
that alpha trainees have this specific attribu-
tion available to explain to themselves the
occurrence of these unusual experiences
(namely, that they are the result of alpha 
training) that might enhance the likelihood of 
such experiences occurring in the first place? 

An initial answer would be that alpha train- 
ing serves essentially the same role as an in- 
active drug placebo: Just as there are many 
persons who will experience a suggested psy- 
chological effect after ingesting a purportedly 
psychoactive drug that is in fact only a sugar 
pill, there are many biofeedback trainees who 
will experience a suggested psychological ef- 
fect during purportedly psychoactive biofeed- 
back training that in fact has no significant 
physiological effect. (See Peek, 1977, for a 
discerning conceptualization of the placebo 
effect.) 

However, the expectancy effect that is real- 
izable in the biofeedback setting may be more 
powerful, or may at least have more active 
dimensions, than the typical drug placebo. 
There is one particular difference between a 
biofeedback treatment and a drug placebo 
that perhaps constitutes the most notable 
contribution of the entire biofeedback ap- 
proach to therapeutic intervention: namely, 
the opportunity for the client, patient, or re- 
search participant to become an active agent 
in the process of change, control, or therapy 
(Stroebel & Glueck, 1973; Plotkin, Note 3). 
Whereas the recipients of a drug placebo are 
led to attribute the physiological, behavioral, 
or psychological transformation entirely to 
the drug (and thereby to reduce their own 
sense of responsibility and self-control), bio- 
feedback trainees (whether or not the train- 
ing per se has a humanly significant effect) 
will attribute a desirable outcome at least 
partially to themselves, which will enhance 
their sense of responsibility and self-control. 
In addition, there is, of course, a much greater
likelihood of individuals eventually achieving
complete self-control (without the aid of drugs
or biofeedback) when they start from a point
of some perceived control and move to one
of more control, than when they attempt to 
go from no control to some control (Davison 
& Valins, 1969). In short, unlike placebo- 
treated persons, biofeedback trainees have 
been prepared to see themselves as at least
eligible for self-control of their problems, be-
havior, and/or experience (Plotkin, Note 3).

Thus, the biofeedback approach takes ad-
vantage of a combination of internal and ex-
ternal attributions of the suggested effect.
There are two reasons why a biofeedback
intervention may lead to a more powerful
effect than an analogous external-placebo ap-
proach does. First is the fact that the experi-
ence of success at the feedback task may con-
tribute directly to the outcome, especially
when the major effect is an experiential or
psychological change, as in the present case
of alpha training. The self-control of an “in-
voluntary” bodily process, especially one as
mysterious and vital as brain wave activity,
may be justifiable grounds for feelings of
unusual self-mastery and for the accompany-
ing positive affect. Furthermore, in the case
of biofeedback there is nothing ambiguous
about the occurrence of success: There is an
objective measure of progress in the form of
a feedback meter, tone, or other quantified
index of control. Thus, the clinician or ex-
perimenter who employs biofeedback as a
placebo intervention can arrange for his or
her trainee to receive an indisputable feed-
back of “progress,” which can serve as a very
compelling counteragent to a trainee’s lack
of self-confidence and hence, as a powerful
mobilizer of the trainee's motivations and
skills.

The second advantage that the biofeedback
intervention has over the external placebo
follows from the fact that biofeedback trainees
see themselves as active agents; they are
therefore motivated to exert their own efforts
toward producing the effect, an approach that
may be expected to be more successful than
that of placebo-treated persons, who usually
have no reason to actively “help along” the
drug (Valins & Nisbett, 1972). Alpha trainees
would be expected to become more involved
in—and thereby more influenced by—a pro-
cedure the effects of which they can see them-
selves as having facilitated than they would
in a procedure that is ostensibly produced
solely by an external agent.

Thus, Valins and Nisbett recommend that
the individual who is treated with a drug or
placebo intervention be advised that the drug
is “not so strong” and that it must be “helped 
along” by the appropriate self-control be- 
haviors. When these procedures are used, the 
individual’s self-doubts (about whether he or 
she can contribute to the production of the 
effect) are circumvented, and motivation and 
involvement are maintained. The biofeedback 
placebo goes even further in that the trainee’s 
immediate task—control of the feedback sig-
nal—is at least one step removed from con-
trol of the target process or state (e.g., blood
pressure, muscle group, or state of conscious- 
ness) and is thus less likely to evoke the 
trainee’s doubts concerning his or her com- 
petence. 

We would expect that alpha training would 
also be more effective than a procedure in 
which only internal attributions are available 
because most persons would probably not see 
themselves as able to induce such experiential
states on their own without special training
(for if they did, they would have done so
already!). A person who starts out on a task
that is believed to be impossible or doomed 
to failure is obviously less likely to succeed 
than one who thinks he or she has a good 
chance to succeed (Peek, 1977; Plotkin, 
Note 3). 

These views concerning the attribution pro- 
cess in alpha training have received some sup- 
port in a recently completed study (Plotkin, 
in press) in which experiential reports from 
six groups were compared. All the research 
participants were exposed to the identical 
physical setting and received the same ex- 
plicit suggestions of an alpha experience. They 
were divided into the following six groups:
(a) contingent EEG alpha-biofeedback train-
ing (participants in this group were instructed
to try to increase the volume of a feedback
tone; they were told that successful perfor-
mance would enhance the strength of their
alpha brain waves and thereby result in the
alpha experience); (b) noncontingent bio-
feedback (instructions were identical to those
above, but the “feedback” tone was in fact
a tape recording of a successful trainee's feed-
back); (c) concentration exercise (an inter-
nal-attribution-only condition; the partici-
pants’ task was to use the tape-recorded tone
as a concentration object; successful concen-
tration would lead to the alpha experience);
(d) brain wave stimulation (the analog to the 
inactive-drug placebo; an external-attribution- 
only condition; the participants were informed 
that the alpha experience would be directly 
induced by a combination of electrical brain 
stimulation and computer-programmed audi- 
tory stimulation); (e) a combination of the 
last two groups (the participants were told 
that they were receiving direct brain wave 
stimulation, but that they must “help it 
along” by concentrating on the tone); and 
(f) self-induction (another internal-attribu- 
tion-only condition, but unlike the concentra- 
tion condition, participants were given no in- 
duction strategy; rather, they were on their 
own to self-induce the alpha experience in 
any way they could). The results from this 
study indicated that persons in the first two 
groups (biofeedback training) reported sig- 
nificantly more intense alpha experiences than 
did those in the other four groups. Moreover, 
there were no differences in experiential re- 
ports between the contingent and noncontin- 
gent versions of the biofeedback condition. 


Individual Differences 


Although the results of the Plotkin (1977) 
study demonstrated that reported experiences 
during alpha training were closely related to 
the participants' expectations and their de- 
gree of perceived success, there were never- 
theless a few highly atypical responses in each 
group. A few persons in the Success groups 
reported no unusual experiences, or even the 
opposite experience to what they had been 
led to expect. In addition, a few persons in 
the Failure groups reported the suggested 
experiences despite our attempts to induce a 
perception of failure. These findings point to 
the importance of considering individual dif- 
ferences. Persons may have greater or lesser 
ability and/or disposition to self-induce un- 
usual experiences or to self-induce one kind 
of experience over another. Moreover, persons 
differ in their proneness to experience sen- 
sory-deprivation effects, in their susceptibility 
or openness to suggestion, in their capacity to 
be comfortable in an experimental setting, 
in their disposition to follow instructions and 
cooperate with the experimenter, and in many
other relevant attributes. There has been very 
little work explicitly relating personal char- 
acteristics to the individual differences in re- 
ported experience during alpha training. A 
study in progress (Plotkin, Note 4) will cor-
relate experiential reports with (a) state and
trait anxiety (Spielberger, Gorsuch, & Lu- 
shene, 1970), (b) Rotter's (1966) Locus of 
Control Test, and (c) Shor's (1960) Personal 
Experience Questionnaire, which assesses the 
individual's proneness to naturally occurring 
altered-state experiences. 

Related to the issue of individual differ- 
ences is the study by Marshall and Bentler 
(1976), which demonstrated that alpha 
trainees who are deeply relaxed during train- 
ing (as measured by forehead EMG) report 
significantly more alpha experiences than less 
relaxed trainees do, even when there are no 
differences in alpha density between the two
groups.


Discussion 


The research and concepts reviewed here 
appear to provide an adequate explanation 
of the development of unusual experiential 
states during alpha training. Of particular in- 
terest is the finding that there is no special, 
unique, or intrinsic relation between EEG 
alpha levels and the likelihood or intensity of 
the meditative state of consciousness known 
as the "alpha experience." Yet, at the same 
time, we can understand why it once appeared 
to researchers who were absorbed by the idea 
of a direct and simple relation between states 
of consciousness and neurophysiology that 
there was such an association: As it turns out, 
the alpha-feedback situation appears to be as
effective as any other known procedure for 
generating such experiences in persons who 
do not have special meditation training. How- 
ever, researchers were evidently unsuspecting 
of the strength and complexity of the eight 
non-EEG factors outlined in Table 1. Natu- 
rally, the early biofeedback investigators as- 
sumed that it was their operant-conditioning 
procedures, and not a conspiratorial set of 
“incidental” variables, that were responsible
for their exciting results.

However, the research reviewed here does 
more than remind us that experiential states
are complexly related to a host of parameters
besides EEG alpha levels. It calls into ques-
tion the entire enterprise of “mapping con-
sciousness” neurophysiologically (Hilgard,
1969; Kamiya, 1968; Peper, 1972; Stoyva
& Kamiya, 1968). As Grossberg (1972) has
pointed out in reference to the present context
of alpha-feedback studies, physiology and ex-
perience are of distinctly different logical
types, so that not only is it unsurprising that
we rarely find very tight relationships be-
tween them, it is also conceptually inappro-
priate to speak of any empirical correlations
between them as a case of “mapping” if by
this term we mean that the physiological
states are formally equivalent to—or are effi-
cient causes of—states of awareness, experi-
ences, or behavior. What is called for at this
point is not additional empirical specifications
of physiology/experience or physiology/be-
havior correlations but rather an explicit and
systematic articulation of the concepts of
“persons” and “behavior,” and of the logical
relations between behavior, experience, psy-
chological state, and physiological states (e.g.,
see Ossorio, 1973, 1978). Such an articulation
will allow an understanding of the signifi-
cance and implications of psychophysiological
correlations that goes well beyond the sim-
plistic and misleading notions of “mapping”
or efficient cause.

There are several other implications of the 
findings reviewed here. First, although the 
research demonstrates that EEG alpha-bio- 
feedback training per se is neither necessary 
for nor especially facilitative of the achieve-
ment of the alpha experience, the findings
nevertheless add up to a very positive con-
clusion concerning self-regulation: The re-
search demonstrates that we have greater
abilities of self-control of experiential states
than we have hitherto been willing to grant
ourselves. As I concluded in an earlier article
(Plotkin, 1976a),


we should not be unduly disappointed that there is
no direct association between enhanced alpha and
the alpha experience. The chain of research on bio-
feedback, from Kamiya’s . . . first paper until
the present, has been valuable in showing us that
although we once thought that a box of amplifiers
and filters had made it possible to induce a de-
sirable state of consciousness more rapidly and ef-
fectively than ever before, in fact we were really
always doing it “on our own.” We simply dis-
covered once again that often people only need a
certain degree of faith in their natural powers and
abilities, along with an appropriate setting and sim-
ple instructions, in order to accomplish what they
feel is normally beyond their potential . . . The
power to enter altered states of consciousness is a
natural ability that we all can potentially tap; learn-
ing how to do this without external devices such
as electronics and drugs will serve to expand our be-
havior potential in the widest range of circumstances.
(p. 97, italics in original)


Thus, it appears that Maslow (1969), for 
example, was somewhat mistaken when he 
concluded from the early alpha studies that 
“it is already possible to teach people how 
to feel happy and serene" (p. 728). It would 
now seem more appropriate to say—and, in- 
cidentally, this is more in keeping with hu- 
manistic themes—that we have discovered 
that it has always been possible for people to
allow themselves to feel happy and serene.

A second related implication concerns the 
similarity between the alpha-feedback phe- 
nomenon and the hypnotic situation: Both 
are ways in which latent behavior potential 
can be evoked, and in a similar manner. One 
procedure for inducing the hypnotic state 
(Plotkin & Schwartz, Note 5) centers around
the hypnotist's carefully timed redescriptions
of behavior: The hypnotic subject's behavior 
is redescribed in such a way that the subject 
comes to see his or her own behavior as oc- 
curring "automatically" or under the “con- 
trol" of the hypnotist. For example, the sub- 
ject's arm may be seen as rising autonomously 
when the subject is, of course, the one who is 
actually raising it. Similarly, the biofeedback
researcher gives an (unintentionally) inac-
curate description of the alpha-feedback sit-
uation with the result that the trainee believes
that the promised experiential state is a con- 
sequence of biofeedback-augmented alpha 
enhancement rather than the trainee’s direct 
achievement (the latter being, in fact, the 
case). With this redescription of the trainees’ 
behavior, the researcher has managed to cir-
cumvent the typical trainees’ self-doubts
about their abilities to put themselves in this
state, which leaves them in a position in which
they can simply go ahead and do just that
(as long as they do not see it that way at
the time). In essence, the trainees are sup- 
plied with a special description of the be- 
havior whereby they self-induce a change in 
consciousness so that they do not fully recog- 
nize their behavior for what it is. Inasmuch 
as they think that their potential effectiveness 
is limited to controlling the feedback tone, 
they do not recognize the situation as one in 
which there is, in fact, a question about their 
ability to self-induce the alpha experience; 
or if they do recognize the situation, all the 
evidence is stacked against their self-doubts, 
since they are, after all, “objectively” suc-
ceeding at the task! In sum, although bio-
feedback researchers have not been fully aware 
of it, alpha-feedback training has been a 
situation in which biofeedback has been used 
as an element in a somewhat sophisticated 
social-influence process that can lead to the 
evocation of latent powers of self-control. 
These conclusions should not be taken as 
a disparagement of biofeedback training; bio- 
feedback does represent a valuable advance 
in our capacity to introduce ourselves to new 
realms of physiological and psychological self- 
control. Rather, this research suggests at least 
two cautions or reminders for biofeedback 
users and researchers. First, we must dis- 
tinguish between the intrinsic and the instru- 
mental uses of biofeedback training. Biofeed- 
back is used for intrinsic purposes when the 
physiology that is being controlled is being
controlled for its own sake. For example, the 
use of biofeedback training for the reduction 
of high blood pressure or for muscular re- 
education is an instance of intrinsic use: If 
the hypertensive can use the biofeedback 
monitor to learn to lower his blood pressure, 
or if the cerebral-palsy victim can employ 
the information supplied by an EMG moni- 
tor to learn to coordinate his movements once 
again, then there is no question that the 
physiological control itself is valuable. On the 
other hand, when biofeedback is used instru- 
mentally, we cannot be as sure that the physi- 
ological control per se will be at all useful. 
Biofeedback is used instrumentally when the 
control of some aspect of our physiology is 
attempted not because this control is intrin- 
sically valuable, but because it appears to 
lead to—or to accompany—some other de-
sirable state of affairs. For example, the use 
of EMG biofeedback training for anxiety re- 
duction, or the use of EEG biofeedback train- 
ing for “mind control,” for altered-state in- 
duction, for pain control, or as psychotherapy 
for neurotics or alcoholics is an instrumental 
use. When biofeedback is used instrumentally 
there is a gap, usually a categorical gap, be- 
tween the physiological process that is being 
self-regulated and the desired behavioral or 
psychological outcome. In such cases we must 
be most careful before concluding that the 
control of the physiology in question has any 
special relevance to the desired or attained 
goal. 

The second reminder balances out the first:
Biofeedback training is not merely a form of 
manipulation of human physiology; it is a 
complex social-behavioral interaction in which 
not merely physiology but attitudes, expecta- 
tions, motivations, attention, experience, 
alertness, and understandings are being di- 
rectly and indirectly influenced independently 
of any contingencies between physiology and 
feedback. The present article illustrates how 
biofeedback training can be more fully un- 
derstood if it is viewed as a social-therapeutic 
activity with important physiological aspects, 
as opposed to being thought of as a strictly 
physiological training procedure with inci- 
dental (and perhaps annoying) social attri- 
butes. Especially when employed instrumen- 
tally, the general biofeedback framework is 
not merely a novel application of operant 
conditioning methodology but a potentially 
powerful context for the mobilization and ac-
tivation of our latent self-control and self-
healing capacities (Plotkin, Note 3). It is in
this latter role that biofeedback training may 
find its most fruitful applications as a thera- 
peutic tool. 


Statistical Adjustments and Uncontrolled Studies


Herbert I. Weisberg 
The Huron Institute, Cambridge, Massachusetts 


Many evaluations of social interventions are based on uncontrolled assign- 
ments of individuals to treatment groups. Statistical adjustments are often 
used to compensate for naturally occurring differences between groups. There 
is much confusion and controversy about the adequacy of these statistical 
methods. A variety of interrelated problems have been identified, including 
measurement error, unequal growth rates across groups, and regression arti- 
facts. In this article it is shown that these problems can all be subsumed under 
a general conceptual framework, as particular examples of model misspecifica- 
tion. This perspective is helpful in revealing clearly the nature of the problems 
posed by lack of experimental control. The important case of linear adjustment 
(analysis of covariance) is given special attention. An expression is derived for 
the proportion of bias remaining after adjustment, in terms of easily interpretable
parameters. Implications of these results for research and evaluation design
are considered.


To evaluate the effectiveness of a social 
intervention, the performance of a group 
receiving the “treatment” must be compared 
with a standard representing the expected 
performance in the absence of intervention. 
The fundamental problem in research design 
is to find a valid standard of comparison. 
Randomization is generally accepted as the
ideal approach. That is, we use a random
mechanism to assign individuals to either a
treatment group or an untreated control group.
Random selection virtually guarantees (at 
least for large samples) that the control 
group's performance will correspond to that of 
the treatment group without the intervention. 
So a straightforward comparison of mean 
outcomes for the two groups will provide an
unbiased estimate of the treatment's effect. 
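
The claim about randomization can be illustrated with a minimal simulation sketch. The constant effect `alpha`, the sample size, and the normally distributed untreated outcome are all illustrative assumptions, not values taken from the article.

```python
import random
import statistics

def randomized_mean_difference(n=40000, alpha=2.0, seed=0):
    """Difference of group means under coin-flip assignment.

    alpha, n, and the normal outcome model are illustrative assumptions.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(50.0, 10.0)       # outcome without intervention
        if rng.random() < 0.5:          # random assignment to treatment
            treated.append(z + alpha)   # treated units gain alpha
        else:
            control.append(z)
    return statistics.fmean(treated) - statistics.fmean(control)

estimate = randomized_mean_difference()
print(round(estimate, 2))  # close to alpha, up to sampling error
```

With large samples the printed difference should sit near `alpha`, because the coin flip leaves the two groups with the same expected untreated outcome.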

Often, however, it is impossible to exercise
experimental control. Complex social forces
unknown to the investigator determine which 
individuals wind up in each of the groups. 
With such uncontrolled selection designs, the 
straightforward difference of group means 
may be a biased estimate of the effect. In 
these situations a variety of statistical methods 
have been proposed to compensate for this 
bias and thus provide an unbiased estimate. 
The analysis of covariance (ANCOVA) is perhaps
most widely used for this purpose. 

Recently there has been a great deal of 
concern about the adequacy of ANCOVA and 
other statistical adjustment procedures. Several 
investigators have shown that under models 
representing uncontrolled selection, the ANCOVA 
may either overadjust or underadjust (Bryk 
& Weisberg, 1977; Cain, 1975; Cochran & 
Rubin, 1973; Cronbach, Rogosa, Floden, & 
Price, Note 1). The estimates generated may 
in some instances be seriously misleading. It is 
even possible for the remaining bias after 
adjustment to be larger in absolute value 
than the initial bias without any adjustment. 

Confusion over the adequacy of statistical 
adjustments is part of a larger debate about 
the usefulness of designs based on uncontrolled 
studies. Some analysts (see Campbell &
Boruch, 1975; Gilbert, Light, & Mosteller, 
1975; Riecken & Boruch, 1974) argue forcefully 
that randomization is essential and often 
feasible. Yet the vast majority of social 
research is based on uncontrolled selection, 
because a wide variety of practical, ethical, 
and political problems prevent the implementa- 
tion of rigorous randomized experiments (e.g., 
see Cohen, 1975; Suchman, 1967; Weiss, 1972).
Thus it is crucially important to understand
exactly what problems are associated with 
uncontrolled studies. 

The present article proposes a conceptual 
framework that may help to clarify these 
issues. I shall begin with a discussion of 
statistical adjustment in general, and then 
consider in detail the important special case 
of linear models. The perspective is similar 
to that of several other investigators (Barnow 
& Cain, 1977; Cain, 1975; Cochran & Rubin, 
1973; Cronbach et al., Note 1; Goldberger,
Note 2), and some of the specific results can 
be found in their work. One of the main 
objectives of this review is to draw together 
insights scattered throughout the extensive 
literature on the adequacy of adjustments. 


Statistical Adjustments for 
Confounding Variables 


To avoid confusion, we will not consider the 
effects of finite sample sizes, or the accompany- 
ing problems of estimating parameters on the 
basis of data. Such issues compound the
problems addressed here but do not affect the 
basic argument. In effect, we will be consider- 
ing the case of large samples, so that the 
precision of estimated parameters is very high. 

Let us consider the study sample as ran- 
domly drawn from some population. Individ- 
uals in the sample are assigned to a treatment 
or control group on the basis of a mechanism 
that may or may not be known. Following 
Rubin (1974), we can think of two potential 
outcomes corresponding to each individual: 
the outcome received under the treatment 
and that received without any intervention. 
We can define $Y_i$: observed outcome under
treatment actually received, $W_i$: outcome
that would have been observed under treatment,
and $Z_i$: outcome that would have been
observed under control conditions.
These definitions may at first seem confusing.
To understand them clearly, it may be
helpful for the reader to imagine a two-stage
process consisting of group selection and
treatment administration. In the first stage,
individuals are assigned to the treatment and
control groups as they would be at the beginning
of a study. At the second stage, however,
there are three alternative possibilities.

First, the study may be carried out as
planned, with subjects who are assigned to
the treatment group receiving the treatment
and control subjects experiencing no intervention.
In this case, the observed outcome
corresponds to the variable Y defined above.
The second possibility is to suspend the study
and simply give the treatment to all subjects.
The outcome in this case would be W. Finally,
we can suspend the study but give the treatment
to no one. Then the outcome would be Z.

Ideally, the effect of the treatment would be 
assessed by comparing $W_i$ and $Z_i$ for each
individual i. But of course, in general, the
second and third options described above are
strictly hypothetical, and we will have information
only on Y. The key question in uncontrolled
studies is whether we can obtain a
useful and valid estimate of effect when only 
Y is available. Under what circumstances will 
an actual study allow us to make inferences 
about what would have occurred under 
different scenarios? 

A general answer to this question would be 
extremely complex. Note that for each 
individual the treatment effect is given by 


$$\alpha_i = W_i - Z_i. \tag{1}$$


In general $\alpha_i$ may depend on individual
characteristics and even on the selection
mechanism. For example, suppose we happen
to assign to the treatment those who can
benefit from it the most. Then the average
effect for those in the treatment group will be
relatively large. But the results will generalize
only to similarly selected groups. When the
$\alpha_i$ are related to individual characteristics in
this way, caution is required in the interpretation
of average effects.

Although this issue of interactive effects is
an important and confusing one, I wish in
this article to highlight the problems pertaining
specifically to statistical adjustments. Let us
therefore restrict consideration to the situation
in which the treatment effect is constant
across individuals. That is,


$$W_i = Z_i + \alpha, \tag{2}$$

so that we can write

$$Y_i = Z_i + Q_i \alpha, \tag{3}$$


where $Q_i = 1$ if subject i is in the treatment
group and $Q_i = 0$ if subject i is in the control
group. Note that if Q is determined by a
random mechanism, then each individual has 
the same probability of being assigned to the 
treatment group. Let P be the total proportion 
assigned to the treatment. Then 


$$P(Q = 1) = P$$

and

$$P(Q = 0) = 1 - P. \tag{4}$$


Under randomization, the expected value of 
Z for those in each group should be identical. 
That is, if we define 


$$\mu_{Z1} = E(Z \mid Q = 1)$$

and

$$\mu_{Z0} = E(Z \mid Q = 0), \tag{5}$$

we would have

$$\mu_{Z1} = \mu_{Z0}. \tag{6}$$


If, however, the assignment mechanism is
nonrandom, it is possible that $\mu_{Z1} \neq \mu_{Z0}$. Even
with no intervention, there might be a differ- 
ence in mean outcomes between the groups. 
In this case, if we simply compare the mean 
observed outcomes for the groups, we will 
obtain on the average (and approximately 
for large sample sizes) 


$$\mu_{Y1} - \mu_{Y0} = \alpha + \mu_{Z1} - \mu_{Z0}. \tag{7}$$


That is, the estimated effect will be inflated by
$\mu_{Z1} - \mu_{Z0}$. While under randomization
$\mu_{Z1} - \mu_{Z0}$ will be 0, under nonrandom assignment
it may be either positive or negative. This
term represents the selection bias in estimating
$\alpha$. Equations 5 and 7 show that the bias is
determined by the relationship between Z and
Q in the study sample. In fact, it will be convenient
for us to reexpress this relationship,
using the well-known formula for the point
biserial correlation (e.g., see McNemar, 1969,
p. 218) as


$$\mu_{Z1} - \mu_{Z0} = \frac{\rho_{ZQ}\,\sigma_Z}{\sqrt{P(1 - P)}}, \tag{8}$$
where $\sigma_Z$ is the overall standard deviation of
Z and $\rho_{ZQ}$ is the correlation between Z and Q.
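
Equations 7 and 8 can be checked numerically. The sketch below assumes a made-up selection rule in which higher Z makes treatment more likely. In a real study Z is not observable for treated subjects, so this check is possible only in simulation.

```python
import math
import random
import statistics

def selection_bias_check(n=20000, alpha=2.0, seed=1):
    """Compare the raw mean difference with Equations 7 and 8.

    The selection rule (higher Z makes treatment more likely) and all
    numeric settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q = [1 if zi + rng.gauss(0.0, 1.0) > 0.5 else 0 for zi in z]
    y = [zi + qi * alpha for zi, qi in zip(z, q)]            # Equation 3

    def group_mean(values, group):
        return statistics.fmean([v for v, qi in zip(values, q) if qi == group])

    raw_diff = group_mean(y, 1) - group_mean(y, 0)           # left side of Eq. 7
    bias = group_mean(z, 1) - group_mean(z, 0)               # mu_Z1 - mu_Z0

    # Equation 8, using population (divide-by-n) moments of the sample.
    p = statistics.fmean(q)
    mean_z = statistics.fmean(z)
    cov_zq = statistics.fmean([(zi - mean_z) * (qi - p) for zi, qi in zip(z, q)])
    sigma_z = statistics.pstdev(z)
    rho_zq = cov_zq / (sigma_z * math.sqrt(p * (1 - p)))
    bias_eq8 = rho_zq * sigma_z / math.sqrt(p * (1 - p))
    return raw_diff, bias, bias_eq8

raw_diff, bias, bias_eq8 = selection_bias_check()
print(round(raw_diff, 3), round(2.0 + bias, 3), round(bias_eq8, 3))
```

The first two printed numbers agree because the raw difference equals `alpha` plus the selection bias, and the third reproduces that bias from the point biserial correlation.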

The dilemma posed by uncontrolled studies 
is that this correlation cannot be estimated 
empirically. Without the intervention, the 
distribution of Z could be obtained, but there 
would be no information on Q. After the study, 
Q can be observed, but Z is no longer observ- 
able. The observable outcome is then the modi- 
fied variable Y that includes a component at- 
tributable to the treatment. From Equation 3 
it is clear that the relationship between Y and 
Q is not the same as that between Z and 
Q. Thus the crucial relationship between Z 
and Q cannot be observed. 

In many situations, however, it is possible 
to identify other variables that are related to 
the assignment variable Q, and which are 
thought to “explain” the relationship between 
Z and Q. That is, differences between $\mu_{Z1}$ and
$\mu_{Z0}$ are caused (at least in part) by differences
between groups in the distribution of these 
“confounding factors.” If such factors can 
be identified and measured, it may be possible 
to compensate for their effects in estimating $\alpha$.
Statistical adjustments are based on this idea. 

Intuitively, a confounding factor can be 
defined as a variable that has a different 
distribution in each of the treatment groups 
and that is causally related to the outcome 
variable. This definition is ambiguous, how- 
ever, because the concept of causality is 
difficult to operationalize. Whatever we mean 
by causal influence, it seems likely that in 
general the value of $\mu_{Z1} - \mu_{Z0}$ will be determined
by a complex combination of variables.
The actual selection process, moreover, may 
be described in more than one way. We may 
have many equally plausible “explanations” 
for group differences. So it may be unrealistic 
to expect an unambiguous criterion defining a 
confounding factor. 

On the other hand, it is not unreasonable to 
ask whether in a particular study, using 
statistical adjustment and a given set of 
adjustment variables (covariates), an unbiased 
estimate of $\alpha$ can be expected. Further, we can
try to specify the general conditions under 
which adjustment will be useful. What proper- 
ties must X have in order to eliminate bias 
completely, or at least reduce it to an accept- 
able level? Can we assess the amount of bias 
remaining after adjustment? These questions
will be addressed in the remainder of this 
article. 


Complete Elimination of Bias 


Let E(Z| X) be the conditional expectation, 
or mean, of Z given X. It is the average value 
of Z in the study population for individuals 
having a given value of X. Thus it is in general 
a mathematical function of X. 

Similarly, let E(Z| X, 1) be the conditional 
expectation of Z given X for those individuals 
assigned to the treatment group. Let E(Z|X, 0)
be the conditional expectation for control 
group subjects. In general, these expectations 
will not be equal for a given value of X. 
Suppose, however, that 


$$E(Z \mid X, 1) = E(Z \mid X, 0) \tag{9}$$

or equivalently that

$$E(Z \mid X, Q) = E(Z \mid X). \tag{10}$$


This means that even though the uncondi- 
tional expectations for the two groups differ, 
the conditional means are equal. Conditioning 
on X eliminates the selection bias. We will 
say in this case that X is a complete confounding 
factor. 

To see more clearly the importance of this 
condition, suppose we “match” two individ- 
uals on the basis of a variable X for which 
Equation 10 holds. Let $Z_1$ be the outcome for
the treatment group subject and $Z_0$ be the
outcome for the control group subject. Then

$$E(Z_1 \mid X, 1) - E(Z_0 \mid X, 0) = 0, \tag{11}$$

so that

$$E(Y_1 \mid X, 1) - E(Y_0 \mid X, 0) = \alpha. \tag{12}$$


By matching individuals on the basis of a 
complete confounding variable, we can obtain 
an unbiased estimate of $\alpha$. Such matching
procedures are commonly employed, partic- 
ularly in medical research. There are of course 
many practical problems to deal with, even 
though the method is theoretically correct. 
Rubin (1973a, 1973b) provides an excellent 
discussion of these problems. A less technical 
but more comprehensive review is provided in 
Anderson et al. (in press). 
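
The matching argument can be sketched with a discrete covariate. In the hypothetical setup below, assignment depends on X only, so X satisfies Equation 10 by construction. The stratum probabilities, the effect size, and the outcome model are assumptions chosen for illustration.

```python
import random
import statistics

def stratified_estimate(n=60000, alpha=2.0, seed=2):
    """Raw and within-stratum (matched) estimates of alpha.

    Assignment depends on X only, so X is a complete confounding factor
    by construction. All numeric settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_treat = {0: 0.2, 1: 0.5, 2: 0.8}            # P(Q = 1 | X)
    cells = {(x, q): [] for x in p_treat for q in (0, 1)}
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        q = 1 if rng.random() < p_treat[x] else 0
        z = 5.0 * x + rng.gauss(0.0, 1.0)         # Z depends on X, not on Q
        cells[(x, q)].append(z + q * alpha)       # observed outcome Y
    treated = [y for (x, q), ys in cells.items() if q == 1 for y in ys]
    control = [y for (x, q), ys in cells.items() if q == 0 for y in ys]
    raw = statistics.fmean(treated) - statistics.fmean(control)
    # Average the within-stratum differences, weighted by stratum size.
    matched = sum(
        (len(cells[(x, 1)]) + len(cells[(x, 0)])) / n
        * (statistics.fmean(cells[(x, 1)]) - statistics.fmean(cells[(x, 0)]))
        for x in p_treat
    )
    return raw, matched

raw, matched = stratified_estimate()
print(round(raw, 2), round(matched, 2))
```

The raw difference is inflated by the unequal distribution of X across groups, while the comparison within levels of X should land near `alpha`.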

Note that there may be many different
complete confounding factors satisfying Equation 10.
If they vary in terms of the strength
of their relationship with Z, some factors will
result in more precise estimates of $\alpha$ than
others will. In terms of bias reduction, however,
they are all equivalent. Note also that X may
be a vector consisting of several variables;
Equation 10 is still the defining property.

Now suppose that a “covariate” X (either 
univariate or multivariate) is used for adjust- 
ment, and Equation 10 does not hold. Then 
there may be differences between groups that 
are unrelated to X. So after adjustment for X, 
there may remain an apparent treatment 
effect really attributable to preexisting group 
differences. 

The controversy over statistical adjustments 
mentioned above centers on the adequacy 
of the Xs used in practical applications. If 
adjustment proceeds as if Equation 10 holds, 
and a more general model of the form

$$E(Z \mid X, Q) = f(X, Q) \tag{13}$$

actually holds, how well do these procedures
perform?

Stated in this way, the adequacy of statis- 
tical adjustment may be seen as a problem 
of model specification. Essentially, we are 
estimating a parameter $\alpha$ under a particular
restricted form of the relationship between Z,
X, and Q. We wish to know how accurate our 
estimate will be if a more general model 
actually holds. 

Note that from this perspective, the impor- 
tant issues concern the relationships among 
real, potentially observable variables. Although 
it is true that Z and Q cannot be observed 
together in the same study, each could be 
observed if we were willing to forgo the other.
The adequacy of adjustment depends on how 
the covariate X relates to these two variables. 

It is perhaps worth contrasting this perspective
with that adopted in the widely circulated
paper by Cronbach and his Stanford colleagues
(Cronbach et al., Note 1). These authors
define a true model in terms of two ideal
variables. The performance of adjustments is
then assessed in terms of the relationship
between the covariates actually used and
these unknown, ideal factors. However, the
ambiguity inherent in defining and interpreting
these constructs and the complexity of the
mathematical analysis have led to some
confusion. By avoiding hypothetical variables, 
I hope to provide a clearer understanding of 
the problems encountered in practical situa- 
tions. 

Note that while Equation 10 specifies a 
theoretical condition for complete adjustment, 
this condition is difficult to verify in practice. 
Generally speaking, we can be sure that the 
condition is fulfilled only when the investigator 
has complete control over the assignment of 
subjects to treatment groups. With complete 
control, the experimenter has two main 
options: randomization and explicit control. 

When individuals can be assigned randomly
to the two groups, $\mu_{Z1} - \mu_{Z0} = 0$, and statistical
adjustment is unnecessary. If adjustment
is performed using any covariate, the estimate
of effect remains unbiased and may have
greater precision. Increase of precision in
randomized experiments was in fact the
original purpose of the ANCOVA (Fisher, 1932).

By explicit control, we mean that a covariate 
X serves as the sole basis for group assignment. 
That is, the probability of being in each group 
can depend on X but no other variable. In 
practice, this amounts to conditional random- 
ization, since for a given value of X, individuals 
are assigned randomly with the value of P a 
function of X. Rubin (1977) has recently 
advocated designs based on explicit control 
for educational research. 

In the extreme case when the probability of 
assignment to the treatment group is either 0 
or 1 for any given X value, we have determin- 
istic assignment conditional on X. The 
special case when all individuals below a 
certain cutoff score on X are assigned to one 
group and those above to the other group 
has been called the regression discontinuity 
design (Campbell, 1969; Campbell & Stanley, 
1966). The problem with this design is that, 
although X is indeed a complete confounding 
variable, the distribution of X in the two 
groups does not overlap. Thus we cannot 
match individuals on the basis of X, and we 
must rely heavily on model assumptions in 
order to analyze such experiments. 1 

As mentioned above, both randomization
and explicit control require that the investigator
be able to determine the assignment
procedure. In fact, these two techniques constitute
the backbone of controlled experimentation
in the Fisherian tradition, and a huge
literature has developed to elaborate on these 
two basic ideas (e.g., Cochran & Cox, 1957; 
Cox, 1958; Kempthorne, 1952; Winer, 1971). 
This tradition relies on experimental control 
as the prerequisite for causal inferences. 

In our terms, then, experimental control is 
the only general method of insuring that the 
covariates employed in a given study con- 
stitute complete confounding factors. With 
uncontrolled studies, we will not know how 
much bias remains after adjustment. More- 
over, because there is an unlimited class of 
possible models relating Z, X, and Q, we 
cannot hope to answer this question defini- 
tively. Restricting consideration to linear 
models, however, will provide some helpful 
insight. 


Linear Models and the Analysis of Covariance 


Let us assume that

$$E(Z \mid X, Q) = \mu + \beta X + \delta Q. \tag{14}$$

In this model $\delta$ can be interpreted as the
expected difference in Z for individuals in
the two groups with identical X values. Now
if X is a complete covariate, then Equation
14 reduces to

$$E(Z \mid X, Q) = \mu + \beta X, \tag{15}$$

so that

$$\mu_{Z1} - \mu_{Z0} = \beta(\mu_{X1} - \mu_{X0}), \tag{16}$$

and we could form the estimator

$$\hat{\alpha} = \mu_{Y1} - \mu_{Y0} - \beta(\mu_{X1} - \mu_{X0}). \tag{17}$$

From Equations 7 and 16, we see that $\hat{\alpha}$ will
be an unbiased estimator of $\alpha$.

If we ignore finite-sample estimation prob- 
lems, Equation 17 represents the ANCOVA 
estimate of $\alpha$. Now suppose that an ANCOVA is
employed when the true underlying model is
given by Equation 14. Note first that $\delta$
represents the bias remaining after adjustment
by ANCOVA with X as the covariate. We will
be interested in the relationship between $\delta$ and
the initial bias without adjustment ($\mu_{Z1} - \mu_{Z0}$).
The ratio of these quantities may be termed
the proportion of initial bias remaining, which
we will denote $\tau$. We use the term proportion
in a general sense, since $\tau$ need not necessarily
lie between 0 and 1.
Note that from Equations 3 and 14 we have

$$E(Y \mid X, Q) = \mu + \beta X + (\delta + \alpha) Q. \tag{18}$$

So $\alpha$ and $\delta$ are totally confounded in terms of
the linear model relating observable variables.
Thus instead of estimating $\alpha$ as we could if X
were complete, we can only estimate $\delta + \alpha$.
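
The confounding of the treatment effect with the remaining group difference can be seen in a small simulation of Equations 14, 17, and 18. This is a sketch under stated assumptions: the pooled within-group slope stands in for the coefficient in Equation 17, and the selection rule, coefficients, and noise level are invented for illustration.

```python
import random
import statistics

def ancova_estimate(x, y, q):
    """Equation 17, with beta estimated by the pooled within-group slope."""
    sxy = sxx = 0.0
    group_means = {}
    for g in (0, 1):
        xs = [xi for xi, qi in zip(x, q) if qi == g]
        ys = [yi for yi, qi in zip(y, q) if qi == g]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        group_means[g] = (mx, my)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
    beta = sxy / sxx
    (mx0, my0), (mx1, my1) = group_means[0], group_means[1]
    return (my1 - my0) - beta * (mx1 - mx0)

def simulate(delta, alpha=2.0, n=40000, seed=3):
    """Draw data from Equation 14 and return the ANCOVA estimate of alpha."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assignment is more likely for high X (illustrative selection rule).
    q = [1 if rng.random() < (0.8 if xi > 0 else 0.2) else 0 for xi in x]
    # Z follows Equation 14: mu + beta * X + delta * Q, plus noise.
    z = [1.0 + 3.0 * xi + delta * qi + rng.gauss(0.0, 1.0)
         for xi, qi in zip(x, q)]
    y = [zi + alpha * qi for zi, qi in zip(z, q)]    # Equation 3
    return ancova_estimate(x, y, q)

complete = simulate(delta=0.0)      # X is a complete covariate
incomplete = simulate(delta=1.5)    # a group difference remains given X
print(round(complete, 2), round(incomplete, 2))
```

With `delta = 0` the estimate should be close to `alpha`. With `delta = 1.5` it should be close to `alpha + delta`, and nothing in the observable data separates the two components.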

Let us denote the matrix of correlations
among Z, X, and Q by

$$\begin{pmatrix} 1 & \rho_{ZX} & \rho_{ZQ} \\ \rho_{ZX} & 1 & \rho_{XQ} \\ \rho_{ZQ} & \rho_{XQ} & 1 \end{pmatrix}.$$


Consider the basic model specified by Equa- 
tion 15. Using standard results we can write 


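$$\delta = \rho_{ZQ \cdot X}\,\frac{\sigma_{Z \cdot X}}{\sigma_{Q \cdot X}}, \tag{19}$$

where

$\rho_{ZQ \cdot X}$ = partial correlation of Z and Q given X,
$\sigma^2_{Z \cdot X}$ = conditional variance of Z given X,

and
$\sigma^2_{Q \cdot X}$ = conditional variance of Q given X.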


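Further we have

$$\sigma^2_{Z \cdot X} = \sigma^2_Z (1 - \rho^2_{ZX}) \tag{20}$$

and

$$\sigma^2_{Q \cdot X} = \sigma^2_Q (1 - \rho^2_{XQ}) = P(1 - P)(1 - \rho^2_{XQ}). \tag{21}$$

From Equations 19, 20, and 21,

$$\delta = \rho_{ZQ \cdot X}\,\frac{\sigma_Z \sqrt{1 - \rho^2_{ZX}}}{\sqrt{P(1 - P)}\,\sqrt{1 - \rho^2_{XQ}}}. \tag{22}$$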


Now, using Equation 8, we can express the
proportion of bias remaining after adjustment
as

$$\tau = \frac{\rho_{ZQ \cdot X}\,\sqrt{1 - \rho^2_{ZX}}}{\rho_{ZQ}\,\sqrt{1 - \rho^2_{XQ}}}. \tag{23}$$

Note that $\tau$ may be either positive (underadjustment)
or negative (overadjustment),
depending on the signs of $\rho_{ZQ \cdot X}$ and $\rho_{ZQ}$.
Moreover, $\rho_{ZQ \cdot X} = 0$ is a necessary and
sufficient condition for ANCOVA to be unbiased
under the model we are considering. That is,
the correlation between Z and Q must be
reduced to zero by conditioning on X.
Table 1
Range of $\tau$ for Different Combinations of $\rho_{ZQ}$, $\rho_{XQ}$, $\rho_{ZX}$

Basic
situation   Case   Sign($\rho_{ZQ}$)   Sign($\rho_{XQ}$)   Sign($\rho_{ZX}$)   $\tau$
1           1      +                   +                   +                   $-\infty$ to +1
1           2      +                   -                   -                   $-\infty$ to +1
1           3      -                   +                   -                   $-\infty$ to +1
1           4      -                   -                   +                   $-\infty$ to +1
2           5      -                   -                   -                   1 to $+\infty$
2           6      -                   +                   +                   1 to $+\infty$
2           7      +                   -                   +                   1 to $+\infty$
2           8      +                   +                   -                   1 to $+\infty$


Although this characterization of the condi- 
tion for unbiased estimation is intuitively 
appealing, it is not very helpful for specifying 
when various values are likely to occur in 
practice. However, using the definition of
partial correlation, Equation 23 can be
rewritten as

$$\tau = \frac{\rho_{ZQ} - \rho_{ZX}\rho_{XQ}}{\rho_{ZQ}(1 - \rho^2_{XQ})}. \tag{24}$$

That is, we can express the proportion of bias
remaining after adjustment as a function of
the simple correlations among Z, X, and Q.
Suppose first that all three of these correlations
are positive. Then it can be shown that
$\tau < 1$, which means the initial bias will be
reduced. Moreover, if $\rho_{XQ}$ is large relative to
$\rho_{ZQ}$, $\tau$ could be very large in the negative
direction. This means the adjustment can
overcompensate for the initial bias. The range
of possible values is $(-\infty, +1)$. A similar
analysis can be undertaken for each combination
of signs of the correlations. The results
are presented in Table 1.
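
As a quick check that Equations 23 and 24 describe the same quantity, the sketch below computes the proportion of bias remaining both ways. The function names and the example correlations are hypothetical and chosen only for illustration.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of A and B given C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

def tau_eq23(r_zq, r_xq, r_zx):
    """Equation 23: tau from the partial correlation of Z and Q given X."""
    r_zq_given_x = partial_corr(r_zq, r_zx, r_xq)
    return r_zq_given_x * math.sqrt(1 - r_zx**2) / (r_zq * math.sqrt(1 - r_xq**2))

def tau_eq24(r_zq, r_xq, r_zx):
    """Equation 24: tau from the simple correlations only."""
    return (r_zq - r_zx * r_xq) / (r_zq * (1 - r_xq**2))

# Illustrative values: the first underadjusts, the second overadjusts.
under = tau_eq24(0.3, 0.4, 0.6)
over = tau_eq24(0.1, 0.5, 0.6)
print(round(under, 3), round(over, 3))
```

In the second example the covariate is much more strongly related to assignment than the outcome is, and the adjustment overshoots.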
The eight cases can be classified into two
basic situations. In Basic Situation 1 (Cases
1-4), the relationships are such that adjustment
is in the right direction, reducing the
initial bias. In Basic Situation 2 (Cases 5-8),
adjustment will be in the wrong direction,
further inflating the initial bias. The reason
why each of the four cases in each basic
situation has the same range of possible
values is that the four cases are really equivalent.
Consider Case 1. Suppose we replace X
by $-X$ in any Case 1 situation. We would not
expect the amount of bias removed to depend
on whether X or $-X$ is used as a covariate,
since both variables carry the same information.
But if Z, X, and Q satisfy the conditions
of Case 1, then Z, $-X$, and Q satisfy the
conditions of Case 2. So each Case 2 situation
may be viewed as a Case 1 situation with X
replaced by $-X$. Similarly Case 3 can be
generated by substituting $-Z$ for Z in Case 1
situations, and Case 4 by substituting $1 - Q$
for Q. Thus we need consider only the two
basic situations.
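
The equivalence of the cases can be verified directly from Equation 24: flipping the sign of X, Z, or Q changes two of the three correlations and leaves the proportion of bias remaining unchanged. The helper `basic_situation` and the numeric values are illustrative assumptions.

```python
def tau(r_zq, r_xq, r_zx):
    """Equation 24: proportion of initial bias remaining."""
    return (r_zq - r_zx * r_xq) / (r_zq * (1 - r_xq**2))

def basic_situation(r_zq, r_xq, r_zx):
    """Return 1 for the sign patterns of Cases 1-4 and 2 for Cases 5-8.

    Assumes all three correlations are nonzero.
    """
    return 1 if r_zq * r_xq * r_zx > 0 else 2

case1 = (0.3, 0.4, 0.6)
case2 = (0.3, -0.4, -0.6)    # X replaced by -X
case3 = (-0.3, 0.4, -0.6)    # Z replaced by -Z
case4 = (-0.3, -0.4, 0.6)    # Q replaced by 1 - Q
case5 = (-0.3, -0.4, -0.6)   # all three negative

for case in (case1, case2, case3, case4, case5):
    print(basic_situation(*case), round(tau(*case), 3))
```

The first four lines print the same value, while the all-negative pattern falls in Basic Situation 2 with a value above 1.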

Note that under Basic Situation 2, the
initial bias increases, resulting in an estimate
farther from the true value than the unadjusted
mean difference. Although such bias inflation
can in theory occur, it can probably be avoided
in most practical situations. Often, enough is
known about the general nature of selection to
indicate at least the signs of the correlations
among Z, X, and Q. For example, in the
evaluation of compensatory education programs
like Head Start (Campbell & Erlebacher,
1970), a very disadvantaged treatment group
is compared with a somewhat less disadvantaged
control group. For the outcomes and
covariates commonly used, it can be expected
that both $\rho_{ZQ}$ and $\rho_{XQ}$ will be negative and
$\rho_{ZX}$ positive. Under these conditions, we can
be sure that Basic Situation 1 obtains, meaning
that ANCOVA will underestimate the actual
treatment effect.

For most practical applications of ANCOVA, 
Basic Situation 1 can be expected to hold. 
So it is of interest to consider in detail the
expression for $\tau$ in this case. In particular, is it
possible in actual situations to say more about
the range of possible $\tau$ values? Because all
four cases within Basic Situation 1 are equivalent
in the sense described above, it suffices
to consider only one case. We shall therefore
assume that $\rho_{ZQ}$, $\rho_{XQ}$, and $\rho_{ZX}$ are all positive.

Of the three correlations, only $\rho_{XQ}$ can be
estimated directly from the data. Note,
however, that under our assumption of a
constant treatment effect,

$$\rho_{YX \cdot Q} = \rho_{ZX \cdot Q}. \tag{25}$$

That is, $\rho_{ZX \cdot Q}$ can be estimated from the
within-group relationship between Y and X,
and can therefore be estimated from the data.
Using the standard formula for partial correlation

$$\rho_{ZX \cdot Q} = \frac{\rho_{ZX} - \rho_{ZQ}\rho_{XQ}}{\sqrt{(1 - \rho^2_{ZQ})(1 - \rho^2_{XQ})}}, \tag{26}$$

we obtain

$$\rho_{ZX} = \rho_{ZQ}\rho_{XQ} + \rho_{ZX \cdot Q}\sqrt{(1 - \rho^2_{ZQ})(1 - \rho^2_{XQ})}, \tag{27}$$

where the radical represents the positive square
root. Substituting in Equation 24 yields

$$\tau = 1 - \rho_{ZX \cdot Q}\,\frac{\rho_{XQ}\sqrt{1 - \rho^2_{ZQ}}}{\rho_{ZQ}\sqrt{1 - \rho^2_{XQ}}}. \tag{28}$$


So even if we can estimate $\rho_{XQ}$ and $\rho_{ZX \cdot Q}$,
we still require information on $\rho_{ZQ}$ in order to
assess $\tau$. Moreover, for fixed values of $\rho_{XQ}$
and $\rho_{ZX \cdot Q}$, the value of $\tau$ is quite sensitive to
the value of $\rho_{ZQ}$. Since there is no constraint on
$\rho_{ZQ}$ for given values of $\rho_{XQ}$ and $\rho_{ZX \cdot Q}$ (it can
take any value between 0 and 1), we cannot
place useful bounds on $\tau$ in any obvious way.
Only if some additional constraint on the
correlations can be assumed will it be possible
to restrict $\tau$ to a subinterval of $(-\infty, 1)$.
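
The sensitivity described here can be tabulated from Equation 28. In the sketch below the two estimable correlations are held at illustrative values (0.3 and 0.7, both assumptions) while the unobservable correlation between Z and Q is varied.

```python
import math

def tau_eq28(r_zq, r_xq, r_zx_given_q):
    """Equation 28: tau in terms of the within-group correlation of Z and X."""
    ratio = (r_xq * math.sqrt(1 - r_zq**2)) / (r_zq * math.sqrt(1 - r_xq**2))
    return 1 - r_zx_given_q * ratio

# Hold the estimable quantities fixed and vary the unobservable rho_ZQ.
for r_zq in (0.1, 0.2, 0.3, 0.5, 0.8):
    print(r_zq, round(tau_eq28(r_zq, r_xq=0.3, r_zx_given_q=0.7), 3))
```

Across this range the result moves from overadjustment (negative values) to substantial underadjustment (values approaching 1), which is why the estimable correlations alone do not bound the remaining bias.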

Equation 28 provides another characterization
of the condition for complete adjustment.
All bias will be removed only if

$$\rho_{ZX \cdot Q} = \frac{\rho_{ZQ}\sqrt{1 - \rho^2_{XQ}}}{\rho_{XQ}\sqrt{1 - \rho^2_{ZQ}}}. \tag{29}$$

If $\rho_{ZX \cdot Q}$ is too small or too large there will be
nonzero bias. The fact that the within-group
correlation between Z and X could be too
large under certain circumstances may at
first appear counterintuitive. In particular,
it seems plausible that ANCOVA would be
unbiased when $\rho_{ZX \cdot Q} = 1$. But this will be
true only if $\rho_{ZQ} = \rho_{XQ}$, or equivalently if
$\rho_{ZX} = 1$. If the overall correlation between Z
and X is less than 1, but the selection process
results in a within-group correlation equal to 1,
then adjustment will not be complete.

Equation 28 allows an intuitive understanding
of the adjustment problem. If $\rho_{ZQ} > \rho_{XQ}$,
the outcome (in the absence of intervention)
is more strongly related to assignment than
is the covariate. This means that the adjustment
coefficient would have to be large in
order to adjust fully, larger than $\beta$ used in
the analysis of covariance. If, on the other
hand, $\rho_{ZQ} < \rho_{XQ}$, then a modest adjustment
coefficient is needed, and $\beta$ may be either too
small or too large.
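
One way to read Equation 29 is as the within-group correlation that would be needed for complete adjustment. The sketch below computes that value for a few illustrative pairs of positive correlations (the inputs are assumptions). When the required value exceeds 1, no attainable within-group correlation removes all the bias.

```python
import math

def required_within_corr(r_zq, r_xq):
    """Equation 29: the value of rho_ZX.Q at which tau equals zero."""
    return (r_zq * math.sqrt(1 - r_xq**2)) / (r_xq * math.sqrt(1 - r_zq**2))

stronger_outcome = required_within_corr(0.5, 0.3)    # rho_ZQ > rho_XQ
equal = required_within_corr(0.3, 0.3)               # rho_ZQ = rho_XQ
stronger_covariate = required_within_corr(0.3, 0.5)  # rho_ZQ < rho_XQ
print(round(stronger_outcome, 3), round(equal, 3), round(stronger_covariate, 3))
```

In the first case the required value is above 1, so ANCOVA can only underadjust. In the third case it lies between 0 and 1, so the actual within-group correlation may fall on either side of it.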

In the methodological literature, a variety 
of potential problems in using statistical 
adjustment have been pointed out. I believe 
that all of these can be understood clearly in 
terms of the framework presented above. 
In the remainder of this article, I discuss some
special issues that have received a great deal 
of attention. First we shall consider the 
problems raised by alternative models for 
individual growth over time on the outcome 
dimension. The important special case known 
as Lord's paradox (Lord, 1967) is analyzed in 
detail. Then we shall examine the situation 
in which covariates are measured with error, 
and finally we shall consider so-called regression 
effects. 


Growth Models 


Suppose the outcome of interest consists of 
the level of growth attained by an individual 
on some important dimension. Often the 
selection mechanism will result in a mean 
growth rate for the treatment group that is 
higher or lower than that of the controls, even 
in the absence of an intervention. Generally, 
in this situation, the covariate used is a 
pretest measured on the same dimension as 
the outcome score (posttest). In our notation, 
then, Y = observed posttest score, Z = posttest
that would be observed without intervention,
and X = pretest score. The use of
statistical adjustments in this situation can 
be related to the voluminous literature on the 
measurement of change (see Cronbach & 
Furby, 1970), based on traditional psycho- 
metric assumptions. The problems in measur- 
ing change, from this perspective, depend 
primarily on reliability considerations. Much 
recent research on statistical adjustments when 
individuals are growing (Campbell & Boruch, 
1975; Campbell & Erlebacher, 1970; Kenny, 
1975) comes from this psychometric tradition. 
As a result, the problems caused by fallible 
measurement have been confounded with those 
related to growth per se. I shall discuss 
measurement error separately in a later 
section. My purpose here is to consider the 
problems caused by differential growth across 
treatment groups, even when pretests and
posttests are perfectly reliable. 

My analysis will follow closely that of Bryk
and Weisberg (1977). They have considered 
various models for individual growth and for 
the selection of individuals into groups. 
With these models, they have identified 
situations in which various adjustments (including
ANCOVA) will overadjust or underadjust.

I shall restrict consideration here to the 
special case when Equation 14 holds. Further- 
more, it will be helpful to define standardized 
group differences on pretest and posttest by

$$D_Z = \frac{\mu_{Z1} - \mu_{Z0}}{\sigma_{Z \cdot Q}}$$

and

$$D_X = \frac{\mu_{X1} - \mu_{X0}}{\sigma_{X \cdot Q}}. \tag{30}$$

Then there are four basic ways that the re- 
lationship between the groups can change over 
time. First, the standardized distance between 
groups can remain the same over time. We 
refer to this situation as standardized parallel 
growth. Second, the standardized distance can 
increase. We call this situation standardized
divergence. Third, the means may cross over 
between pretest and posttest, with the group 
that is higher on the pretest being lower on the 
posttest. Fourth, the standardized difference 
may decrease but not so much that crossover 
occurs. We then have standardized convergence. 
These four cases are illustrated in Figure 1 
and summarized as follows: 


I. Standardized parallel growth: $D_X = D_Z > 0$.

II. Standardized divergence: $0 < D_X < D_Z$.

III. Crossover: $D_X > 0$; $D_Z < 0$.

IV. Standardized convergence: $D_X > D_Z > 0$.
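
The four cases can be expressed as a small classification rule. This is an illustrative helper, not part of the original analysis. It assumes the groups are labeled so that the standardized pretest difference is positive, and it rejects a zero posttest difference, which the list does not cover.

```python
def classify_growth(d_x, d_z):
    """Label the growth pattern from standardized pretest and posttest gaps.

    Assumes groups are labeled so that d_x > 0. A posttest gap of zero is
    not covered by the four cases and is rejected here.
    """
    if d_x <= 0:
        raise ValueError("expects d_x > 0; relabel the groups if needed")
    if d_z == 0:
        raise ValueError("d_z = 0 is not one of the four cases")
    if d_z < 0:
        return "crossover"
    if d_z == d_x:
        return "parallel"
    if d_z > d_x:
        return "divergence"
    return "convergence"

for gaps in ((0.5, 0.5), (0.5, 1.0), (0.5, -0.5), (1.0, 0.5)):
    print(gaps, classify_growth(*gaps))
```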


Let us consider now what happens when the 
analysis of covariance is applied in each of 
the four cases. It will be convenient to rewrite 
Equation 28 in still another form: 


$$\tau = 1 - \rho_{ZX \cdot Q}\,\frac{D_X}{D_Z}. \tag{31}$$

[Figure 1. Types of standardized growth. Panels: I, Parallel; II, Divergence; III, Crossover; IV, Convergence. G(t) = growth at time t; $t_1$ = pretest time; $t_2$ = posttest time.]

Consider first the parallel case. From
Equation 31 we obtain

$$\tau = 1 - \rho_{ZX \cdot Q}. \tag{32}$$

So in general, ANCOVA will underadjust
under a parallel growth model. The special
case when $\rho_{ZX \cdot Q} = 1$ and $D_X = D_Z$ is particularly
interesting. It can be shown to be
consistent with a "degenerate fan spread"
model (Bryk & Weisberg, 1977). Under this
model individual growth is represented as a
straight line, and the slopes of these lines may
vary across individuals, but nonnegligible
growth for each begins at the same time, r.
This special situation is illustrated in Figure 2.
As Bryk and Weisberg have shown, any
reasonable technique will correctly adjust in
this case. Because the pretest and the slope
are perfectly correlated, the pretest contains
full information on growth in the absence of
intervention. So it constitutes a complete
covariate in the sense defined above.

[Figure 2. Degenerate fan spread. Mean growth curves and a sample of individual growth curves, plotted as G(t) against time. G(t) = growth at time t.]

With a standardized divergence situation,
$\tau$ must be less than 1; so ANCOVA will underadjust.
Under a crossover model, $\tau$ will be
greater than 1, and the initial bias becomes
inflated. Only under the standardized convergence
model does ANCOVA have the possibility
of being correct. This will occur when

$$\rho_{ZX \cdot Q} = \frac{D_Z}{D_X}. \tag{33}$$

That is, the correlation between pretest and
posttest within groups must exactly equal the
proportional decrease in standardized mean
difference.
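The four cases can be checked numerically. The following is a minimal sketch, assuming Equation 31 has the form $\tau = 1 - \rho_{ZX \cdot Q} D_X / D_Z$ (the form that yields Equations 32 and 33 as special cases); the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def remaining_bias(rho, d_x, d_z):
    """Proportion of the initial bias left after ANCOVA (assumed Equation 31).

    rho : within-group correlation between pretest X and untreated posttest Z
    d_x : standardized group difference on the pretest
    d_z : standardized group difference on the untreated posttest
    """
    return 1.0 - rho * d_x / d_z


def growth_case(d_x, d_z):
    """Label the growth pattern using the four cases listed in the text."""
    if d_x > 0 and d_z < 0:
        return "crossover"
    if d_x == d_z and d_z > 0:
        return "parallel"
    if 0 < d_x < d_z:
        return "divergence"
    if d_x > d_z > 0:
        return "convergence"
    return "unclassified"


# Hypothetical values: rho = 0.7 throughout, pretest difference fixed at 1.0.
rho = 0.7
examples = {
    "parallel": (1.0, 1.0),
    "divergence": (1.0, 1.5),
    "crossover": (1.0, -0.5),
    "convergence": (1.0, 0.7),  # d_z / d_x equals rho, the Equation 33 condition
}
results = {
    name: (growth_case(d_x, d_z), remaining_bias(rho, d_x, d_z))
    for name, (d_x, d_z) in examples.items()
}
for name, (label, tau) in results.items():
    print(f"{name:12s} case={label:12s} tau={tau:.3f}")
```

With these illustrative numbers, only the convergence example, where the ratio $D_Z/D_X$ happens to equal the within-group correlation, leaves no remaining bias.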

In general, we will not know which of these 
situations obtains, since Y and not Z is actually 
observable after the treatment. The pretest 
alone as a covariate is usually not adequate 
to allow complete adjustment unless we have 
evidence that Equation 33 actually holds. 
Only under a very special model of growth 
will the pretest represent a complete covariate. 

If statistical adjustment is to be possible in 
growth situations, we must have enough 
information on the nature of growth in the 
absence of intervention to specify a complete 
covariate. This information may come from 
theoretical knowledge about the individuals 
being studied and how they were assigned to 
groups, as well as from empirical data. How- 
ever, there are many possible models for 
individual growth. Even our four cases are 
all under the restrictive assumption that 
Equation 14 applies. Moreover, Bryk (1977) 
has shown that the performance of statistical 
adjustments is highly sensitive to model 
assumptions. Thus the use of statistical 
adjustments to control for differential growth 
appears quite problematic. 


Lord's Paradox 


Lord (1967) has presented a particularly 
interesting example of the problems in model 
specification when individuals are growing. 
He considered the following situation: 


A large university is interested in investigating the 
effects on the students of the diet provided in the 
university dining halls and any sex difference in these 
effects. Various types of data are gathered. In particular, 
the weight of each student at the time of his arrival in 
September and his weight in the following June are 
recorded. 


At the end of the school year, the data are indepen- 
dently examined by two statisticians. Both statisticians 
divide the students according to sex. The first statis- 
tician examines the mean weight of the girls at the 
beginning of the year and at the end of the year and 
finds these to be identical. On further investigation, he 
finds that the frequency distribution of weight for the 
girls at the end of the year is actually the same as it was 
at the beginning. 

He finds the same to be true for the boys. Although 
the weight of individual boys and girls has usually 
changed during the course of the year, perhaps by a 
considerable amount, the group of girls considered as
a whole has not changed in weight, nor has the group
of boys. A sort of dynamic equilibrium has been
maintained during the year. (p. 304)

Let Q = 0 for the women and Q = 1 for
the men. Further, let Y = observed final
weight, Z = final weight that would have been
observed under continuation of previous diet,
and X = initial weight. Lord further assumes
that the regression coefficient $\beta$ of Z on X in
each group is the same. Moreover, his diagram
showing hypothetical scatterplots suggests
that Y and X have a bivariate normal
distribution.

Lord (1967) goes on to present two apparently
contradictory but equally plausible
analyses of these data:
The first statistician concludes that as far as these 
data are concerned, there is no evidence of any interest- 
ing effect of the school diet (or of anything else) on 
student weight. In particular, there is no evidence of 
any differential effect on the two sexes, since neither 
group shows any systematic change. 


The second statistician, working independently, decides 
to do an analysis of covariance . . .. He finds that 
the difference between the intercepts is statistically 
highly significant. 

The second statistician concludes as is customary in 
such cases, that the boys showed significantly more 
gain in weight than the girls when proper allowance is 
made for differences in initial weight between the two 
sexes. (p. 305) 

Lord concluded that there is no way to
tell from the data available which results
should be accepted. He infers that in general
with uncontrolled studies, there is no way to
make proper allowance for differences between
treatment groups.

Let $\delta$ represent the expected difference in
final weight between a boy and girl of given
initial weight. Then

$$E(Z \mid X) = \mu + \beta X + \delta Q. \tag{34}$$

Let $\alpha_1$ represent the specific effect of the
diet on boys and $\alpha_2$ the effect on girls. Then

$$E(Y \mid X) = \begin{cases} \mu + \beta X + \delta + \alpha_1 & \text{for boys} \\ \mu + \beta X + \alpha_2 & \text{for girls.} \end{cases} \tag{35}$$

This can be rewritten as

$$E(Y \mid X) = \mu' + \beta X + \delta Q + \alpha Q, \tag{36}$$

where

$$\mu' = \mu + \alpha_2$$

and

$$\alpha = \alpha_1 - \alpha_2. \tag{37}$$

The general model representing Lord's situation
has the form of Equation 18. So estimating
the differential effect $\alpha_1 - \alpha_2$ is formally
identical to the usual estimation problem we
have been considering. Lord's paradox can
be viewed in terms of model specification along
the lines developed above.

As we shall see, the data described by Lord 
are consistent with a variety of underlying 
models. One possible model corresponds to the 
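assumption that without the change in diet,
the distribution of weights would be constant
over time, although random fluctuations for
individuals might occur. Then we would have

$$\mu_{Z0} = \mu_{X0}, \qquad \mu_{Z1} = \mu_{X1}, \qquad \text{and} \qquad \sigma_{Z \cdot Q} = \sigma_{X \cdot Q}. \tag{38}$$

Suppose further that there is no treatment
effect for either sex ($\alpha_1 = \alpha_2 = 0$). Then we
would observe

$$\mu_{Y1} = \mu_{X1}, \qquad \mu_{Y0} = \mu_{X0}, \qquad \text{and} \qquad \sigma_{Y \cdot Q} = \sigma_{X \cdot Q}, \tag{39}$$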


which corresponds with Lord's description of 
data. This model represents the first statisti- 
cian's construction of the situation. 

Note that under this model, Dz = Dx, 
and we would have standardized parallel 
growth. Moreover, since we know that the 
within-group correlation between Z and X is 
less than 1, it follows from Equation 32 that 
ANCOVA will underadjust.

Suppose, however, that instead of remaining 
constant over time, the distribution of weights 
for boys and girls would change in the natural 
course of events, even with the same diet. 

There are several forms such a change could
take. For example, the variance for each
group might remain constant, but the mean
levels move closer together. Thus we might
have

$$\mu_{Z1} < \mu_{X1}, \qquad \mu_{Z0} > \mu_{X0}, \qquad \text{and} \qquad \sigma_{Z \cdot Q} = \sigma_{X \cdot Q}. \tag{40}$$

This implies that $D_Z < D_X$, and we have a
standardized fan close situation. From Equation 33
then, if the correlation between Z
and X within groups happens to be equal to
$D_Z/D_X$, then ANCOVA will adjust perfectly.
This situation must hold if the second statistician's
analysis is to be accepted.

Of course, there are many other possible 
models representing the underlying growth in 
the absence of the new diet. Under these 
alternatives, neither statistician’s argument 
will be valid. It is possible that there is 
indeed a differential effect of diet (contrary 
to the first statistician's conclusion) but that 
the estimate provided by ANCOVA is biased.
The moral of Lord's story is simply that in the
absence of additional information on natural 
growth, there may be alternative models con- 
sistent with the data. Unless we can specify 
a correct model, statistical methods cannot be 
counted on to compensate adequately. 
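A small simulation can make the two statisticians' analyses concrete. The sketch below is an illustration under assumed values, not Lord's data: the group means (55 and 70), the common standard deviation (8), the within-group correlation (.8), and all function names are hypothetical. Each group is generated in "dynamic equilibrium", so its mean and variance are unchanged from September to June.

```python
import random
from statistics import fmean


def simulate_group(mean, sd, rho, n, rng):
    """Draw (initial, final) weights with equal means and variances at both times."""
    noise_sd = sd * (1.0 - rho ** 2) ** 0.5
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mean, sd)
        y = mean + rho * (x - mean) + rng.gauss(0.0, noise_sd)
        xs.append(x)
        ys.append(y)
    return xs, ys


def pooled_within_slope(groups):
    """Common within-group regression slope of final on initial weight."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx, my = fmean(xs), fmean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
girls = simulate_group(55.0, 8.0, 0.8, 20000, rng)
boys = simulate_group(70.0, 8.0, 0.8, 20000, rng)

# First statistician: mean gain within each sex.
gain_girls = fmean(girls[1]) - fmean(girls[0])
gain_boys = fmean(boys[1]) - fmean(boys[0])

# Second statistician: ANCOVA-adjusted difference (difference in intercepts).
slope = pooled_within_slope([girls, boys])
adjusted_diff = (fmean(boys[1]) - fmean(girls[1])) - slope * (
    fmean(boys[0]) - fmean(girls[0])
)

print(f"mean gain, girls: {gain_girls:.2f}; boys: {gain_boys:.2f}")
print(f"ANCOVA-adjusted boy-girl difference: {adjusted_diff:.2f}")
```

Under these assumptions both mean gains are close to zero, while the adjusted difference is close to (70 - 55)(1 - .8) = 3. The same data therefore support both statisticians' computations; which conclusion is warranted depends on the unobservable model for growth in the absence of the diet.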


Covariate Measured With Error 


So far we have been assuming that the 
covariate is measured without error. Suppose 
now that X represents a fallible variable 
subject to the usual psychometric assumptions 
(see Lord & Novick, 1968). The issue of 
reliability is very complex. Many definitions 
of reliability have been offered in an attempt 
to quantify the intuitive notion that part of an 
observed score is attributable to random 
fluctuation rather than to a stable character- 
istic. The effects of measurement error on 
statistical adjustments have been widely 
discussed, and many solutions to these 
problems have been proposed (e.g., Cochran, 
1968; DeGracie & Fuller, 1972; Lord, 1960; 
Porter, 1967; Stroud, 1972). 

The usual formulation of the measurement
error problem assumes correct model specification
in terms of an underlying true score T.
That is,

$$E(Z \mid T, Q) = \mu + \beta T. \tag{41}$$

The observed score X is related to T by

$$E(X \mid T, Q) = T. \tag{42}$$

We define the reliability of X by

$$r = \rho^2_{TX \cdot Q} = \frac{\sigma^2_{T \cdot Q}}{\sigma^2_{X \cdot Q}}. \tag{43}$$


Of course, in general, the conditional correlations
and variances may differ across groups,
but to understand the main issues it suffices
to consider the simple case when they are
equal. In this case, it can be shown (Cochran,
1968) that the relationship between Z and X
is of the form

$$E(Z \mid X, Q) = \mu' + \beta' X + \delta' Q, \tag{44}$$


where 
(45) 


The effect of measurement error is to reduce 
the within-group regression coefficient (and 
hence the total adjustment) by a factor r. 
So, for example, if the reliability is .8, ANCOVA
will remove 80% of the initial bias.

Based on Equation 45, methods have been 
proposed to estimate r and to "correct" the 
ANCOVA. But as noted above, this approach 
assumes that the model is correctly specified 
in terms of the true score. In general, however, 
there is no reason to expect that because a 
variable is free of measurement error it is a 
complete confounding factor. Instead of Equation 41
we might have

$$E(Z \mid T, Q) = \mu + \beta T + \delta Q \tag{46}$$

for some nonzero value of $\delta$. In this case
Equation 45 is still valid, and the reduction in
bias using X remains r times the reduction
using T. So we have

$$\tau_X = (1 - r) + r\tau_T, \tag{47}$$


where $\tau_X$ and $\tau_T$ represent the proportions of
bias remaining when X and T are the
covariates.

This formula implies a rather curious
possibility. Suppose that using T as the
covariate results in an overadjustment ($\tau_T < 0$).
Then by attenuating the relationship between
T and Z, X may actually reduce the absolute
magnitude of the remaining bias, by pulling
$\tau$ back toward 0. Of course, with finite samples,
the use of X also implies lower precision of
estimates.
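The attenuation argument can be illustrated with a few lines of arithmetic. This sketch assumes the relation $\tau_X = (1 - r) + r\tau_T$, which follows from the statement that the bias reduction using X is r times the reduction using T; the function names and numerical values are hypothetical.

```python
def remaining_bias_fallible(r, tau_t):
    """Bias remaining with the fallible covariate X, given reliability r
    and the bias remaining with the true score T (assumed relation)."""
    return (1.0 - r) + r * tau_t


def tau_t_making_x_complete(r):
    """Value of tau_T at which the fallible covariate leaves no bias."""
    return -(1.0 - r) / r


# T is a complete covariate (tau_T = 0): X removes only r of the bias.
tau_x_complete_t = remaining_bias_fallible(0.8, 0.0)

# T overadjusts (tau_T < 0): attenuation pulls the remaining bias toward 0.
tau_x_overadjusting_t = remaining_bias_fallible(0.8, -0.15)

# T overadjusts by exactly the right amount: X leaves no bias at all.
tau_x_zero = remaining_bias_fallible(0.8, tau_t_making_x_complete(0.8))

print(tau_x_complete_t, tau_x_overadjusting_t, tau_x_zero)
```

In the second case the absolute remaining bias is smaller with the fallible covariate than with the true score, which is the curious possibility described above.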

With large samples, however, a variable
with low reliability may be an excellent
covariate. In fact, it may even be a complete
confounding factor. This will occur when

$$\tau_T = -\frac{1 - r}{r}, \tag{48}$$

which means $\delta' = 0$, but $\delta \neq 0$.

Although this result may seem counterintuitive,
it is helpful in understanding what
our definition of a complete covariate does and
does not imply. It does not necessarily imply
high precision with finite samples or a perfect
relationship between covariate and outcome
within groups. It is simply the condition under
which unbiased estimation of $\alpha$ is possible,
and implies that the model assumed by ANCOVA
is correctly specified.

Note also that viewed in terms of model
specification, the fallibility of the covariate is
not in itself the problem. Having a perfectly
reliable covariate will not guarantee correct
adjustment, and having a fallible covariate
does not necessarily result in bias. A fallible
covariate may even be a complete confounding
factor. As Overall and Woodward (1977) have
emphasized, if assignment is explicitly on the
basis of a fallible variable, the ANCOVA will be
unbiased. A perfectly reliable variable on the
other hand may not be complete. The important
question is not one of measurement error,
but whether Equation 15 holds in terms of
whatever covariate is actually used.


Regression Artifacts 


Another issue that has resulted in much 
confusion is the so-called regression effect.
Suppose we wish to compare the performance 
of two groups on an outcome measure. The 
groups, however, are thought of as sampled 
from different populations. To adjust for 
differences in the groups, matching is com- 
monly used. The problem is seen in the 
following terms: 


In order to get a matched group when the two popula- 
tions have different mean values, we must take individ- 
uals who fall relatively high on one population and 
match them with individuals who fall relatively low 
in the other. Since the individuals in each group
regress toward their own population mean, the regression
in the two groups will be different. Upon another test,
our groups will no longer be matched. (Thorndike, 1942,
p. 


So the mean difference on the outcome scores 
will differ, even if there is no difference between 
treatments offered to the groups. This differ- 
ence is sometimes called a regression artifact. 
This problem arises most commonly when a
pretest measuring the same dimension as the
outcome is used as the matching variable. The
regression artifact may then be attributed to
imperfect test-retest reliability of the measuring
instrument. As Thorndike (1942) noted
long ago, however, the issue is much more
general:


The fallacies with which we are here concerned may 
arise whenever the measure or measures by means 
of which the groups were matched have less than a 
perfect correlation with the measure of the experimental
variable which is being studied. A more limited example
of this is found in the less than perfect correlation
between a test and a subsequent retest with the same
instrument. However, our argument is more general
than this, and holds whenever groups are matched upon
one measure or group of measures and then studied
with regard to their performance on other measures
which do not have a perfect correlation with the 
matching variable. Since this is universally true in 
the matched-groups experiment, the points to be 
raised here are of quite general application. (p. 85) 


This problem is particularly confusing
because it is not obvious what population mean
the individuals in a group can be expected to 
regress toward. Suppose we know that one 
group is all black and the other all white. 
We might expect the blacks to regress toward 
the mean for black children. However, suppose 
we know they are black and living in Boston. A 
different mean may be relevant. This argument 
can be continued indefinitely. Viewed from this 
perspective, the concept of regression has a
rather mystical quality. A retest score is 
being pulled by an irresistible force toward 
some predetermined norm from which it has 
deviated. Consider this convoluted argument of 
Campbell and Erlebacher (1970) attempting to 
circumvent this ambiguity: 


In situations such as this where control samples are
chosen to have pretest scores equivalent to experimental
samples, the question may be asked "Since the Head
Start children are an extreme group, why don't they
regress toward the overall population mean just as
much as do the matched controls?" Comparable
questions emerge when psychotherapy applicants are
matched with a control sample chosen to have equally
maladjusted test scores (e.g., Campbell and Stanley,
1973; 1977; pp. 11-21 and 45-50). Why are these
controls expected to regress to the population mean
while the therapy applicants are not? An initial
answer is that person-to-person matching on individual
scores involves the misleading exploitation of score
instability phenomena to a much greater degree than
do the complex of processes which produced the Head
Start sample or the psychotherapy applicants. These
groups turn out to be extreme when measured, but
were not selected on the basis of their extreme scores.
It is selection on the basis of extreme individual scores
that creates most strongly the conditions under which
obtained scores become biased estimates of true scores.
(pp. 195-196)


From our standpoint, the reference to 
hypothetical populations is unnecessarily con- 
fusing. In any actual situation, the entire study 
sample may be viewed as drawn randomly 
from a single population to which generaliza- 
tions are desired. Moreover, because we are 
ignoring finite-sampling issues, we can treat 
the sample and this population as identical. 

Each individual in the population is seen as 
characterized by many variables. In particular, 
there is a value $Y_i, Z_i, X_i, Q_i$ for individual i.
In this formulation $Z_i$ is unobservable for
those individuals assigned to the treatment 
group, but is defined unambiguously. 

With this formulation, the regression artifact 
is simply the bias resulting from statistical 
adjustment when X is not a complete con- 
founding factor. Rather than saying that Z 
regresses toward two different population 
means, depending on which population the 
subject comes from, we can say that the 
subject's expected value of Z conditional on 
X depends on Q, that is, 


$$E(Z \mid X, 1) \neq E(Z \mid X, 0). \tag{49}$$


An important special case that has led to 
much confusion occurs when Z represents a 
retest score on an instrument that is identical 
to that used for the initial test X. Suppose we 
are in a nonintervention situation, so that 


$$Y = Z. \tag{50}$$

Then $\rho_{YX \cdot Q}$ can be interpreted as the test-retest
reliability, r:

$$\rho_{YX \cdot Q} = \rho_{ZX \cdot Q} = r. \tag{51}$$

Now from Equation 28 we have

$$\tau = 1 - r. \tag{52}$$


This may be interpreted as an instance of 
standardized parallel growth. The under- 
adjustment, or regression effect, is then seen 
as a natural consequence of this underlying 
growth model. On the other hand, we can 
assume that there exists an underlying true 
score T such that if the adjustment used T
rather than X, there would be no remaining
bias. The regression effect is then viewed as
the result of measurement error.

Neither of these interpretations is the
correct one. They are simply alternative ways 
of explaining the fact that the pretest does 
not contain complete information for adjusting 
posttest differences. What matters is not 
measurement error, differential growth, or 
regression toward a population mean, but 
simply that the relationship between posttest 
and pretest will not in general satisfy Equation 
10. 


Summary and Conclusion 


I have tried in this article to provide a 
unified perspective on the problems in using 
statistical adjustments with uncontrolled selec- 
tion. I have indicated how various problems 
can be viewed as special instances of model 
misspecification. In particular, problems raised 
by individual growth, measurement error, and 
regression effects can be understood in these 
terms. 

For linear adjustment (ANCOVA), I derived
an expression for $\tau$, the proportion of selection
bias remaining after adjustment. This formula
expresses $\tau$ as a simple function of three
intuitively meaningful parameters. One of
these parameters, $\rho_{ZQ}$, expresses the relationship
between the assignment process and the
outcome that would be observed in the
absence of the treatment. This correlation
can have a substantial effect on $\tau$ and cannot
be estimated from data. So in general, we
cannot know whether the remaining bias is 
likely to be small. 

Unless we have evidence that X is a complete 
confounding factor, in the sense discussed 
above, we cannot be sure that all bias has 
been removed. As Lord (1967) has put it: 


With the data usually available for such studies, there 
simply is no logical or statistical procedure that can be 
counted on to make proper allowances for uncontrolled 
preexisting differences between groups. The researcher 
wants to know how the groups would have compared 
if there had been no preexisting uncontrolled differences. 
The usual research study of this type is attempting to 
answer a question that simply cannot be answered in 
any rigorous way on the basis of available data. (p. 305) 


Many have interpreted this pessimistic conclu- 
sion as implying the need for more randomized 
experiments. Since adjustments cannot be 
counted on, methodologists must press for 
strict experimental control (see Campbell &
Boruch, 1975).

In my view, this attitude is unrealistic.
Particularly with large-scale social interventions,
randomization is rarely practical. For
most of our knowledge we must continue to
rely on uncontrolled studies. So the crucial
question is whether such studies can be
better designed and analyzed. If, in Lord's
words, "the data usually available for such
studies" are inadequate, can we collect other
data, not usually available, that will allow
valid and useful inferences?

The answer depends in part on our willingness
to modify the way we think about
uncontrolled studies. The emphasis placed on
randomized experimentation as an ideal mode
of inquiry has led researchers to view alternative
designs in terms of their closeness to this
ideal, and to retain the idea of a treatment
group versus control group comparison even
when the groups have not been selected
randomly. Statistical adjustments are employed
in an effort to simulate the results that
would have been observed under a randomized
experiment. But as we have seen, the results
of such attempts may be quite misleading.

Under randomization, the control group's 
performance is a proxy for that attained by 
the treatment group in the absence of inter- 
vention. With uncontrolled selection, the 
control group loses this property, and the 
rationale for such a group is greatly weakened. 
What is needed is a valid standard of comparison
representing the outcomes for the treatment
group had the treatment not been
received. Information on the performance of a
nonequivalent control group is relevant only
if we are convinced that the groups do not
really differ, or if we can specify a complete
confounding factor X. Without randomization
the burden of proof shifts to the investigator,
who must provide evidence that two individuals
with identical values of X, but assigned to
different groups, would have the same outcome
(on the average) in the absence of intervention.

On the other hand, it may be possible in
some situations to estimate directly the
performance of the treatment group in the
absence of intervention, or more generally
under alternative treatment conditions. This
estimate can then serve as the standard of
comparison against which to compare actual
outcomes. For example, in some situations it
is possible to measure a characteristic of the
treated population repeatedly before and 
after an intervention. Such repeated-measure, 
or time-series, designs (Campbell & Stanley, 
1966; Glass, Willson, & Gottman, 1975) may 
allow strong causal inferences. The main 
threat to the validity of such designs is the 
possibility of a concurrent uncontrolled change 
at the time of the intervention. However, 
by applying techniques developed in single- 
subject research (Hersen & Barlow, 1976; 
Sidman, 1960), such as multiple-baseline and 
reversal designs, this problem can often be 
overcome. 

Another promising approach, applicable
when the outcome is a measure of develop- 
mental level, has recently been proposed (Bryk 
& Weisberg, 1976; Strenio, Bryk, & Weisberg, 
Note 3). Value-added analysis uses the varia- 
tion in pretest scores for the treatment group 
to predict growth in the absence of interven- 
tion. This use of cross-sectional data to 
simulate the unobservable development that 
would have occurred without the treatment 
requires some strong assumptions, but these 
can in principle be tested using observable 
data. Extensions of the value-added idea to 
designs encompassing multiple measurements 
may ultimately allow valid estimates of effects 
on individual subjects (Strenio, Weisberg, & 
Bryk, Note 4).

Finally, historical information on the treated 
population as well as data on other relevant 
populations may be combined to yield reasonable
standards of comparison. Bayesian and
empirical Bayes approaches offer great promise
as a formal technology for combining such
sources of evidence (Rubin, 1978).

These approaches are not fully adequate as 
they now stand. They need to be refined to 
the point where they can produce standards 
of comparison that are valid under empirically 
testable assumptions. Attempting to develop
such designs ought to be a top priority of evaluation
methodologists. Until we have tried to
develop alternatives not based on "approximations"
to randomization, we should be cautious
in discounting the value of uncontrolled
studies. While statistical adjustments are
certainly problematic, the potential contribution
of uncontrolled studies has not really
been tested.


Methods for comparing two or more statistical significance (p) levels are
described; these methods are more rigorous, systematic, and informative than the
comparisons that are commonly made by using a significant/not significant
dichotomy. Formulas are provided for calculating the significance level of a
comparison between two or more p levels.


Suppose that we wish to compare the results
of two studies. If all that is reported for each
is that the results are significant (p < .05) or
not significant (p > .05), then the conclusion
must be simply that the results are the same
(both are significant, or neither is significant)
or that the results differ (one is significant,
and one is not significant). Most studies,
however, do report information adequate for
calculating p values (Rosenthal, 1978). The
purpose of this article is to show how p values
can be directly compared by calculating the
significance level of the comparison.

Before describing our methods, two comments
are in order. First, we could dichotomize
each p value to be significant or not significant
and use the crude method of comparison
just outlined. This approach is clearly unwise
because it does not use the more detailed
information available in the p values. For
example, a p of .05 tells about the same story
as a p of .06 if both results are in the same
direction and the studies are of similar size.
This is true, even though many psychologists
give far more credence to .05-level results
than they do to .06-level results (Rosenthal &
Gaito, 1963).

Our second comment on comparing ps is that
if raw data or appropriate summary statistics
are available, comparisons can be made that
are more specific than those made solely from
the p values. For example, raw effect sizes
(e.g., mean differences), within-group variances,
and residual variances can be directly
compared with significance levels calculated
for the comparisons. Since a p level is affected
by raw effect size, residual variance, and
sample size, the comparison of p levels is
sensitive to differences in any of these compo- 
nents. In practice, particular areas of research 
tend to have reasonably homogeneous sizes of 
experiments, and there is substantial correla- 
tion between level of significance and magni- 
tude of effect. For example, for eight research 
areas recently summarized (Rosenthal, 1976), 
the median correlation between effect size as 
measured by Cohen's d (Cohen, 1977) and the 
Z of the significance level was .74. Conse- 
quently, in many cases, comparisons of ps 
can be thought of as rough comparisons of 
effect sizes. Even when the sample sizes vary 
across the studies, however, the results 
presented here validly compare p values.


Notation 


Consider first the simple case of two experiments,
each with two treatments. Let $p_j$ be
the observed significance level in the jth
experiment, and let $Z_j = Z(p_j)$ be the standard
normal deviate corresponding to $p_j$. We assume
that the $p_j$ and the $Z_j$ are directed, so that if
one study shows Treatment 1 superior and the
other shows Treatment 2 superior, then one
$p_j$ will be less than .5 and the other greater
than .5, and one $Z_j$ will be positive and the
other negative.

Suppose that $\Delta_j$ is the parameter to be
estimated in the jth experiment: $\Delta_j = \mu_{1j} - \mu_{2j}$,
where $\mu_{ij}$ is the population mean of Treatment
i in Experiment j.

Let $\hat{\Delta}_j = \bar{Y}_{1j} - \bar{Y}_{2j}$ be the estimate of $\Delta_j$
in the jth experiment, where $\bar{Y}_{ij}$ is the sample
mean of Treatment i in Experiment j. Throughout,
the symbol ^ will be used to indicate an
estimate. The standard error of $\hat{\Delta}_j$ is $\sigma_{\hat{\Delta}_j}$;
if the variances of the observations in Treatments
1 and 2 are the same, say $\sigma_j^2$, then
$\sigma_{\hat{\Delta}_j}$ can be written as $\sigma_j \sqrt{(1/n_{1j}) + (1/n_{2j})}$,
where $n_{ij}$ is the number of observations
comprising $\bar{Y}_{ij}$. The usual estimate of $\sigma_j^2$ is
the residual mean square in the jth experiment.


Comparing Two Studies


We shall now give the results and examples
of their applications. The technical discussion
will be postponed until the end of the article.


Result 1


Suppose

$$\frac{\Delta_1}{\sigma_{\hat{\Delta}_1}} = \frac{\Delta_2}{\sigma_{\hat{\Delta}_2}},$$

that is, suppose that the quantities estimated
by the t statistics of the two experiments are
the same. Then, for large $n_{ij}$,

$$\frac{Z_1 - Z_2}{\sqrt{2}}$$

is distributed as a standard normal deviate.

Note that the test implied by Result 1
will be sensitive to different kinds of differences
between the experiments. For example, if
the precisions of the two studies are the same,
that is, if $\sigma_{\hat{\Delta}_1} = \sigma_{\hat{\Delta}_2}$, then the test will be able
to detect the difference in raw effects. For
another example, if the raw effects are nonzero
but equal, that is, if $\Delta_1 = \Delta_2 \neq 0$, then the
test will be able to detect different precisions;
the different precisions can be due to different
residual variances $\sigma_j^2$ or different sample sizes.


Table 1
Example of Comparing Two p Levels

Study         One-tailed p    Z
A             1/10^7          5.20
B             .007            2.45
Difference                    2.75*

Note. Sufficiently accurate ps can usually be
obtained by interpolation or by using extended
tables (e.g., Federighi, 1959).
* 2.75/√2 = Z = 1.94; p = .026, one-tailed.

Example 1. Table 1 shows the results of
two studies of the effects of teachers' expectations
on pupils' gains in intellectual
performance (Rosenthal, 1976, p. 460). In
both studies, gains in performance were greater
when teachers had been led to expect better
performance, but Study A showed results
much more significant than those of Study B,
the p levels being $10^{-7}$ and .007, respectively
(one-tailed). It is of interest to compare these
p values because the children of Study A were
younger than the children of Study B. The
comparisons were as follows:

From a normal table we find that $Z_A = 5.20$
and $Z_B = 2.45$. Thus $Z_A - Z_B = 2.75$ and
$(Z_A - Z_B)/\sqrt{2} = Z = 1.94$, p = .026, one-tailed,
suggesting more significant results for the
younger children.
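The comparison in Result 1 is easy to script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and because it converts the p levels exactly rather than from a rounded normal table, its output agrees with the tabled figures only to about two decimal places.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_p(p_one_tailed):
    """Standard normal deviate corresponding to a directed one-tailed p."""
    return STANDARD_NORMAL.inv_cdf(1.0 - p_one_tailed)


def compare_two_p_levels(p1, p2):
    """Return (Z, one-tailed p) for the difference between two p levels,
    using (Z1 - Z2) / sqrt(2) as in Result 1."""
    z = (z_from_p(p1) - z_from_p(p2)) / sqrt(2.0)
    return z, 1.0 - STANDARD_NORMAL.cdf(z)


# The two p levels of Example 1 (Study A and Study B).
z_diff, p_diff = compare_two_p_levels(1e-7, 0.007)
print(f"Z = {z_diff:.2f}, one-tailed p = {p_diff:.3f}")
```

Because the p levels are directed, a result in the opposite direction would be entered as a one-tailed p greater than .5, which yields a negative Z.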

Example 2. Table 2 also shows the results
of two studies of the effects of teachers'
expectations on pupils' gains in intellectual
performance (Rosenthal, 1976, p. 460; one of
these studies has already been shown in
Table 1). In this example, however, the results
of one study were quite significant (p = .007)
while those of the other were not (p = .21).
Even though these p values are quite different,
the difference between these p levels has a p
value of only .123. One benefit of more systematic
comparisons of p levels will be a
decrease in the tendency to assume that two
studies that differ in whether their results
reach some conventional level of significance
are really telling different stories about the
state of nature.


Table 2
Example of Comparing Two p Levels

Study         One-tailed p    Z
A             .007            2.45
B             .21             .81
Difference                    1.64*

* 1.64/√2 = Z = 1.16, p = .123, one-tailed.


Table 3
Example of Comparing Four p Levels

Study          One-tailed p    Z       Linear λ    Quadratic λ    Cubic λ
Grade level
  2            1/10^7          5.20    +3          +1             +1
  3            .0001           3.72    +1          -1             -3
  4            .21             .81     -1          -1             +3
  5            .007            2.45    -3          +1             -1
Statistic
  Contrast Z                           2.50        1.56           1.34
  Z² = χ²(1)                           6.23        2.43           1.79
  One-tailed p                         .006        .059           .090


Comparing Many Studies 


Result 2 generalizes Result 1 to the case of 
many experiments. For Result 2, we suppose 
that there are K experiments and so let the 
index j run from 1 to K. 


Result 2 
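Suppose

$$\frac{\Delta_1}{\sigma_{\Delta_1}} = \cdots = \frac{\Delta_K}{\sigma_{\Delta_K}}.$$

Then for large samples,

$$\sum_{j=1}^{K} (Z_j - \bar{Z})^2$$

is distributed as $\chi^2$ with K − 1 df, where $\bar{Z}$
is the mean of the $Z_j$.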
Example 3. Table 3 shows the results of
four studies of teacher expectations. The first
and last studies listed were of second and
fifth graders, respectively, and we met them in
Table 1. We met the fourth graders in Table 2.
The question we now ask is whether the four
p levels of Table 3 are significantly different
from one another. The sum of the squares of
the deviations about the mean Z (SS) was
computed to be 10.45, which is referred to
the distribution of $\chi^2$ with 3 df. For the four
Zs of Table 3, we found p to be .015. Alterna-
tively, the mean square (SS/df) is referred to
the distribution of F with 3 and ∞ df. For the
four Zs of Table 3, we found SS/df to be
3.48 = F(3, ∞), p = .015.
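
The computation in Example 3 can be sketched as follows. The function names are hypothetical, the four Zs are taken as printed in Table 3, and the chi-square tail probability is computed with the standard finite-series forms for integer df so that the sketch needs only the Python standard library.

```python
from math import exp, sqrt
from statistics import NormalDist, mean


def chi2_upper_tail(x, df):
    """Upper-tail probability of chi-square with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return exp(-x / 2) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    std = NormalDist()
    chi = sqrt(x)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += term
    return 2 * (1 - std.cdf(chi)) + 2 * std.pdf(chi) * total


def heterogeneity_test(zs):
    """Result 2: sum of squared deviations of the Zs about their mean."""
    z_bar = mean(zs)
    ss = sum((z - z_bar) ** 2 for z in zs)
    df = len(zs) - 1
    return ss, df, chi2_upper_tail(ss, df)


table3_z = [5.20, 3.72, 0.81, 2.45]  # Zs for Grades 2 through 5
ss, df, p = heterogeneity_test(table3_z)
print(f"SS = {ss:.2f} on {df} df, p = {p:.3f}, SS/df = {ss / df:.2f}")
```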


Contrasts in the Studies 


Although we know how to answer the broad 
question of the significance of the differences 
among a collection of p levels, we may often
be able to ask a more focused and more useful
question. For the four studies of Table 3,
for example, we are far more interested in the
more focused question of whether lower p
levels are found more at lower grade levels. 
Result 3 handles such questions. 


Result 3 
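Suppose

$$\sum_{j=1}^{K} \lambda_j \frac{\Delta_j}{\sigma_{\Delta_j}} = 0,$$

where

$$\sum_{j=1}^{K} \lambda_j = 0.$$

Then for large samples,

$$\frac{\sum_{j=1}^{K} \lambda_j Z_j}{\sqrt{\sum_{j=1}^{K} \lambda_j^2}}$$

is distributed as a standard normal deviate.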
Example 4. The column labeled Linear λ
of Table 3 gives the weights of a linear contrast
to address the question of whether lower p
values are found more often at lower grade
levels. The analysis showed a clear linear
trend for younger children to be more signif-
icantly affected by teacher expectations,
$\sum \lambda_j Z_j / \sqrt{\sum \lambda_j^2} = 11.16/\sqrt{20} = Z = 2.50$, p =
.006 (one-tailed).

We now know that the four p levels of
Table 3 differ among themselves, $\chi^2(3)$ =
10.45, p = .015, and that there is a linear trend
for the p levels to be related to grade level,
Z = 2.50, $Z^2 = \chi^2(1)$ = 6.23, p = .006. Since
the 3 df $\chi^2$ of 10.45 is the sum of (a) the 1 df
$\chi^2$ of 6.23 corresponding to the linear trend
and (b) an independent 2 df $\chi^2$ corresponding
to deviations from the linear trend, we have
that (10.45 − 6.23) = 4.22 is distributed as
$\chi^2$ with 2 df.

In this case $\chi^2(2)$ = 4.22, p = .121, suggest-
ing no strongly significant curvilinear relation-
ships. This $\chi^2$ of 4.22 based on 2 df can be
further split into a quadratic and a cubic
component. The last two columns of Table 3
show the weights (orthogonal polynomials)
employed for the quadratic and cubic contrasts
and the Z, $\chi^2$, and p for each. The $\chi^2(2)$ has
been split into two $\chi^2(1)$s of 2.43 and 1.79,
significant at p = .06 and .09, respectively.
Since there is no theoretical reason to expect
either a quadratic or a cubic trend for the
present data, we might not normally split the
2 df $\chi^2$ further, but there are applications
where such further contrasts may be of value.
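
The contrast tests of Result 3 and the partition of the overall $\chi^2$ can be checked with a short sketch. The names below are hypothetical, and the Zs and contrast weights are taken as read from Table 3. With those weights the cubic contrast comes out negative; its magnitude agrees with the tabled 1.34.

```python
from math import exp, sqrt
from statistics import NormalDist, mean

table3_z = [5.20, 3.72, 0.81, 2.45]  # Zs for Grades 2 through 5
contrasts = {
    "linear": [3, 1, -1, -3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [1, -3, 3, -1],
}


def contrast_z(zs, weights):
    """Result 3: sum(lambda_j * Z_j) / sqrt(sum(lambda_j ** 2))."""
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    numerator = sum(w * z for w, z in zip(weights, zs))
    return numerator / sqrt(sum(w * w for w in weights))


results = {name: contrast_z(table3_z, w) for name, w in contrasts.items()}

# Overall heterogeneity (Result 2) and the part left after the linear trend.
z_bar = mean(table3_z)
ss = sum((z - z_bar) ** 2 for z in table3_z)
residual = ss - results["linear"] ** 2
p_residual = exp(-residual / 2)  # chi-square with 2 df has upper tail exp(-x/2)
p_linear = 1 - NormalDist().cdf(results["linear"])

for name, z in results.items():
    print(f"{name}: Z = {z:.2f}, Z^2 = {z * z:.2f}")
print(f"residual chi-square(2) = {residual:.2f}, p = {p_residual:.3f}")
```

Because the three sets of weights are mutually orthogonal and each sums to zero, the three squared contrast Zs add up to the overall sum of squares, which is the partition described in the text.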


Technical Discussion 


To prove Results 1, 2, and 3, fix $\delta_j = \Delta_j/\sigma_{\Delta_j}$
at some value, and let $f_j$, the df in each study,
get larger and larger. As $f_j \to \infty$, $Z_j = Z(p_j)$
is the t test in the jth study and thus equals
$\hat{\Delta}_j/\hat{\sigma}_{\Delta_j}$. Furthermore, in the limit, $Z_j$ is normally
distributed with mean $\delta_j$ and variance 1.
Consequently, when $\delta_1 = \delta_2$, $(Z_1 - Z_2)/\sqrt{2}$ is
N(0, 1) (i.e., Result 1); and more generally,
when $\sum \lambda_j \delta_j = 0$ (where $\sum \lambda_j = 0$), $\sum \lambda_j Z_j / \sqrt{\sum \lambda_j^2}$
is N(0, 1) (i.e., Result 3). Also, when all
$\delta_j = \Delta/\sigma_{\Delta}$, $\sum (Z_j - \bar{Z})^2$ is distributed as $\chi^2_{K-1}$,
because the $Z_j$ are independent normal var-
iables with common mean and variance 1
(i.e., Result 2).

To obtain insight into how large each $f_j$
should be before we can trust the above
asymptotic argument, we examine the mean
and variance of $Z_j$ for large but finite $f_j$.
Using a Taylor series expansion of an expres-
sion in Wallace (1959, p. 1121), when $f_j$ is
large $Z_j$ can be approximated as $t_j(1 - t_j^2/4f_j)$,
where $t_j$ is the t statistic that was used to look
up the p value, $p_j$. Using this approximation
and assuming normally distributed data, we can
use Equation 1 in Johnson and Kotz (1970,
p. 203), Stirling’s approximation (Wilks,
1962, p. 177), and Taylor series expansions in
$1/f_j$ to show that for large $f_j$, the mean of $Z_j$
is approximately $\delta_j(1 - \delta_j^2/4f_j)$ and the
variance of $Z_j$ is approximately $[1 + (\tfrac{1}{2} - \delta_j^2)/f_j]$.
These expressions suggest that if all $f_j$
are large enough to ensure that all values of
$\delta_j^2/4f_j$ and $(\tfrac{1}{2} - \delta_j^2)/f_j$ are close to zero
(e.g., .10), then each $Z_j$ will essentially have
mean $\delta_j$ and variance 1, as with infinite $f_j$;
as a consequence, we then expect the asymp-
totic arguments leading to Results 1, 2, and 3
to be appropriate. However, in rare circum-
stances when $f_j$ is so small that $\delta_j^2/4f_j$ or
$(\tfrac{1}{2} - \delta_j^2)/f_j$ is large (e.g., .5), then the signif-
icance tests presented in Results 1, 2, and 3
may be somewhat inexact, since even when
all $\delta_j = \delta$, all the $Z_j$ will not have (a) approx-
imately the same mean or (b) variance equal
to one.


The Construct Validity of Egocentrism


Martin E. Ford 
Institute of Child Development, University of Minnesota (Minneapolis) 


Construct validation is briefly explained and then applied to egocentrism. Con- 
ceptual and operational referents of this construct are organized into three 
categories: visual/spatial egocentrism (what does the other see), affective ego- 
centrism (what does the other feel), and cognitive/communicative egocentrism 
(what is the other thinking). Several kinds of reliability information are re- 
ported, and construct validity is evaluated primarily by examination of the 
relationships among measures of egocentrism within and between categories. 
Although interrater reliability and interrater agreement were found to be uni- 
formly high for all egocentrism measures, and the measurement reliability was 
usually adequate, a few tasks were not internally consistent. Overall, the con-
struct validity of egocentrism was not supported, since most task intercorrela-
tions were low and often nonsignificant. From this evidence and an analysis of
key egocentrism tasks, an alternative interpretation of the data based on cogni-
tive constructs and task-specific and response-specific variables is proposed.


One activity essential for progress in theory 
and research is the validation of psychological 
constructs. Essentially, a construct is an 
unobservable characteristic of some entity, 
usually a person, that is hypothesized as an 
explanation for some observable phenomena. 
Constructs usually refer to an underlying
psychological structure, state, system, or
process, although these are sometimes not 
easily specified. Examples of some common 
psychological constructs that fit this descrip- 
tion are intelligence, memory, anxiety, libido, 
identity, attachment, and, of course, ego- 
centrism. 


The author wishes to thank Andrew Collins and 
Daniel Keating for their help in editing the manu- 
script. 

Requests for reprints should be sent to Martin 
Ford, Institute of Child Development, 51 East River 
Road, University of Minnesota, Minneapolis, Min- 
nesota 55455. 


Constructs that have been adequately vali-
dated through empirical testing can be effi-
cient and reliable sources of guidance in our 
problem-solving activities. On the other hand, 
constructs that are not valid may distort our 
view of the relevant problems or may lead 
us to seek solutions in the wrong places. It is 
therefore important that the validity of con- 
structs be evaluated. Cronbach and Meehl 
(1955) have attempted to explain when and 
why it is important to evaluate construct va- 
lidity in the context of psychological testing. 
They comment that 
construct validation is involved whenever a test is to
be interpreted as a measure of some attribute or
quality which is not “operationally defined” [i.e.,
there is no one operation that is by itself an ade-
quate definition of the construct]. The problem
faced by the investigator is, “What constructs ac-
count for variance in test performance?” (p. 282)


Cronbach and Meehl emphasize that con-
struct validity can only be evaluated by
integrating evidence from many different
sources. They describe several kinds of in-
vestigations that can potentially provide in- 
formation about the validity of a construct: 
(a) studies of group differences, (b) correla- 
tional and factor analytic studies, (c) studies 
of internal structure, (d) studies of change 
over occasions, and (e) studies of process. 

In this article, studies that are relevant to 
the construct validity of egocentrism are 
reviewed. However, not all studies of egocen- 
trism are included. Most of the research cited 
falls in the second and third categories de- 
scribed above, that is, correlational and in- 
ternal structure studies. Where relevant to 
the discussion, studies of group differences 
(usually different age groups) are mentioned. 
However, since there are age differences on 
most measures with some cognitive content, 
these studies are given little weight. This 
kind of information is not as decisive or as 
discriminating as some other kinds of evi- 
dence might be and is therefore only selec- 
tively reported. For the same reason, studies 
of change over time might be less useful for 
answering questions about the construct va- 
lidity of egocentrism, although this assertion 
is difficult to evaluate, since longitudinal stud- 
ies in this area are so infrequent. Finally, 
studies of the psychological processes that ex- 
plain or correlate with egocentric performance 
would probably be of great value, but un- 
fortunately this kind of evidence is also sparse 
in the literature. 

The article is organized around three is- 
sues: 

1. What are the conceptual referents of the 
term egocentrism? 

2. What are the various ways that research-
ers have attempted to operationalize this
construct? Are these measures reliable? 

3. Do the various operationalizations of ego- 
centrism appear to be measuring the same 
underlying construct? If not, then what does
each measure? 

The present article concludes that the data 
do not support the construct validity of ego- 
centrism and that several other sources of 
variance can be hypothesized to account for 
the data. Thus some additional evidence is
presented that focuses primarily on the
plausibility of this reinterpretation. The arti-
cle relies, in particular, on an analysis of the
requirements of several key egocentrism tasks
and the kinds of errors that are com-
monly associated with them.


Conceptual Definitions of Egocentrism 


Cronbach and Meehl (1955) assert that 
construct validation begins with a theory that 
defines the construct. They point out that if 
an investigator does not specify the meaning 
of the construct clearly enough, then others 
will be unable to evaluate the evidence pro- 
vided for the validity of the construct. Al- 
though the nature of current psychological 
theorizing demands that a certain degree of 
vagueness be tolerated, the testability of a 
theory and the value of construct validation 
are greatly enhanced when this vagueness is 
minimized. 

Another important point is that the term 
egocentrism is not always used as an explana- 
tory construct. It is sometimes used simply 
to describe a specific social cognitive act and 
implies nothing about other acts or the 
underlying causes of the observed behavior. 
Egocentric may be useful as an adjective that 
describes a person’s behavior in a given situ- 
ation, but it should not be confused with the 
term egocentrism that is used to refer to a 
hypothetical underlying trait. It is only this 
latter use that is evaluated. 

Looft (1972) attempted to pinpoint the 
meaning of egocentrism and concluded that 
it “does not pertain to selfishness or an 
overly keen regard of oneself, or even to the 
frequent use of ‘I’ or ‘me.’ The essential 
meaning of egocentrism is an embeddedness 
of one’s own point of view” (p. 74). In Pia- 
get’s (1926) theory, which is the context in 
which the construct is typically used, ego- 
centrism is defined as a lack of differentia- 
tion in some aspect of subject-object interac-
tion. Feffer (1959, 1970) defines it as the
inability to “decenter,” where decentration
refers to one’s ability to shift attention to 
consider more than one aspect of an event. 
Although each of these definitions is a little 
different, they share a common core of mean- 
ing: Each refers to an individual’s failure to 
perceive a situation or an event in more than
one way. This one way of perceiving is the
one that is easiest for the individual, that is,
the one that requires no conceptual elabora-
tion beyond what is directly perceived.

Conceptually and methodologically, the
referents for egocentrism can be organized into
three categories. Adapted from Shantz's
(1975) review of social cognitive develop-
ment, these are (a) What does the other
see? (b) What does the other feel? and (c)
What is the other thinking? Each of these
incorporates an appreciation for perspectives
other than one's own and the ability to infer
in a given situation what these alternative
perspectives are. The differences among
these three categories are essentially in the
domain each pertains to: visual/spatial
(sometimes called perceptual), affective, and
cognitive/communicative (sometimes called
conceptual).

Piaget's theory and the current develop- 
mental literature imply that egocentrism is a 
characteristic of individuals that is both
consistent across situations and stable over 
time. Because egocentric performance is 
typically regarded as an indication of the 
developmental status of the individual in 
terms of Piaget's stages of cognitive develop- 
ment, egocentrism is clearly meant to refer 
to a generalized trait. In other words, the 
presence or absence of this hypothetical con- 
struct is considered to be a sufficient explana- 
tion for the presence or absence of a wide 
range of phenotypically diverse behavior. The
major prediction that one can make from this
conceptualization is that if egocentrism is a 
unitary trait that can be used to predict 
failures in all kinds of perspective taking, then 
measures of egocentrism from each of the 
three domains previously described should be 
positively and significantly correlated. It is 
reasonable to expect that correlations of 
measures within a given domain would be 
higher than those between categories (due 
to domain specific variance), but both sets of 
correlations should exceed correlations be-
tween egocentrism measures and measures of
other constructs. Although these latter corre-
lations may be expected to be positive and
significant for theoretically related constructs
such as intelligence, conservation, and popu-
larity (Rubin, 1973), they should not equal
or exceed within-construct correlations.

In the section on Relationships Among Ego-
centrism Measures, studies that have com-
puted these kinds of correlations are re-
viewed. First, however, the next section de-
scribes operational definitions (measures) of
egocentrism and reports evidence on the sta- 
bility and internal consistency of these 
measures. 


Operational Definitions of Egocentrism 


Measures of Visual/Spatial Egocentrism: 
What Does the Other See? 


Piaget and Inhelder (1956) developed the 
first measure that was intended to assess the 
ability to imagine how an object or set of 
objects would appear from a different occupied 
position (i.e., from the perspective of an-
other person). In their test, known as the
three-mountains problem, a child sits in one of 
four chairs positioned around a table on 
which three mountain-shaped objects are 
placed. A doll is placed in one of the three 
vacant chairs, and the child indicates what 
the doll sees by pointing to one of a set of draw- 
ings or photographs, by drawing what the 
doll sees, or by recreating the doll’s view by 
manipulating materials provided to the child.
A correct response is taken as evidence for 
visual/spatial perspective-taking ability. 

Many innovations on this task have ap- 
peared in the literature; various stimulus 
displays have differed fundamentally on di- 
mensions of complexity and familiarity. 
Flavell, Botkin, Fry, Wright, and Jarvis 
(1968) developed several measures, including 
one in which four displays are presented one 
at a time in a standard sequence. Each display 
consists of a set of objects fastened to a 
board, which is placed in the middle of a 
small rectangular table. These displays are 
increasingly complex: The first has only a 
single red wedge of wood; the second con- 
sists of three vertically oriented blue cylinders 
of equal height; and the last two displays, one 
blue and the other half red and half blue, 
consist of three cylinders of unequal height. 
The child’s task is to reconstruct the experi- 
menter’s visual perspective using a duplicate 
set of unfastened objects.

Many other visual/spatial role-taking
tasks could be described; brief characteriza- 
tions of some of these should suffice. In a 
simple task, Liben (1978) had young chil- 
dren wearing yellow sunglasses indicate how 
a white card would appear to an experimenter 
wearing green sunglasses. Coie, Costanzo, and 
Farnill (1973) used a display in which a toy 
doll was oriented in different positions with 
respect to three toy houses varying in both 
size and color. Kurdek and Rodgon (1975) 
created a display with presumably attractive 
and familiar Walt Disney characters. Photo- 
graphs, including several on a photo cube, and 
a table setting were used by Zahn-Waxler, 
Radke-Yarrow, and Brady-Smith (1977), 
based on measures developed by Flavell et
al. (1968). Fishbein, Lewis, and Keiffer
(1972) displayed either one or three toys to
a child, who either pointed to one of a set of
four or eight photographs or actually turned
the display until the correct view was facing
towards the child. Eliot and Dayton (1976)
used an adaptation of the three-mountains 
task; by varying the shape of the stimulus 
objects (blocks) and the shape of the board 
that supported them and by varying as well 
the arrangement of the objects, 39 different 
configurations could be constructed. Finally, 
Shantz and Watson (1971) had young chil- 
dren view a display in a covered box and 
then view the same display from the opposite 
side; on some trials the display was rotated 
180° and the children’s expressions of sur- 
prise or amusement were noted. 

It is important to note that in all of these 
measures, only certain responses can be in- 
terpreted as a manifestation of egocentrism, 
that is, those that represent the actual visual/ 
spatial perspective of the subject. All other 
errors are nonegocentric errors because they 
represent perspectives other than those ex- 
perienced by the subject. For example, it 
would not be evidence of egocentrism if a 
child, when asked to show how someone sit- 
ting opposite the child would see a rectangu- 
lar display, were to indicate a perspective 
that was incorrect but different from his or 
her own, such as the perspective of someone 
sitting to the immediate right. No matter how
poorly the child performs on the task, ego-
centrism can logically be inferred only when
the child's own perspective is offered as the
correct answer. This point is elaborated in a
later section in which age differences in
visual/spatial perspective-taking errors are
considered.
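
The scoring rule described in this paragraph can be made concrete with a small sketch. The function name, the seat labels, and the three trials are hypothetical illustrations rather than data or procedures from any study cited here; the point is only that an error counts as egocentric solely when the child offers his or her own view.

```python
from collections import Counter


def classify_response(own_view, target_view, chosen_view):
    """Label one perspective-taking response.

    own_view: the view from the child's own position
    target_view: the view from the other observer's position
    chosen_view: the view the child actually indicated
    """
    if chosen_view == target_view:
        return "correct"
    if chosen_view == own_view:
        return "egocentric error"
    # Any other wrong answer reflects a perspective the child did not occupy.
    return "nonegocentric error"


# Hypothetical trials: the child sits at "near"; the other observer sits opposite.
trials = [
    ("near", "opposite", "opposite"),  # other's view chosen
    ("near", "opposite", "near"),      # child's own view chosen
    ("near", "opposite", "right"),     # view from the immediate right chosen
]
labels = [classify_response(*trial) for trial in trials]
tally = Counter(labels)
print(labels)
```

Keeping egocentric and nonegocentric errors in separate tallies, instead of a single count of failures, is what allows the two kinds of error to be compared.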

Reliability. Only one estimate of the re- 
liability of any measure of visual/spatial 
egocentrism appears in the published litera- 
ture. Rubin (1973) reports that test-retest 
(TR) reliability for the Flavell et al. (1968) 
measure is between .85 and .95 for his sample 
of 5- to 12-year-olds. Where reported, the 
interrater reliability and interrater agreement 
have also been in this range (Rubin, 1973; 
Zahn-Waxler et al., 1977). 


Measures of Affective Egocentrism: 
What Does the Other Feel? 


Affective egocentrism refers here to an 
inability to infer the feelings of others. It 
does not imply the ability or inclination to 
share these feelings, which distinguishes it 
from a related construct, empathy. (In gen- 
eral, taking the perspective of another person 
requires being able to identify and appreciate 
a different view of the world in some domain. 
It does not require that this perspective be 
experienced in precisely the same way; in- 
deed, that would be unlikely, since immedi- 
ately perceived and inferred experience would 
have to be equivalent.) 

In one widely used measure of affective 
egocentrism, Borke’s (1971, 1973) Inter- 
personal Perception Test, a child is told a 
short story that portrays an emotionally stim- 
ulating situation, such as losing a pet (sad- 
ness), going to a birthday party (happiness), 
having a toy broken by another child (anger), 
and being lost (fear). There are 23 stories in 
all. Presented along with these stories is a 
picture of the described situation, in which 
the appropriate character has a blank face. 
The child's task is to supply the proper fa- 
cial expression. A similar but even simpler 
measure was used by Feshbach and Roe 
(1968), whose Affective Situations Test pro- 
vided the appropriate facial expressions for 
the story characters along with the situa-
tional cues. An important criticism of these
measures is that one cannot be sure that cor-
rect responses are evidence of true perspective 
taking, since there is no clear criterion for 
discriminating between subjects’ attributing 
their own responses to a situation and ac- 
tually inferring the emotional responses of 
another. 

A measure of affective egocentrism that 
may permit this discrimination was developed 
by Rothenberg (1970). This task differs 
from Borke’s Interpersonal Perception Test 
on dimensions of familiarity and, to a lesser
degree, complexity. For this measure, children
are required to judge the feelings of individu-
als unlike themselves (adults) in relatively
unfamiliar situations (e.g., unexpectedly
bringing friends home for dinner to an un- 
prepared spouse). This task is more complex 
in that no visual cues are available and the 
taped verbalizations are more adultlike and 
presumably more difficult to comprehend. 

Several researchers (Burns & Cavey, 1957; 
Deutsch, 1974; Kurdek & Rodgon, 1975) 
have used the general strategy of presenting 
pictures or films of situations in which the 
facial expression of the central character is 
not what one would expect on the basis of the 
contextual cues (e.g., frowning at a birthday 
party or following a helpful gesture by an- 
other). Inferring affective perspective taking 
in these tasks is difficult, since children with 
the ability to infer the feelings of others may 
give differing responses depending on indi- 
vidual differences in the salience of the incon- 
gruent facial and situational cues. The prob-
lem mentioned earlier, that of discriminating
between attributing one's own feelings and
inferring others’ feelings, arises here as well.

Reliability. Only three studies report 
measurement reliabilities for affective ego- 
centrism, and all are dangerously low. For a 
sample of third and fifth graders, the internal
consistency of the Rothenberg measure has
been reported as .28 to .47 (Rothenberg, 1970),
.30 (Hudson, 1978), and .50 (Rubin, 1978).
In the latter study, the internal consistencies 
within Grade Levels 1, 3, and 5 were only .18, 
.20, and .39, respectively. Again, where re- 
ported, interrater reliability and interrater 
agreement are fairly high (Hudson, 1978; 
Moir, 1974; Rothenberg, 1970; Rubin, 1978).


Measures of Cognitive/Communicative
Egocentrism: What Is the Other Thinking? 


This is the broadest of the three categories 
and is therefore not as conceptually distinct 
as the previous two. In general, this set of 
tasks requires the subject to infer something 
about the thoughts, motives, or intentions of 
another person. Sometimes this is done more 
or less in the absence of the subject's poten- 
tially interfering cognitions, but more com- 
monly the evidence of interest is whether one 
can overcome the tendency to attribute one's 
own knowledge to another in some sort of 
communicative situation. 

Referential communication. Perhaps the 
most widely used measures of cognitive/com- 
municative egocentrism are those used in 
referential communication studies, in which a 
listener must select the appropriate object 
(referent) from a set of objects (nonrefer- 
ents) on the basis of a verbal message by a 
speaker. In some cases the major dependent 
variable is the number of successful matches 
made by the speaker and listener, although 
since the characteristics of the listener can 
vary considerably, others have been more 
interested in the actual content of the speak- 
er's message. This also provides a more di- 
rect assessment of the subject's communica- 
tive ability. The most widely used measure of 
referential communication ability was devised 
by Glucksberg and his colleagues (Glucks- 
berg & Krauss, 1967; Glucksberg, Krauss, & 
Higgins, 1975; Glucksberg, Krauss, & Weis- 
berg, 1966). In this task, a speaker and a 
listener are seated at opposite ends of a table 
with an opaque screen separating them. Sev- 
eral stimuli are pictured on cards or blocks; 
these stimuli are novel graphic designs that 
are difficult to label or describe. The speaker 
then attempts to provide descriptions of each 
stimulus so that the listener can successfully 
discriminate the referent from the nonrefer- 
ents. Successive trials are not independent in 
the sense that the set of nonreferents dimin- 
ishes in size with each trial, although some 
researchers have used the full set of stimuli 
for each trial. A variant on the general 
Glucksberg et al. procedure is to have listen- 
ers provide feedback indicating that they 
did not understand the first message and then
to assess how the speaker's communication is
affected. This manipulation is based on the 
premise that egocentric speakers will be un- 
able to recode the message to take into ac- 
count the communicative failure. 

Other measures have been devised to assess
referential communication ability, and in a
situation analogous to that for the visual/ 
spatial and affective egocentrism measures, 
these differ essentially in the familiarity and 
perceptual complexity of the stimulus items 
used. In addition to the Krauss and Glucks- 
berg Blocks Task (Krauss & Glucksberg, 
1969), Piché, Michlin, Rubin, and Johnson 
(1975) administered the Baldwin and Garvey 
Picture Identification Task (Baldwin & Gar- 
vey, Note 1), in which subjects describe a 
picture of a Dr. Seuss-like animal to a lis- 
tener who must select the correct picture from 
a set of seven that closely resemble each 
other. Piché et al. (1975) also administered
the Crystal Climbers Task, in which subjects 
describe a model made of white plastic 
circles, squares, rectangles, and cylinders of 
various sizes to a listener who must construct 
the same model out of a set of unassembled 
pieces. Other studies used measures of less 
perceptual and conceptual complexity. Shatz 
and Gelman (1973) had their subjects de- 
scribe which of several airplanes of similar 
appearance to choose. Maratsos (1973) had 
young children describe to an experimenter 
who either was watching or who could ap- 
parently not see (i.e., hands over eyes but
with a small crack to peek through) which of 
several toys to place in a toy car that the 
child was to catch as it rolled down a small 
hill. These toys consisted of familiar, easily 
discriminable and describable objects: a red 
duck, a green duck, small dogs, and boys and 
girls. Hoy (1975) also varied whether the 
listener could see the speaker, but in addi- 
tion the familiarity and complexity of the 
objects to be described were manipulated 
(i.e., a horse or a random shape).

Social and private speech. Another method 
of assessing cognitive/communicative egocen- 
trism is to observe the quality of children's 
speech in naturalistic settings. These settings 
may be social, in which case the data of in-
terest are the degree to which speakers can
modify their messages according to the status
of their listeners, especially those who are
younger and presumably less competent com-
municators (Garvey & Hogan, 1973; Shatz &
Gelman, 1973). In other instances, these set- 
tings may be mostly nonsocial in the sense 
that any speech that occurs is not directed at 
another person. This measure clearly origi- 
nated from Piaget's (1926) characterization 
of the speech of young children as repetitious 
and like a "collective monologue." Kohlberg, 
Yaeger, and Hjertholm (1968) developed a 
quantitative scale in which certain kinds of 
private speech, such as the repetition of 
words for their own sake or simple descrip- 
tions of one's own ongoing activity, were 
rated as more egocentric than other kinds of 
private speech, such as speech that was used 
to guide and control activity or inaudible 
muttering. This kind of measure seems to be 
an indirect means of assessing the ability to 
infer the cognitive perspective of another and 
to adjust one's behavior accordingly. Social 
speech measures might be better suited to this
purpose. However, Kohlberg et al. comment
that self-communication may not be very 
different from social communication with 
someone you know intimately. 

Feffer’s Role-Taking Task. A third popu- 
lar kind of cognitive/communicative egocen- 
trism measure is Feffer's Role-Taking Task 
(RTT; Feffer, 1959, 1970; Feffer & Goure- 
vitch, 1960; Feffer & Suchotliff, 1966). This 
measure, which attempts to tap an individu- 
al's ability to decenter or to see an inter- 
personal situation from the perspective of 
another, requires subjects to make up an 
initial story as one character would tell it 
from a picture (e.g., a Thematic Apperception 
Test card) that portrays at least three
characters. Then they must retell the story as
it would be perceived or experienced by the
other characters, repeating the story once for
each character. Scoring is based on the extent
to which the sequential story telling reflects
an ability to logically coordinate the
different versions of the initial story and to 
elaborate the roles and internal states of the 
characters (Schnall & Feffer, Note 2). 

Privileged information. A fourth measurement
strategy in this domain is the privileged
information paradigm originated by Flavell
et al. (1968) and further developed by
Chandler (1973). Flavell's version, known as
the apple-dog story, is similar to Chandler's
several stories. Subjects first describe a story
that is displayed in a sequence of several 
cartoons that depict story characters in emo- 
tionally charged and stressful situations. They 
then must retell the story from the perspec- 
tive of a bystander who has no knowledge of 
the activity portrayed in an important subset 
of the cartoons. This knowledge is crucial for 
understanding the outcome of the story and 
is the subjects' "privileged information" that
must not be attributed to the bystander if 
successful role taking is to occur. Scoring is 
based on the degree to which subjects are 
able to restrict their second narrative to the 
limited perspective of the bystander. Ambron 
and Irwin (Note 3) used a similar strategy, 
except that in their test the narrative is sup- 
plied by the experimenter, and there are 
fewer cartoons (four). 

Chandler, Helm, and Smith (Note 4) de- 
veloped a task based on “droodles,” which are 
clever drawings that convey some situation 
with a minimum of lines. Subjects first saw 
only a part of the droodle, which by itself 
was uninterpretable, and then were given the 
whole drawing. The task was then to interpret 
the picture from the perspective of someone 
who had only seen the uninterpretable part of 
the droodle. 

Zahn-Waxler et al. (1977) used a wide 
array of tasks that assessed conceptual role 
taking, including two privileged information 
stories. However, they also included tasks 
that involved (a) choosing an appropriate 
birthday gift for other persons; (b) choosing 
the appropriately sized chair for one's self 
and for an adult experimenter; (c) indicating
which of two games, one of which a confederate
preferred and the subject did not, the
confederate would like to play; and
(d) indicating which of two foods (attractive
cookies or juice with soggy crackers) a
confederate who pretended to hate cookies
would prefer to eat.

Recursive thought. Another means of assessing
cognitive/communicative egocentrism is the
child's understanding of the recursive nature
of thought (Miller, Kessel, & Flavell, 1970).
This measure involves showing subjects several
cartoonlike
drawings in which scalloped cartoon clouds 
represent thinking and smooth cartoon clouds 
represent talking. The faces of a boy, a girl, a 
mother, and/or a father are pictured in the 
drawings. Different configurations of smooth 
and scalloped clouds are embedded in one 
large scalloped cloud, indicating that the 
depicted character is thinking about some- 
thing. After subjects are trained to under- 
stand the meaning of the clouds, they are 
asked to describe what the person in the 
picture is thinking. The pictures to be de- 
scribed range from relatively simple situations 
(e.g., the boy is thinking about the girl) to
fairly complex representations (e.g., the boy 
is thinking that the girl is thinking of him 
talking to her). Presumably, an inability to 
take the perspective of another will be re- 
flected in an inability to describe events such 
as thinking about another's thoughts. 

Infer game strategies. A sixth and final
set of measures in this domain comprises those
that require subjects to infer the strategy of
an opponent in a game. The most widely used of
these is the Flavell et al. (1968) nickel-dime
game, in which the subject has a nickel and a
dime and two upside-down cups with another
nickel and dime taped to them. The subjects'
instructions are to cover the coins with the
pair of cups in such a way that their
opponents will choose the lesser of the two
rewards. For each trial, subjects may be asked
to explain their rationale for the placement
of the cups. These explanations are analyzed
for the degree to which they reflect
consideration of the thought processes (i.e.,
strategies) of the opponent. The hide-the-penny
guessing game used by Selman (1971a, 1971b) is
similar in purpose and execution.

Reliability. Relatively abundant reliability
information is available for the various
measures of cognitive/communicative
egocentrism. The three studies that provide
measurement reliabilities for referential
communication tasks report test-retest (TR)
correlations of .86 (Chandler, Greenspan, &
Barenbaum, 1974), .89 (Deutsch, 1974), and
.85-.95 (Rubin, 1973), which indicates that
this is a fairly reliable measure. Rubin also
found that the reliability (TR) of the private
speech and recursive thought measures was
adequate (.85-.95). However, Kohlberg et
al. (1968) reported a TR correlation of only
.43 for their measure of private speech.
Similarly low reliabilities are reported for
Feffer’s RTT. All four studies that reported
reliability coefficients obtained low internal
consistency estimates: .27 (Keller, 1976), .40
(Kurdek, 1977), .40 (Turnure, 1975), and
.42 (Feffer & Gourevitch, 1960). Kurdek
also obtained a TR correlation for this task
of .60. On the other hand, Chandler's
privileged information stories are more
reliable, although not dramatically so.
Internal consistency reliabilities of .91
(Chandler et al., 1974), .65-.86 (O'Connor,
Note 5), .56 (Kurdek, 1977), and .52 (Rubin,
1978) have been reported. Chandler et al. and
Kurdek obtained TR correlations of .84 and .68,
respectively. Finally, O'Connor and Kurdek
reported adequate internal consistency
(.77-.85 and .68) for two different game
strategy inference measures.
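
The internal consistency estimates cited above can be made concrete with a small computation. The sketch below assumes that internal consistency is indexed by coefficient alpha; the studies reviewed here may have used other indices (e.g., split-half or KR-20), and the item scores are hypothetical, not data from any study cited.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of items.

    item_scores: one list per item, each holding that item's scores
    for the same subjects in the same order (hypothetical layout).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n_subjects = len(item_scores[0])
    if any(len(item) != n_subjects for item in item_scores):
        raise ValueError("every item needs one score per subject")
    # Total score per subject across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    sum_item_variances = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical scores for six subjects on a three-item task.
hypothetical_items = [
    [2, 3, 1, 4, 2, 3],
    [1, 3, 2, 4, 1, 2],
    [2, 2, 1, 3, 3, 3],
]
print(round(cronbach_alpha(hypothetical_items), 2))
```

Alpha approaches 1 only when the items rise and fall together across subjects, so values near .40, such as those reported for Feffer's RTT, imply that the items share little variance.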

Interrater reliability and interrater
agreement are uniformly high for all of the
above measures, indicating that scoring is not
a major problem (Byrne, 1974; Chandler &
Greenspan, 1972; Chandler et al., 1974;
Deutsch, 1974; Feffer, 1959; Hudson, 1978;
Kohlberg et al., 1968; Kurdek, 1977; Leahy
& Huard, 1976; Piché et al., 1975; Rubin,
1972, 1974, 1978; Turnure, 1975; Urberg &
Docherty, 1976; Weinheimer, 1972; Wolfe,
1963; Zahn-Waxler et al., 1977; Marsh &
Serafica, Note 6; Olejnik, Note 7).
Unfortunately, some of these reliability
estimates are difficult to evaluate, since the
actual degree of rater independence is often
unknown because the scoring procedures are not
described in sufficient detail.

Summary. The various measures of ego- 
centrism can be organized into three cate- 
gories that correspond to their conceptual 
referents. The major dimensions that differ- 
entiate measures within categories are the 
familiarity of the task stimuli and the com- 
plexity of the task in terms of its perceptual 
characteristics, the task instructions, and the 
type of response required. The main dimension
that differentiates measures between
categories is the kind of inference required
(i.e., is it concerned with seeing, feeling, or
thinking?). Visual/spatial egocentrism measures are
patterned after Piaget and Inhelder's (1956) 
three-mountains problem, but there are many 
variations ranging from simple displays with 
familiar objects to fairly complex displays 
with novel objects. Not all errors on these 
tasks are egocentric errors. Affective ego- 
centrism measures are fewer in number, but 
these have special problems that relate to 
the kind of inferences that can legitimately 
be made about subjects’ thought processes 
(e.g., memory vs. perspective taking). Also, 
the internal consistency of these measures 
appears to be lower than one would like, 
indicating that the shared variance among 
task items is minimal. There are many dif- 
ferent kinds of cognitive/communicative ego- 
centrism measures, including referential com- 
munication measures, observational measures 
of children’s naturalistic speech, Feffer’s 
RTT, measures that assess appreciation of 
privileged information and comprehension of 
recursive thought, and measures that assess 
awareness of another’s strategies in a game 
situation. Where reported, reliabilities seem to 
be at least moderately high for all of the 
measures in this category except Feffer’s 
RTT, the internal consistency of which is 
too low to claim that it reliably measures a 
unitary dimension. 


Relationships Among Egocentrism Measures 


Tables 1, 2, and 3 summarize the data
reviewed in this section and the last section
that are relevant to the construct validity and
reliability of measures of egocentrism. In
these matrices of correlations, four kinds of 
reliabilities are indicated. Two refer to the 
reliability of the measure itself, TR and in- 
ternal consistency (IC). The other two refer 
to the reliability of the judges who scored 
the egocentrism measures, interrater reliabil- 
ity (IR) and interrater agreement (IA). 

The row (and column) labeled Other con- 
ceptual role-taking tasks (Tables 1 and 3) 
refers to a wide variety of tasks not subsumed 
by the other categories. These include a word
association test (Piché et al., 1975), one of
the many Flavell et al. (1968) tasks (Moir,
1974), the conceptual role-taking tasks
described in the previous section that were used
by Zahn-Waxler et al. (1977), and a task 
devised by Selman in which one must infer 
the thoughts and intentions of a filmstrip 
character (Kurdek, 1977). Included in Ta- 
bles 1-3 are several constructs other than 
egocentrism, so discriminant validity can be 
evaluated as well as convergent validity. 

The data presented in Tables 1-3 should 
be interpreted with great caution. These 
tables are an organizational device and not 
the final word on the construct validity of 
egocentrism. Correlations are not the only 
kind of relevant evidence. The data in Tables 
1-3 will be more meaningful if one has a 
grasp of the rationale for each study, the 
specific procedures used, the situation in 
which the testing took place, and so forth, 
which can be obtained from the original
reference. Nevertheless, these tables should be a
convenient starting point for those interested 
in investigating the construct validity of ego- 
centrism. 


Visual/Spatial Egocentrism and Affective 
Egocentrism 


Using two of the Flavell et al. (1968) 
visual/spatial measures and Rothenberg's 
affective egocentrism measure, Moir (1974) 
found in a sample of 40 11-year-old New
Zealand girls essentially zero correlations
between the two domains after IQ was
partialed out. One of the two correlations was
significant (r = .36, p < .05) before
partialing out IQ. Rubin and Maioni (1975), using
an adapted version of the three-mountains
problem and Borke's affective egocentrism
task, obtained a correlation of .44, which was
not significant for their small sample of 16
preschool children. The only other study to
relate measures in these two domains (Kurdek
& Rodgon, 1975) correlated scores on a
visual/spatial task that involved three Walt
Disney characters and a task that required
identification of affective states in situations
in which facial and contextual cues were
incongruent. In their large sample (N = 167)
of children from Grades K, 2, 4, and 6, the
correlation between the two tasks was
significant only for second graders, and even
there it did not account for a large portion of
the variance (r = .36).
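
Why a correlation of .44 can be nonsignificant while a correlation of .36 is significant follows from sample size. The sketch below uses the standard t test for a Pearson correlation and assumes a two-tailed test at the .05 level, which the studies cited may or may not have used; the function names are illustrative only.

```python
import math


def correlation_t(r, n):
    """t statistic (df = n - 2) for testing a Pearson r against zero."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def shared_variance(r):
    """Proportion of variance two measures have in common."""
    return r * r


# r = .44 with N = 16 (Rubin & Maioni, 1975): t is about 1.83, below the
# two-tailed .05 critical value of 2.145 for df = 14.
print(round(correlation_t(0.44, 16), 2))

# r = .36 with N = 40 (Moir, 1974, before partialing out IQ): t is about
# 2.38, above the critical value of roughly 2.02 for df = 38.
print(round(correlation_t(0.36, 40), 2))

# Even a significant r = .36 leaves about 13% shared variance.
print(round(shared_variance(0.36), 2))
```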

One can tentatively conclude that the pro- 
portion of variance shared by these two 
domains is small indeed and is possibly at- 
tributable to general intelligence. One cer- 
tainly cannot conclude from these data that 
an underlying construct of egocentrism can 
explain individual differences on these tasks. 
This conclusion must remain tentative, how- 
ever, because the internal consistency of the 
affective egocentrism measures is so low that 
it may preclude obtaining high validity
correlations. Although this may be a function of
the hypothesized trait being measured, it may
also be a measurement failure. The question of
what visual/spatial and affective egocentrism 
tasks might be measuring if not egocentrism 
is dealt with later. 
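
The way low internal consistency limits validity correlations can be illustrated with the classical attenuation relation, under which the observed correlation between two measures cannot exceed the square root of the product of their reliabilities. The reliabilities below are hypothetical values chosen for illustration, not estimates taken from the studies reviewed.

```python
import math


def max_observable_r(reliability_x, reliability_y):
    """Ceiling on an observed correlation under classical test theory."""
    return math.sqrt(reliability_x * reliability_y)


def disattenuated_r(observed_r, reliability_x, reliability_y):
    """Spearman's correction: estimated correlation between true scores."""
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical case: an affective measure with reliability .40 paired
# with a visual/spatial measure with reliability .90.
print(round(max_observable_r(0.40, 0.90), 2))       # ceiling of about .60
print(round(disattenuated_r(0.30, 0.40, 0.90), 2))  # .30 corrects to about .50
```

On these assumed values, even perfectly related true scores could produce an observed correlation no higher than about .60, which is why small cross-domain correlations are hard to interpret when one measure is unreliable.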


Affective Egocentrism and 
Cognitive/Communicative Egocentrism 


Only a few studies have correlated mea- 
sures in these two domains, and these substan- 
tially support the conclusions advanced pre- 
viously. Using a sample of 73 delinquents 
and nondelinquent controls, Rotenberg (1974) 
found a correlation of .02 between a
cognitive role-taking measure that assessed
subjects’ ability to predict others’ everyday
behavior and an affective role-taking measure
that assessed their tendency to relieve the
distress of others. Hudson (1978) reported a
small but significant relationship (r = .16) 
between Rothenberg’s affective egocentrism 
measure and Flavell's apple-dog story. Simi- 
larly, Rubin (1978) found partial correla- 
tions (controlling for chronological age) of 
-.23 and .00 between Chandler’s privileged
information stories and the Rothenberg and 
Borke measures, respectively. Rubin also re- 
ported mostly nonsignificant partial correla- 
tions between these affective egocentrism 
measures and the hide-the-penny game, the 
Glucksberg-Krauss task, and the Miller et 
al. recursive thought measure, although these 
latter two were significantly related to
Rothenberg’s task (r = .25 and .20, respectively).

Table 1 (continued)

Column headings: Measure; Visual/spatial egocentrism; Affective egocentrism; Referential communication; Recursive thought

Conservation: .65* (.26*) [N = 80, 5-12(24)]; .65* (.31*) [N = 80, 5-12(24)]; .63* (.26*) [N = 80, 5-12(24)]; sig [N = 108, 5-7(1)]

Popularity: -.11 [N = 80, 5-12(24)]; .52* [N = 60, 3-5(5)]; .22 [N = 60, 3-5(5)]; .05 [N = 80, 5-12(24)]; .68* [N = 16, 3-5(27)]; .05 [N = 80, 5-12(24)]; sig [N = 108, 5-7(1)]

Note. TR = test-retest reliability; IC = internal consistency; IR = interrater reliability; IA = interrater agreement. Each entry is in the general form: r (partial r) [N = X, ages(reference key number)], where r is the correlation between two sets of scores, partial r is the same correlation with IQ, mental age, or chronological age partialed out, N is the number of subjects in the study, ages is the ages (in years) of groups used to compute the correlations, and reference represents a number in the following reference key: 1 = Bunting (1975); 2 = Byrne (1974); 3 = Chandler & Greenspan (1972); 4 = Chandler, Greenspan, & Barenbaum (1974); 5 = Deutsch (1974); 6 = Feffer (1959); 7 = Feffer & Gourevitch (1960); 8 = Feffer & Suchotliff (1966); 9 = Heilbrunn (1974); 10 = Hudson (1978); 11 = Keller (1976); 12 = Kohlberg, Yaeger, & Hjertholm (1968); 13 = Kurdek (1977); 14 = Kurdek & Rodgon (1975); 15 = Leahy & Huard (1976); 16 = Marsh & Serafica (Note 6); 17 = Moir (1974); 18 = O'Connor (Note 5); 19 = Olejnik (Note 7); 20 = Piché, Michlin, Rubin, & Johnson (1975); 21 = Rotenberg (1974); 22 = Rothenberg (1970); 23 = Rubin (1972); 24 = Rubin (1973); 25 = Rubin (1974); 26 = Rubin (1978); 27 = Rubin & Maioni (1975); 28 = Rubin & Schneider (1973); 29 = Selman (1971); 30 = Selman & Lieberman (1975); 31 = Sullivan & Hunt (1967); 32 = Turnure (1975); 33 = Urberg & Docherty (1976); 34 = Weinheimer (1972); 35 = Wolfe (1963); 36 = Zahn-Waxler, Radke-Yarrow, & Brady-Smith (1977). In most cases, no partial r was computed, so it does not appear in the entry. Some studies reported a range of correlations, and most used several separate age groups. IA is expressed as a percentage rather than as a correlation.
* p < .05, except for IA. Studies that reported only whether a correlation was significant or nonsignificant are denoted sig or ns.

Another study that found a modest but
significant relation between the affective and 
cognitive/communicative domains (Moir, 
1974) reported a correlation of .49 between 
Rothenberg’s task and a game strategy in- 
ference measure. With IQ partialed out, the 
correlation was still significant (r = .35).
One possible interpretation of these relation- 
ships is that some of these measures may be 
tapping some common social cognitive or 
personality dimension such as social insight 
or social sensitivity. However, without more 
validity information this hypothesis is little 
more than speculation. The alternative hy- 
pothesis that performance on perspective- 
taking measures in the affective and cogni- 
tive/communicative domains can be explained 
by an underlying construct of egocentrism is, 
as it was for the visual/spatial and affective 
domains, untenable. 


Visual/Spatial Egocentrism and 
Cognitive/Communicative Egocentrism 


A set of three studies that related visual/ 
spatial and referential communication mea- 
sures shows a fairly consistent pattern of sig- 
nificant correlations. Rubin (1973) obtained 
a correlation of .65 between Flavell’s four 
displays and the Glucksberg-Krauss task on 
a sample of 80 children aged 5-12. With IQ 
partialed out, the correlation was considerably 
lower but still significant (r = .35). In
another study of 112 children in grades two and
six and college-aged and elderly adults, 
Rubin (1974) found a correlation of .49 be- 
tween the same two tasks, although the corre- 
lation within ages was only significant for 
the sixth graders and undergraduates. Heil- 
brunn (1974) found a significant correlation 
for 8-year-olds but not for 12-year-olds be- 
tween performance on a modified version of 
the three-mountains task and a referential 
communication task. These studies together 
suggest that there may be a common under- 
lying construct that weakly ties the two sets 
of measures together. 
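
The drop from a zero-order correlation to a smaller partial correlation once IQ is controlled can be sketched with the first-order partial correlation formula. The two correlations with IQ used below are hypothetical, because the values from the original studies are not given in the text; only the zero-order value of .65 comes from Rubin (1973).

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with a third variable z partialed out."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# x = visual/spatial score, y = referential communication score, z = IQ.
# Assumption for illustration: both tasks correlate .60 with IQ.
print(round(partial_r(0.65, 0.60, 0.60), 2))  # about .45
```

The more strongly each task correlates with IQ, the more of the zero-order correlation is absorbed, so a residual of .35 would suggest that the two tasks share something beyond general intelligence.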

[Table 2. Construct Validity and Reliability of Measures of Egocentrism. Rows and columns: Feffer's Role-Taking Task, privileged information, infer game strategies. Note. TR = test-retest reliability; IC = internal consistency; IR = interrater reliability; IA = interrater agreement. For an explanation of the form of each entry, see Table 1 Note. * p < .05, except for IA.]

[Table 3. Construct Validity and Reliability of Measures of Egocentrism. Columns: private speech, other conceptual role-taking tasks, Feffer's Role-Taking Task, privileged information. Rows: private speech, social speech, other conceptual role-taking tasks, IQ or mental age, conservation, popularity. Note. TR = test-retest reliability; IC = internal consistency; IR = interrater reliability; IA = interrater agreement. For an explanation of the form of each entry, see Table 1 Note. * p < .05, except for IA. Studies that reported only whether a correlation was significant or nonsignificant are denoted sig or ns.]

Do other cognitive/communicative egocentrism
measures show a similar relationship with
visual/spatial egocentrism? With one possible
exception, the answer seems to be no. Sullivan
and Hunt (1967) found correlations of .25, .00,
and .35 between the three-mountains task and
Feffer's RTT at ages 7, 9, and 11,
respectively. Heilbrunn similarly
found no relationship at age 8 and a signifi- 
cant but modest correlation at age 12. Con- 
sidering the poor internal consistency of the 
Feffer measure, it is not surprising that these 
correlations are unimpressive. The only study 
that correlated visual/spatial and privileged 
information tasks, Kurdek and Rodgon’s 
(1975) study of 167 children from kinder- 
garten to sixth grade, found nonsignificant 
correlations at all ages except grade five, 
which may be attributable to chance, consid- 
ering the number of correlations computed. 
Moir (1974) found that one of two visual/ 
spatial tasks was significantly correlated with 
Selman's measure of inferring game strate- 
gies, but Selman (1971b), using a sample of 
60 preadolescents, did not obtain a significant 
relationship between his measure and two 
scores from a different visual/spatial perspec- 
tive task. Rubin (1973) found a nonsignifi- 
cant correlation of .28 (with IQ partialed 
out, —.06) between private speech and visual/ 
spatial egocentrism. It seems that the only 
egocentrism measure other than referential 
communication that might be significantly 
related to visual/spatial perspective taking is 
comprehension of recursive thought; Rubin 
(1973) found a correlation of .73 (with IQ 
partialed out, a still significant .36) between 
these two measures. 
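
The remark above that an isolated significant correlation may be attributable to chance can be quantified. The sketch below assumes independent tests at the .05 level, which is a simplification, and the numbers of correlations are hypothetical because the original counts are not given in the text.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among independent tests."""
    if not 0.0 < alpha < 1.0 or n_tests < 1:
        raise ValueError("need 0 < alpha < 1 and n_tests >= 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected count of spurious 'significant' results under the null."""
    return alpha * n_tests


# Hypothetical numbers of correlations computed in a single study.
for n_tests in (4, 10, 20):
    print(n_tests, round(familywise_error(0.05, n_tests), 2))
```

With 10 independent tests the chance of at least one spurious result is already about .40, so a single significant correlation among many nonsignificant ones carries little weight.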


Within-Domain Correlations for 
Cognitive/Communicative Egocentrism 


Correlations among the various cognitive/ 
communicative egocentrism tasks support the 
hypothesis that referential communication and 
recursive thought, along with visual/spatial 
perspective taking, form a cluster of measures 
that share some reliable variance beyond 
general intelligence and are more or less unre- 
lated to other measures in this domain. Rubin 
(1973) found a correlation between measures 
of referential communication and recursive 
thought of .72, which remained significant
when IQ was partialed out (r = .31). In
Rubin's (1978) study, this finding was
replicated (partial r = .46). In these two studies
Rubin also found that neither measure was
consistently related to private speech, the
hide-the-penny game, or Chandler's privileged
information stories. In a sample of 20 fourth
graders, Piché et al. (1975) also found
nonsignificant correlations between Feffer's RTT
and two referential communication tasks
(r = .23 and -.08, respectively). However, one
of these latter two measures, the Crystal
Climbers Task, was significantly related to
Chandler's stories (r = .44), although the
other, the Baldwin and Garvey Picture
Identification Task, was not (r = .32). Similarly,
Chandler et al. (1974) found a modest but 
significant correlation of .31 between the 
privileged information stories and a referen- 
tial communication task, whereas Leahy and 
Huard (1976) obtained a correlation of only 
.05. None of the above three studies controlled 
for age or IQ, which probably accounts for 
some of the small amount of shared variance 
between these two sets of measures. For ex- 
ample, Rubin (1978) obtained a significant 
negative correlation (r = -.25) between
the Chandler and Glucksberg-Krauss tasks 
when chronological age was partialed out. 
Chandler’s stories also do not consistently 
relate to Feffer’s RTT (Kurdek, 1977; Piché 
et al., 1975), although low but significant 
correlations are reported by Kurdek between 
Chandler’s stories and Flavell’s nickel-dime 
game and between Feffer's RTT and the 
nickel-dime game. An important and reveal- 
ing exception to this general pattern of low 
correlations among cognitive/communicative 
egocentrism measures is the Kohlberg et al. 
(1968) finding that private and social speech 
are highly correlated (r = .68). These
measures probably reflect a general level of
maturity in the use of language.


Interpretation and a Hypothesis 


From these data one can argue that if 
there is a construct of egocentrism under- 
lying performance on these tasks, it is not 
manifested on other than the visual/spatial, 
referential communication, and recursive 
thought measures. Even this conclusion is 
tentative, since it rests heavily on one study,
and much of the shared variance among
these tasks is attributable to general
intelligence. One might note that Rubin’s (1973)
factor analysis of these three measures,
private speech, conservation, popularity, and
chronological and mental age yielded two
factors that are easily identifiable as general
intelligence and popularity factors despite the
fact that Rubin labeled the first factor an
egocentrism factor. (This factor had high
positive loadings from all variables except
popularity.) Still, the modest but perhaps
unexpected relationship among visual/spatial,
referential communication, and recursive
thought measures, if replicable, must be
accounted for. Although egocentrism is one
possible explanation, it is unparsimonious,
since it fails to account for the
predominantly nonsignificant correlations between
these three tasks and other
cognitive/communicative (as well as affective)
egocentrism measures.

One can entertain the hypothesis that 
these measures are related beyond their sim- 
ple relationship with general intelligence 
primarily because they share a property 
usually identified with a "purely" cognitive 
factor. This factor, labeled differently by 
different theorists (e.g., Bernyer, 1958;
Horn, 1968; Vernon, 1965), may be in- 
terpreted at its most general level as involv- 
ing the ability for spatial thinking and reason- 
ing and general perceptual mastery. This 
hypothesis may seem more justifiable after 
examining the data on visual/spatial tasks 
and then analyzing the specific task
requirements of the referential communication and
recursive thought measures.


Further Consideration of Visual/Spatial 
Egocentrism Measures and Their Correlates 


Several researchers have carefully ana- 
lyzed the specific kinds of errors made by 
children on visual/spatial egocentrism
measures, and one may refer to these for a more
complete understanding of performance on
these tasks (e.g., Coie et al., 1973; Eliot &
Dayton, 1976; Fishbein et al., 1972;
Huttenlocher & Presson, 1973; Shantz & Watson,
1971). The important point for this analysis
is that in general, the proportion of egocentric
errors is small at all ages, and the major
developmental trend is not a tendency to make
proportionately fewer egocentric errors but 
rather simply to make fewer errors of all 
kinds. Eliot and Dayton (1976), using a sam- 
ple of 410 first graders, 421 fifth graders, and 
260 adults and an adaptation of the three- 
mountains problem, found that egocentric 
(frontal) errors were made for the 90°, 180°, 
and 270° positions on only 16% of the trials 
with first graders and 8% of the trials with 
fifth graders. These proportions are not
significantly different from each other. There was
a highly significant decrease in the actual 
number of total errors made, however, and as 
expected, adults were proficient on this task. 
The authors concluded that young children 
are less perceptually accurate but not more 
perceptually egocentric.

Coie et al. (1973) found a similar tendency 
for egocentric errors to be relatively infre- 
quent in a sample of 90 second, third, and 
fourth graders, although performance was far 
from perfect: Errors totaled 476, but only 80 
of these were egocentric errors. They ana- 
lyzed their data by ability levels (defined by 
performance on the egocentrism task) and 
did find a moderately significant trend towards 
making fewer egocentric errors at higher abil- 
ity levels. Although this is more compatible 
with the observations reported by Piaget and 
Inhelder (1956) and Shantz and Watson 
(1971), these age or ability trends are easily 
overestimated. Coie et al. (1973) found that 
20% of the errors in their lowest ability 
group were egocentric, which compares to a 
chance level of 8%. The authors concluded 
that development in this domain can best be 
characterized by small, undramatic transi- 
tions in the mastery of one's visual field. 
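
The error figures reported by Coie et al. (1973) can be restated as proportions. The sketch below only redoes the arithmetic on the numbers given in the text (476 total errors, 80 of them egocentric, 20% egocentric errors in the lowest ability group, and a chance level of 8%); the function name is illustrative.

```python
def share(part, whole):
    """Proportion of 'whole' accounted for by 'part'."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("need 0 <= part <= whole and whole > 0")
    return part / whole


# Overall share of errors that were egocentric: 80 of 476.
egocentric_share = share(80, 476)
print(round(egocentric_share, 3))      # about .168

# Share of errors that were not egocentric.
print(round(1 - egocentric_share, 3))  # about .832

# Lowest ability group relative to chance: 20% versus 8%.
print(round(0.20 / 0.08, 1))           # 2.5 times the chance level
```

Roughly five of every six errors were therefore nonegocentric, even though egocentric errors in the lowest ability group occurred at more than twice the chance rate.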

An even more striking demonstration of the
small role played by egocentrism in the
development of visual/spatial perspective-taking
ability is a study by Fishbein et al. (1972).
These researchers used a
set of displays of increasing complexity 
briefly described in the last section (one or 
three toys and turning the display vs. point- 
ing to one of either four or eight photo- 
graphs). The complexity of the display and 
the mode of responding had significant effects 
on task performance, but the proportion of
egocentric errors did not show any tendency
to decline with age. In fact, on the pointing 
task, preschoolers were less likely to make 
egocentric errors than were first graders across 
all variations of the display, even though the 
preschoolers made more total errors. Simi- 
larly, third graders were at least as likely as 
first graders to make egocentric errors. To- 
gether, these studies suggest that little of the 
variance in visual/spatial perspective-taking 
tasks can be attributed to egocentrism and 
that a straightforward cognitive reinterpreta- 
tion might be tenable. Unfortunately, no 
study has related performance on visual/spa- 
tial tasks with scores on psychometric tests 
of spatial or perceptual thinking or with 
measures of spatial or perceptual processes 
such as mental rotation; consequently, this
hypothesis requires further testing before
confident conclusions can be drawn.

How do referential communication and
recursive thought measures fit into this
picture? A speculative hypothesis based on a
simple analysis of the requirements of these 
tasks is that a major component of success- 
ful performance may be the ability to percep- 
tually encode and discriminate the task stim- 
uli. Referential communication measures such 
as that created by Glucksberg and Krauss 
(and used by Rubin in his 1973 study) use 
stimuli that are novel and difficult for chil- 
dren to perceptually process. The cognitive 
interpretation advanced here is supported by 
findings reported in Glucksberg et al. (1966). 
Up to age 4 (younger than Rubin's subjects) 
children were unable to successfully perform 
the task, even when pictures of familiar ani- 
mals were substituted for the novel forms. 
However, between 52 and 63 months of age, 
all subjects could do the task with animals 
used as stimuli, but none could do it with the 
novel forms. Older subjects were not tested in 
this study, but another study by Hoy (1975) 
extends these observations of gradual improve- 
ment with age depending on the perceptual 
complexity and familiarity of the stimuli used 
as referents and nonreferents. Using 36 chil- 
dren aged 5, 7, and 9, Hoy found that children 
of all ages were better able to describe how 
to build a toy horse than they were a random 
shape and that performance improved with 
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age for both stimuli. Similarly, Grushcow and 
Gauthier (1971) reported that 24 5-year-olds 
were more successful on a referential commu- 
nication task using familiar rather than un- 
familiar animals (67% vs. 46%) but were 
also more successful when unfamiliar animals 
were used than when familiar symbols were 
used (46% vs. 33%). When the task stimuli 
are chosen such that they are familiar and 
easily discriminable, children as young as 3 
years old demonstrate an appreciation for the 
perspective of another: Maratsos (1973) 
found that children at this age were far more 
explicit verbally when communicating about 
the characteristics of simple referents (fa- 
miliar toys) to an experimenter who appar- 
ently could not see than to an experimenter 
who could see. 

In the recursive thought task, although sub- 
jects are taught to discriminate thinking car- 
toon clouds from talking clouds, most items 
are unusual and fairly complex perceptually. 
For example, if a boy is thinking of a girl who 
is thinking of the mother, three progressively 
smaller drawings of three different people 
appear in clouds of various shapes and sizes 
embedded in each other. Items of greater 
complexity may be even more difficult to de- 
cipher perceptually. 

One important implication of this discus- 
sion is that although an awareness that others 
can have a different perspective and that one 
must infer that this perspective does seem to 
be required in most egocentrism measures, 
these abilities may be present in their simplest 
forms early developmentally (Borke, 1972). 
Although inferring perspective-taking ability 
is difficult in some egocentrism measures, 
presumably nonegocentric performance has 
been observed for children as young as 2 
years old in the visual/spatial domain (Ver- 
kozen, 1975) and 3 years old in the affective 
(Borke, 1971) and cognitive /communicative 
domains (Maratsos, 1973; Menig-Peterson, 
1975). 

The implication of these findings for de- 
termining the sources of variance in perform- 
ance on egocentrism tasks is that perspective- 
taking ability may account for little of the 
variance after age 4 or 5 (Mossler, Marvin, 
& Greenberg, 1976). More plausible sources 
of variance include (a) general intelligence 
(Rubin, 1973); (b) verbal comprehension 
(Shantz & Watson, 1971); (c) specific cog- 
nitive factors such as spatial or perceptual 
abilities, depending on the complexity and/or 
familiarity of the task stimuli (Coie et al., 
1973; Eliot & Dayton, 1976; Pufall, 1975); 
(d) characteristics specific to the type of 
response required (e.g., verbal vs. nonverbal; 
symbolic vs. concrete), since it may be the 
case that young children are unable to express 
their knowledge through the required response 
mode (Fishbein et al., 1972; Garber, 1975; 
Shantz, 1975); and (e) variables highly spe- 
cific to the task, such as whether a real per- 
son or a doll is sitting in the position in which 
a visual/spatial perspective must be inferred, 
which Cox (1975) found made a significant 
difference (easier if a real person is used). 
Summary. To summarize the data in 
this section and the implications of these 
data, visual/spatial, affective, and cognitive/ 
communicative perspective-taking measures do 
not appear to tap a single unitary dimension 
of egocentrism. Measures of egocentrism are 
typically as highly correlated with other con- 
structs (i.e., IQ, conservation, and popular- 
ity) as they are with measures of the same 
construct. Social speech and private speech 
measures may be tapping some common di- 
mension that represents the developing child's 
mastery of language skills. Feffer’s RTT and 
the measures of game strategy inference and 
awareness of privileged information seem to 
be independent of other measures in these 
domains, at least beyond any commonality 
due to general intelligence. For the most part, 
the same can be said for measures of affective 
egocentrism. This does not necessarily mean 
that these tests aren't measuring something 
that is consistent and important, since mea- 
surement reliabilities are generally adequate. 
Visual/spatial, referential communication, 
and recursive thought measures seem to 
share some modest but significant amount of 
variance in addition to that attributable to 
general intelligence. Given that most errors 
made on visual/spatial perspective-taking 
tasks are not egocentric errors and that de- 
velopmental improvements on these tasks are 
not characterized by dramatically smaller 
proportions of egocentric errors, the relation- 
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ships among these three measures are more 
likely due to some common factor other than 
egocentrism, such as perceptual and/or spatial 
facility. For all of the measures that purport 
to measure egocentrism, task-specific and re- 
sponse-specific characteristics may account 
for a large proportion of the variance in per- 
formance, although this variance should not 
necessarily be construed as error, particularly 
if this variance can contribute to our knowl- 
edge about social cognitive development. 


Conclusion 


As a whole, the evidence reviewed in the 
last two sections and summarized in Tables 
1-3 fails to support the construct validity of 
egocentrism. The most common finding is a 
lack of relationship among egocentrism mea- 
sures, even for those whose reliability indi- 
cates that something consistent is being mea- 
sured. The few commonalities found among 
specific tasks are more parsimoniously in- 
terpreted as the result of other explanatory 
constructs, such as those referring to the 
general level of cognitive, perceptual, or 
linguistic development of the child. Cron- 
bach and Meehl (1955) comment that 


one who claims that his test reflects a construct 
cannot maintain his claim in the face of recurrent 
negative results because these results show that his 
construct is too loosely defined to yield verifiable 
inferences. (p. 291) 


This statement is applicable to the research 
on egocentrism. 

There are two main implications of nega- 
tive evidence for construct validity. Either 
some or all of the tests are not good measures 
of the construct, or the theory that specifies 
the meaning of the construct is incorrect 
(Cronbach & Meehl, 1955). Since many of 
the egocentrism measures are reliable and 
possess a good deal of face validity, some 
doubt is cast on Piaget's theory, or at least 
that part of it that sets forth the meaning of 
egocentrism. Developing and testing alterna- 
tive theoretical formulations that could better 
account for egocentric behavior is an invest- 
ment of research effort that would most 
likely provide greater payoff in explanatory 
power. 
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Large Contingency Tables With Large Cell Frequencies: 
A Model Search Algorithm and Alternative Measures of Fit 
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A new search algorithm, the Generalized Guided Method, for locating a model 
for contingency tables and a measure of fit related to the $R^2$ of multiple re- 
gression are proposed for the analysis of contingency tables with many cells 
and large cell frequencies. This algorithm is designed to analyze contingency 
tables that contain dichotomous and/or nondichotomous (polytomous) varia- 
bles. A 4 × 2 × 15 × 7 × 2 contingency table with 559,158 observations is used 
for illustration. 


The user of conventional multidimensional 
contingency table techniques, such as those 
described by Goodman (1972a, 1972b), en- 
counters two problems in the analysis of con- 
tingency tables with many cells and with large 
cell frequencies: (a) which measures of fit to 
use in assessing and comparing models and 
(b) how to search for appropriate models 
among the multitude of possible hierarchical 
models. This article, building primarily on 
the work of Goodman, addresses these prob- 
lems. A model selection criterion and a measure 
of fit that is less dependent on sample size than 
are the traditional chi-square measures are pro- 
posed, and a new search algorithm for locating 
models that satisfactorily fit large contingency 
tables is described. 

The data used for illustration are from 
Miller, Simons, and Fein (1974) and involve 
characteristics of 559,158 persons admitted 
to British mental hospitals. Miller et al. were 
interested in whether the legal code under 
which a person is admitted to the hospital 
varies by age (four age categories), sex (two 
sexes), region (15 regions), and year of admis- 
sion (7 years). The two categories of the 
variable Legal Code are formal (similar to 
involuntary commitment) and informal 
(similar to voluntary commitment). 


Requests for reprints should be sent to Douglas A. 
Zahn, Department of Statistics, Florida State Uni- 
versity, Tallahassee, Florida 32306. 


Notation and Terminology 


The symbols A, S, R, Y, and L denote the 
variables in the five-way, 4 × 2 × 15 × 7 × 2 
contingency table under consideration: age, 
sex, region, year, and legal code, respectively. 
Let $f_{ijk\ell m}$ denote the observed frequency in 
cell $(i, j, k, \ell, m)$. Let $F_{ijk\ell m}$ denote the ex- 
pected value of this cell under the model 
being fitted, and let $\hat{F}_{ijk\ell m}$ denote the estimate 
of $F_{ijk\ell m}$. $F$ and $\hat{F}$ may be superscripted to 
indicate a specific model. 

Attention is restricted here to the situation 
in which one of the dichotomous variables in 
the contingency table is viewed as a dependent 
variable. These techniques may also be used in 
situations with polytomous dependent vari- 
ables, although the results are more difficult 
to interpret, as explained in Goodman (1971). 
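Considering L as the dependent variable, let 
$\omega_{ijk\ell} = f_{ijk\ell 1}/f_{ijk\ell 2}$ denote the observed odds 
in favor of formal admission for patients in 
cell $(i, j, k, \ell)$. Let $\Omega_{ijk\ell}$ denote the expected 
odds that the admission is formal under 
the model being considered, where $\Omega_{ijk\ell} = 
F_{ijk\ell 1}/F_{ijk\ell 2}$. Let $\Phi_{ijk\ell} = \ln \Omega_{ijk\ell}$, where ln de- 
notes the natural logarithm. Thus $\Phi$ denotes 
the logarithm of the expected odds that an 
admission is formal and has been termed by 
Bishop, Fienberg, and Holland (1975) and 
others the logit pertaining to variable L. The 
saturated model for $\Phi_{ijk\ell}$ can be written 


$$
\begin{aligned}
\Phi_{ijk\ell} = {} & \beta + \beta^{A}_{i} + \beta^{S}_{j} + \beta^{R}_{k} + \beta^{Y}_{\ell} + \beta^{AS}_{ij} \\
& + \beta^{AR}_{ik} + \beta^{AY}_{i\ell} + \beta^{SR}_{jk} + \beta^{SY}_{j\ell} + \beta^{RY}_{k\ell} + \beta^{ASR}_{ijk} \\
& + \beta^{ASY}_{ij\ell} + \beta^{ARY}_{ik\ell} + \beta^{SRY}_{jk\ell} + \beta^{ASRY}_{ijk\ell},
\end{aligned} \tag{1}
$$

where the $\beta$s satisfy the constraints 

$$
\sum_{i=1}^{4} \beta^{A}_{i} = 0, \;\ldots,\; \sum_{i=1}^{4} \beta^{ASRY}_{ijk\ell} = \sum_{j=1}^{2} \beta^{ASRY}_{ijk\ell} = \sum_{k=1}^{15} \beta^{ASRY}_{ijk\ell} = \sum_{\ell=1}^{7} \beta^{ASRY}_{ijk\ell} = 0. \tag{2}
$$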

The order of an effect in the logit model is 
the number of letters in its superscript. An 
effect may be denoted by the letters in its 
superscript; for example, AS may be used to 
denote $\beta^{AS}$. One effect is said to be a lower 
order relative of a second effect if the letters in 
the superscript of the first are a subset of those 
in the superscript of the second. Thus the third- 
order effects ASR and ASY are lower order 
relatives of the fourth-order effect ASRY. A 
hierarchical model is one in which, for each 
interaction appearing in the model, all lower 
order relative effects are also in the model. 

Attention is restricted here to hierarchical 
models. Goodman’s (1972b) “minimal set of 
marginal tables fitted under the model” 
(p. 39) notation will be used to denote the 
models that are considered. However, this 
notation presents a problem, since it refers 
to cell frequency models, that is, to models for 
$\ln F_{ijk\ell m}$ rather than to logit models, such as 
the one given in Equation 1. The difficulty can 
be circumvented by determining for each logit 
model its equivalent cell frequency model. A 
cell frequency model and a logit model are 
said to be equivalent if estimates of the param- 
eters of the logit model that use the estimated 
cell frequency model parameters are equal to 
those produced by direct estimation of the 
logit model parameters. Since in logit analyses 
the contingency table formed by the inde- 
pendent variables is assumed to be fixed, the 

equivalent cell frequency models must preserve 
this table (see Goodman, 1971). Where L is 
the dependent variable, the cell frequency 
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models that are equivalent to the logit models 


notation, it can be shown that the cell fre- 
quency model that is equivalent to the logit 
model 


$$
\Phi_{ijk\ell} = \beta + \beta^{A}_{i} + \beta^{S}_{j} + \beta^{R}_{k} + \beta^{AR}_{ik} \tag{3}
$$


preserves the (ASRY), (SL), and (ARL) 
margins. This model can be identified by the 
abbreviated list of $\beta$ parameters included in 
it, that is, the parameters that have no higher 
order relatives in this logit model, namely S 
and AR. 

Another aspect of the equivalence of cell 
frequency and logit models is that they yield 
the same estimated cell frequencies, $\hat{F}$. Using 
the estimated logits, $\hat{\Phi}_{ijk\ell}$, and the entries in 
the (ASRY) margin, $f_{ijk\ell\cdot} = f_{ijk\ell 1} + f_{ijk\ell 2}$, 
one can compute estimated cell frequencies 
using the relation 


$$
\hat{F}_{ijk\ell 1} = f_{ijk\ell\cdot}\, \exp(\hat{\Phi}_{ijk\ell}) / [1 + \exp(\hat{\Phi}_{ijk\ell})], \tag{4}
$$


where $\exp(x) = e^{x}$. Also, the $\hat{F}$s from the cell 
frequency model can be used to compute esti- 
mated logits that are equivalent to those 
obtained using the logit models, by the formula 
$\hat{\Phi}_{ijk\ell} = \ln(\hat{F}_{ijk\ell 1}/\hat{F}_{ijk\ell 2})$. Since the $\hat{F}$s pro- 
duced by logit models are equivalent to those 
produced by cell frequency models, the fit of a 
logit model may be assessed by the likelihood- 
ratio chi-square goodness-of-fit statistic for 
the equivalent cell frequency model. 
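
The two conversions just described (Equation 4 and the inverse formula for the logit) can be sketched in a few lines of Python. This is an illustration only: the function names and the example cell (200 admissions with an estimated logit of -1.5) are assumptions introduced here and are not taken from the data.

```python
import math

def fitted_from_logit(margin_total, logit):
    """Split a fixed (ASRY) margin total into formal and informal counts.

    Implements Equation 4: F1 = total * exp(logit) / (1 + exp(logit)).
    """
    p_formal = math.exp(logit) / (1.0 + math.exp(logit))
    formal = margin_total * p_formal
    return formal, margin_total - formal

def logit_from_fitted(formal, informal):
    """Recover the estimated logit as ln(F1 / F2)."""
    return math.log(formal / informal)

# Hypothetical cell: 200 admissions with an estimated logit of -1.5.
formal, informal = fitted_from_logit(200.0, -1.5)
recovered = logit_from_fitted(formal, informal)
print(round(formal, 2), round(informal, 2), round(recovered, 6))
```

The round trip returns the original logit, which is the sense in which the two kinds of models yield the same estimated cell frequencies.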

Model A is said to be nested in Model B if 
Model A is a special case of Model B that can 
be obtained by setting some parameter(s) in 
Model B equal to zero. For example, the model 
(ARL), (ASRY) is nested in the model 
(ASL), (ARL), (ASRY). 


Measures of Fit for Contingency Tables With 
Large Cell Frequencies 


The hypothesis H that a specific model fits 
the data in a contingency table is examined by 
first determining the maximum likelihood 
estimates, assuming H is true, of the expected 
cell frequencies in the table, by using the 
iterative proportional fitting procedure, which 
is also called the Deming-Stephan (1940) 
algorithm. This can be done with any of several 
contingency table computer programs that 
are available (e.g., Dixon, 1975; Goodman, 
1973; Haberman, 1972; Zahn, Note 1). The 
use of this algorithm is illustrated in several 
articles, including those by Davis (1974), 
Fienberg (1970), and Goodman (1972a). 
Conventionally, the differences between the 
observed cell frequencies $f_{ijk\ell m}$ and the esti- 
mated expected cell frequencies $\hat{F}_{ijk\ell m}$ under 
hypothesis H are examined to determine if 
the discrepancies are large enough to cast 
doubt on the hypothesis that H fits the data, 
by using the likelihood ratio chi-square 
statistic 


$$
X^2(H) = 2 \sum_{i} \sum_{j} \sum_{k} \sum_{\ell} \sum_{m} f_{ijk\ell m} \ln\left(f_{ijk\ell m} / \hat{F}_{ijk\ell m}\right). \tag{5}
$$


This statistic has an asymptotic chi-square 
distribution with degrees of freedom denoted 
df(H). (For an extended discussion of the cal- 
culation of df(H), see Bishop et al., 1975, 
section 3.8; and Davis, 1974, pp. 205-208, 
213.) The computer programs listed above 
provide df(H). 
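
As a small-scale illustration of the procedure just described, the Python sketch below fits a model by iterative proportional fitting and then evaluates Equation 5. It is a minimal sketch under stated assumptions: it handles only a 2 × 2 × 2 table, it fits the model with all two-way margins (no three-factor interaction), and the counts, function names, and tolerance are hypothetical choices made here rather than anything taken from the programs cited above.

```python
import math
from itertools import product

def ipf_no_three_way(table, max_cycles=500, tol=1e-10):
    """Fit the no-three-factor-interaction model to a 2x2x2 table by
    iterative proportional fitting of the three two-way margins."""
    cells = list(product(range(2), repeat=3))
    obs = {c: float(table[c[0]][c[1]][c[2]]) for c in cells}
    fit = {c: 1.0 for c in cells}              # start from a table of ones
    margins = [(0, 1), (0, 2), (1, 2)]         # index positions kept by each margin
    for _ in range(max_cycles):
        largest_change = 0.0
        for keep in margins:
            obs_sum, fit_sum = {}, {}
            for c in cells:                    # collapse both tables onto this margin
                key = tuple(c[p] for p in keep)
                obs_sum[key] = obs_sum.get(key, 0.0) + obs[c]
                fit_sum[key] = fit_sum.get(key, 0.0) + fit[c]
            for c in cells:                    # rescale so the fitted margin matches
                key = tuple(c[p] for p in keep)
                updated = fit[c] * obs_sum[key] / fit_sum[key]
                largest_change = max(largest_change, abs(updated - fit[c]))
                fit[c] = updated
        if largest_change < tol:
            break
    return obs, fit

def lr_chi_square(obs, fit):
    """Likelihood ratio chi-square of Equation 5 (cells with zero count add 0)."""
    return 2.0 * sum(o * math.log(o / fit[c]) for c, o in obs.items() if o > 0)

# Hypothetical 2x2x2 counts; this model leaves 1 degree of freedom.
table = [[[120, 80], [60, 140]], [[90, 110], [30, 170]]]
obs, fit = ipf_no_three_way(table)
x2 = lr_chi_square(obs, fit)
print(round(x2, 3))
```

For the hierarchical models considered in this article the same idea applies, with the fitted marginals replaced by the minimal set of margins that defines the model.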
Problems arise if $X^2(H)$ is used as a measure 
of fit in a contingency table with large cell 
frequencies. Bishop et al. (1975, section 9.6) 
show that the magnitude of this statistic is 
proportional to the sample size if the hypothesis 
H is not exactly true. With large cell fre- 
quencies, only the saturated model may yield 
an insignificant chi-square statistic. Terms 
of marginal utility may be incorporated into 
the final model simply because their chi-square 
statistics are inflated to significance by the 
large sample size. This sacrifices the parsimony 
often desired in describing a data set. In 
addition, analysis of models that contain 
terms that have small effects may yield less 
accurate cell frequency estimates than does 
analysis of simpler models (Bishop et al., 
1975, section 9.2; see Hocking, 1976, for 
analogous multiple regression results). These 
arguments also speak against the suggestion 
that with a very large sample size, the only 
appropriate analysis, rather than hypothesis 
testing, is the estimation of parameters in the 
saturated model. 
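
The dependence of the statistic on sample size can be seen in a toy calculation. The Python sketch below is illustrative only: it uses a hypothetical 2 × 2 table and the independence model (both assumptions made here), multiplies every count by 100, and compares the two likelihood ratio chi-square values.

```python
import math

def independence_fit(table):
    """Expected counts under independence for a two-way table, row by row."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [r * c / n for r in row_totals for c in col_totals]

def lr_chi_square(observed, expected):
    """Likelihood ratio chi-square for flattened observed and expected counts."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def flatten(table):
    return [v for row in table for v in row]

small = [[52, 48], [47, 53]]                        # hypothetical counts, N = 200
large = [[v * 100 for v in row] for row in small]   # same proportions, N = 20,000

x2_small = lr_chi_square(flatten(small), independence_fit(small))
x2_large = lr_chi_square(flatten(large), independence_fit(large))
print(round(x2_small, 3), round(x2_large, 3))
```

The cell proportions are identical in the two tables, so the lack of fit is the same in relative terms, yet the statistic is 100 times larger in the second table and crosses the .05 critical value of 3.84 for 1 degree of freedom.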

In developing an alternative to chi-square 
for assessing the goodness of fit of a model, 
the assessment of fit of multiple regression 
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models, which has been more extensively 
studied, should be considered. Namboodiri, 
Carter, and Blalock (1975, p. 458), in a dis- 
cussion of models based on regression methods, 
suggest that the researcher not rely solely on 
tests of significance in assessing fit (a) be- 
cause large sample sizes would lead to reject- 
ing models with adequate fit and (b) because 
in the social sciences, even with a perfect 
model, measurement errors in the variables 
may produce distortions great enough to 
cause the model to be rejected. Among the 
suggestions by Draper and Smith (1966, pp. 
165, 238) for assessing fit are (a) the subjective 
examination of the increase in $R^2$ as additional 
variables are added, looking for "breaks" in 
the rate of increase, after which additional 
variables add little explanatory power; (b) 
the setting of an arbitrary proportion of the 
total variance that must be explained for the 
model to be considered adequate. In their 
example, an $R^2$ of .80 is selected. There is, 
then, precedent for moving away from tests 
of significance as criteria of fit in the develop- 
ment of models. 

Both of Draper and Smith's (1966) sug- 
gestions for alternative measures of fit are 
based on the assessment of the proportion of 
variation explained by the model. Goodman 
(1971, p. 54; 1972a, p. 1057) proposed a 
statistic analogous to the $R^2$ of multiple 
regression for contingency tables. He defined 
the statistic 


$$
R^2 = [X^2(H_0) - X^2(H)] / X^2(H_0), \tag{6}
$$


where $H_0$ denotes the hypothesis $\Phi_{ijk\ell} = \beta$, 
that is, that the logits are constant, and there- 
fore the effects of all variables in the logit 
model are zero. 

The just defined $R^2$ statistic simultaneously 
reflects two pieces of information about Model 
H: (a) $R^2$ itself measures the proportion of the 
total variation in the table, as indicated by the 
lack of fit for Model $H_0$, $X^2(H_0)$, which is 
explained by Model H; and (b) $1 - R^2$ 
measures the proportion of variation still un- 
explained in the table. This partition relates 
to the two chi-square statistics that are of inter- 
est in assessing any contingency table model. 
The first is the difference $X^2(H_0) - X^2(H)$, 
which tests the significance of the parameters in 
Model H, whereas the second, $X^2(H)$, tests the 
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lack of fit of Model H or the significance of 
the parameters in the saturated model that 
are not in Model H. The first of these is 
analogous to the multiple regression F test of 
the hypothesis that all the parameters in the 
regression model are equal to zero, whereas the 
second statistic does not have an analogue 
in multiple regression, since in the latter 
context we do not have a test statistic available 
for the determination of whether the variance 
unexplained is statistically significant. 

Our use of $R^2$ goes beyond Goodman's 
(1972a) recommendation that this statistic be 
used as a measure of how well Model H fits 
the data. We propose that $R^2$ may be a better 
criterion for choosing among models for 
contingency tables with large cell counts than 
is chi-square. 
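
Equation 6 is a one-line computation once the two chi-square statistics are available. The following Python sketch is illustrative; the function name and the chi-square values are hypothetical and are not results from the admissions table.

```python
def r_squared(x2_null, x2_model):
    """R^2 of Equation 6: share of the lack of fit of H0 removed by Model H."""
    return (x2_null - x2_model) / x2_null

# Hypothetical values: H0 fits very poorly; Model H still has a large chi-square.
x2_null, x2_model = 5000.0, 900.0
print(r_squared(x2_null, x2_model))
```

With these made-up values the model removes 82% of the lack of fit of $H_0$ even though its own chi-square of 900 would be judged highly significant on any plausible number of degrees of freedom, which is the situation that motivates using $R^2$ with very large samples.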


Comments on $R^2$ 


Perhaps the most difficult question relating 
to the use of $R^2$ as an indicator of the fit of 
a given model is when a model can be said to 
fit adequately. If the researcher were depend- 
ing on statistical significance as a criterion, 
a cutoff point could be chosen, though the 
choice of a specific significance level is difficult. 
Even in multiple regression models, which have 
a longer history of usage, traditional cutoff 
points for the adequacy of $R^2$ have not been 
developed. Also, recent work in multiple re- 
gression indicates that judging when a multiple 
regression equation fits adequately may be a 
more difficult question than previously thought 
(Mosteller & Tukey, 1977, chap. 12-16). An 
additional problem in contingency tables is 
that researchers have considerably less ex- 
perience in determining $R^2$ values of the kind 
proposed here than they do for multiple 
regression studies. 

In general, the choice encountered in adding 
more parameters is between increased ex- 
planation of variation, that is, increased $R^2$, 
and increased complexity of the model. Results 
in the regression literature, as summarized by 
Hocking (1976), imply that the more inter- 
actions with small effects there are in the 
model, the less precisely the logits are esti- 
mated. As parameters are added to the model, 
generally the rate of increase in $R^2$ per param- 
eter added decreases rapidly. Thus there comes 
a point in most contingency table analyses 
at which additional increments in the $R^2$ 
statistic will cost dearly. 

Table 1 

Sample Size, Degrees of Freedom, $\chi^2$, and $R^2$ 
Values for Models That Were Assessed in 
Recent Literature as Fitting Contingency 
Tables 

Source              n       df    $\chi^2$ 
Goodman (1971)      1,008    8    5. 
Goodman (1972b)     8,036    2    1.32 
Goodman (1973) 
  Model 8           2,982    2    31 
  Model 16          2,982   10    6.11 
  Model 20          2,982   14    13.95 

Perhaps initial insight into the question of 
adequate magnitudes of $R^2$ can be developed 
by an examination of Table 1, which reflects 
the magnitudes of $R^2$ statistics evident in 
contingency tables fitted in the literature using 
conventional criteria. Note that the $R^2$ values 
for models with statistically insignificant chi- 
square statistics range from .796 to .9996. 


Additional Alternatives to Chi-Square 


A number of alternatives to chi-square, other 
than $R^2$, have been proposed for assessing fit of 
contingency table models. One of these is 
$X^2/N$, where N denotes the total number of 
counts in the contingency table. This statistic 
enables the researcher to compare the lack of 
fit of various models, either from the same or 
different tables. The statistic $R^2$ can also be 
used to do this, and in addition, it measures 
how much improvement the current model 
offers over Ho. 

A second alternative measure of fit is the 
correlation between the actual type of admis- 
sion for a subject and the predicted prob- 
ability of formal admission. The models for 
$\Phi_{ijk\ell}$ can be used to predict probabilities of 
formal admission. The predicted probability 
of formal admission for subjects in cell 
$(i, j, k, \ell)$ under the model being fitted is 


$$
\exp(\hat{\Phi}_{ijk\ell}) / [1 + \exp(\hat{\Phi}_{ijk\ell})] = \hat{F}_{ijk\ell 1} / (\hat{F}_{ijk\ell 1} + \hat{F}_{ijk\ell 2}). \tag{7}
$$

However, the correlation between a dichoto- 
mous variable and a predicted probability may 
well be small under reasonable models, as 
demonstrated by Morrison (1972), although 
Goldberger (1973) has indicated that such a 
correlation can reach the bound of +1.0. 

Another potential measure of fit reflects the 
fact that $R^2 = 1$ does not indicate that the 
model predicts perfectly whether an individual 
has a formal or informal admission; what is 
perfectly predicted is the proportion of formal 
admissions for individuals in cell $(i, j, k, \ell)$ for 
all such cells. Hence, another possible measure 
of fit is the proportion of subjects correctly 
classified as formal or informal admissions. 
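
The distinction between predicting cell proportions and predicting individual admissions can be made concrete. In the Python sketch below, the three cells, their counts, and the classification rule (call every admission in a cell formal when the predicted probability is at least .5) are assumptions introduced for illustration; the predicted probabilities are set equal to the observed proportions, which is the situation in which $R^2 = 1$.

```python
def proportion_correct(cells):
    """Share of subjects classified correctly.

    `cells` holds (formal_count, informal_count, predicted_prob_formal) triples.
    Everyone in a cell is classified as formal when the predicted probability
    is at least .5 and as informal otherwise.
    """
    correct = 0.0
    total = 0.0
    for formal, informal, p_formal in cells:
        correct += formal if p_formal >= 0.5 else informal
        total += formal + informal
    return correct / total

# Hypothetical cells in which the model reproduces each proportion exactly.
cells = [(30, 70, 0.30), (60, 40, 0.60), (10, 90, 0.10)]
accuracy = proportion_correct(cells)
print(round(accuracy, 4))
```

Even with every cell proportion reproduced exactly, only 220 of the 300 hypothetical subjects are classified correctly, so the two measures of fit answer different questions.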

The measure of fit that is appropriate de- 
pends on the objectives of the research study. 
However, the consequences of using the differ- 
ent measures of fit and which measure is best 
in what circumstances appear to be open 
questions. 


Generalized Guided Method for Locating 
Models for Large Contingency Tables 


The saturated logit model for the 4 × 2 × 
15 × 7 × 2 contingency table that is used as 
an example in this article illustrates one of the 
problems encountered with large contingency 
tables: the number of linearly independent 
parameters in the model is 840, far too many 
to interpret easily. Since the saturated logit 
model will always fit the data perfectly, 
whether a satisfactory fit can be obtained using 
a simpler model is a matter of importance. 
With a large contingency table, it is important 
to have a computationally feasible plan for 
searching among the multitude of possible 
models. With only four independent variables, 
there are 166 possible models. Several al- 
gorithms have been proposed for finding an 
adequate model (Bishop et al., 1975; Fienberg, 
1970; Goodman, 1973; Shaffer, 1973). 
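
The count of possible models can be checked by brute force. The Python sketch below enumerates every set of effects for four independent variables and keeps those that are hierarchical in the sense defined earlier. It rests on two assumptions made here rather than stated in the text: every model contains the constant term, and the figure of 166 excludes the saturated model.

```python
from itertools import combinations

variables = "ASRY"
# All 16 effects, from the constant (empty set) up to the four-variable effect.
effects = [frozenset(c) for r in range(len(variables) + 1)
           for c in combinations(variables, r)]

def is_hierarchical(model):
    """True if every lower order relative of each effect is also in the model."""
    return all(frozenset(sub) in model
               for effect in model
               for r in range(len(effect))
               for sub in combinations(sorted(effect), r))

count = 0
for mask in range(1, 2 ** len(effects)):
    if not mask & 1:          # effects[0] is the constant; require it
        continue
    model = {effects[i] for i in range(len(effects)) if mask >> i & 1}
    if is_hierarchical(model):
        count += 1

unsaturated = count - 1       # drop the saturated model
print(count, unsaturated)
```

Under these assumptions the enumeration gives 167 hierarchical models, 166 of them unsaturated, which agrees with the number quoted above.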

The first step of Goodman’s guided method 
is to compute the standardized effect esti- 
mates, which are used to construct a series of 
nested models. A major problem with this 
method is that the magnitude of an effect can- 
not be measured by a single effect estimate 
unless all variables in the effect are dichoto- 
mous. For example, there are 420 parame- 
ters in the Legal Code × Age × Region 
× Year effect. This makes it virtually im- 
possible to use Goodman’s procedure to de- 
termine the order of entry of the effects. One 
reason for developing the method described 
below is to provide a procedure for summariz- 
ing the relative importance of the various 
effects in an interpretable way. 

Shaffer (1973) suggests an approach for the 
determination of which parameters are im- 
portant, but her approach is not feasible when 
large tables are analyzed. Fienberg's (1970) 
procedure is based on a posited series of nested 
hierarchical models but requires that the 
contingency table be well enough understood 
to construct the series of models before ex- 
amining the data. 

Bishop et al. (1975, section 4.5) discuss 
several search algorithms and emphasize that 
the strategy used should be sensitive to the 
specific research problem. The algorithm pro- 
posed here uses some of their recommendations, 
deletes others that in our experience have not 
been practical in consulting problems that in- 
volve contingency tables, and includes addi- 
tional modifications we have found useful. 

Higgins and Koch (1977) describe another 
approach to the analysis of large contingency 
tables that uses Pearson chi-square statistics, 
Mantel-Haenszel statistics, and the model- 
fitting procedure developed by  Grizzle, 
Starmer, and Koch (1969). Benedetti and 
Brown (1978) also discuss several search 
algorithms. They stress the importance of 
examining each effect in the model, as is done 
in the algorithm described below, which is 
similar to the algorithm they propose. 

Our search for a solution to the problems 
noted in previous methods has led us to de- 
velop the Generalized Guided Method (GGM), 
patterned after a method recommended for 
locating regression models when the number of 
independent variables is large. In these situa- 
tions, some authors (e.g., Daniel & Wood, 
1971) have recommended fitting a regression 
model by using all available variables and 
computing the partial F statistics for all 
variables in the model. For a given variable, 
the numerator of this statistic is the sum of 
squares explained by that variable after all 
other variables have been entered into the 
model. This is almost always a conservative 
measure of the explanatory power of this 
variable, since generally, if this variable were 
not the last to enter, it would explain more 
variation. In searching for the final model in 
these regression situations, attention is focused 
on those variables with the largest partial F 
statistics. 

In the first step of the GGM, the researcher 
computes a conservative measure of the ex- 
planatory power of each effect being considered 
for inclusion in the model. This measure is 
based on the reduction in the chi-square statistic 
due to adding that effect to the model after as 
many effects as possible (preserving the hierar- 
chical nature of the model) have been included. 
Computation of this measure requires fitting 
a pair of models, denoted H* and H**, for each 
effect. The first model, H*, excludes only the 
effect of interest, denoted $\beta^{*}$, and its higher 
order relatives; it includes all other effects. 
Model H** includes the effects in H* and the 
effect of interest. Thus $X^2(H^{*}) - X^2(H^{**})$ can 
be used to test the hypothesis that the effect 
of interest is equal to zero, when as many 
effects as possible are entered into the model 
before it (cf. Goodman, 1972a, pp. 1051-1052). 

For example, for the effect $\beta^{AS}$, Models H* 
and H** are, respectively, 

$$
\begin{aligned}
\Phi_{ijk\ell} = {} & \beta + \beta^{A}_{i} + \beta^{S}_{j} + \beta^{R}_{k} + \beta^{Y}_{\ell} + \beta^{AR}_{ik} + \beta^{AY}_{i\ell} \\
& + \beta^{SR}_{jk} + \beta^{SY}_{j\ell} + \beta^{RY}_{k\ell} + \beta^{ARY}_{ik\ell} + \beta^{SRY}_{jk\ell}
\end{aligned}
$$

and 

$$
\begin{aligned}
\Phi_{ijk\ell} = {} & \beta + \beta^{A}_{i} + \beta^{S}_{j} + \beta^{R}_{k} + \beta^{Y}_{\ell} + \beta^{AS}_{ij} + \beta^{AR}_{ik} \\
& + \beta^{AY}_{i\ell} + \beta^{SR}_{jk} + \beta^{SY}_{j\ell} + \beta^{RY}_{k\ell} + \beta^{ARY}_{ik\ell} + \beta^{SRY}_{jk\ell}.
\end{aligned} \tag{8}
$$

Thus this measure of the explanatory power 
of $\beta^{AS}$ is $X^2(H^{*}) - X^2(H^{**})$ = 1,014.37 − 420.88 
= 583.49. 

The difference $X^2(H^{*}) - X^2(H^{**})$ gives a 
misleading ranking of effects, since the 
numbers of linearly independent parameters 
associated with the effects are unequal. To 
adjust for this, the statistic $C(\beta^{*})$ is used in 
the first step of the GGM to order the effects 
according to their explanatory power per 
linearly independent parameter, where 


$$
C(\beta^{*}) = [X^2(H^{*}) - X^2(H^{**})] / [df(H^{*}) - df(H^{**})]. \tag{9}
$$


For the effect $\beta^{AS}$, we obtain $C(\beta^{AS})$ = 
(1,014.37 − 420.88)/(315 − 312) = 194.50. 
Thus if effect $\beta^{AS}$ is added to the Model H*, 
the lack of fit is reduced by 194.50 per 
linearly independent parameter added. The 
addition of the effect $\beta^{AS}$ to the model actually 
involves including eight more parameters, 
three of which are linearly independent. 

Table 2 (Continued) 

Effect  Model  β parameters in logit model  Fitted marginals                          df    X²      C(β*)  Order 
ASR     H*     ASY, ARY, SRY                (ASRY), (ASYL), (ARYL), (SRYL)            294   402.91  2.54   10 
        H**    ASY, ARY, SRY, ASR           (ASRY), (ASYL), (ARYL), (SRYL), (ASRL)    252   296.22 
ASY     H*     ASR, ARY, SRY                (ASRY), (ASRL), (ARYL), (SRYL)            270   314.45  1.01   12 
        H**    ASR, ARY, SRY, ASY           (ASRY), (ASRL), (ARYL), (SRYL), (ASYL)    252   296.22 
ARY     H*     ASR, ASY, SRY                (ASRY), (ASRL), (ASYL), (SRYL)            —     698.25  1.60   — 
        H**    ASR, ASY, SRY, ARY           (ASRY), (ASRL), (ASYL), (SRYL), (ARYL)    —     296.22 
SRY     H*     ASR, ASY, ARY                (ASRY), (ASRL), (ASYL), (ARYL)            —     368.38  —      — 
        H**    ASR, ASY, ARY, SRY           (ASRY), (ASRL), (ASYL), (ARYL), (SRYL)    —     296.22 

The results obtained in the first step of the 
GGM for our example are presented in Table 
2. This step involves fitting 25 separate 
models. 

In the second step of the GGM, a series of hierarchical models is constructed, starting with H0 and adding effects one at a time in the order of the C(β) statistics. The order of entry of the effects into the series of models is indicated in the last column of Table 2. For our example, this step produced the series of 12 unsaturated models listed in Table 3.
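The second step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the entry order used (O, C, R, CO, OR, CR) is the one shown for the Goodman (1972b) data in Table 6. Each effect enters together with its lower order relatives so that every model in the series stays hierarchical, and each model is then written in abbreviated form by listing only its highest order effects.

```python
from itertools import combinations


def with_relatives(effect):
    """The effect together with all of its lower order relatives."""
    items = sorted(effect)
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)}


def abbreviated(model):
    """Drop every effect that is implied by a higher order effect in the model."""
    return {e for e in model if not any(e < other for other in model)}


def nested_series(entry_order):
    """Add effects one at a time, returning the abbreviated list for each model."""
    model, series = set(), []
    for effect in entry_order:
        model = model | with_relatives(frozenset(effect))
        series.append(abbreviated(model))
    return series


order = ["O", "C", "R", "CO", "OR", "CR"]  # entry order taken from Table 6
series = nested_series(order)
for i, m in enumerate(series, start=1):
    print(f"H{i}:", ", ".join(sorted("".join(sorted(e)) for e in m)))
```

The fourth model, for instance, is abbreviated to CO and R, because the main effects C and O are implied by CO.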

The third step in the algorithm is to fit the 12 models in the series. The results are presented in Table 3. At this point in the process, the researcher may use several strategies to select a reasonable model. We suggest that the researcher consider the following criteria (cf. Draper & Smith, 1966, pp. 165, 458): (a) The R² statistics in the series of nested models might be examined for "breaks," or points at which further increases in R² require a large number of additional parameters, and/or points at which the R² values level off. Using
this criterion, either Model H or Model Hy 
might be chosen from Table 3. (b) The researcher might decide what proportion of the total variation in the logits in the table must be explained for a model to be deemed adequate. Let R²_a denote this proportion. Then the simplest model H_u in the nested series in which R² > R²_a is identified. Model H_u and all other models in which R² > R²_a will be said to be "R²_a adequate."

A goal of R²_a = .80 has been chosen here. An examination of Table 3 indicates that H_u = H6; that is, H6 is the simplest model in the nested series that is .80 adequate. We note that there are other models for this table that are .80 adequate, including H7, H8, ..., H12. Unfortunately, identifying all .80-adequate models requires fitting many models to this contingency table. We recommend this step when it is computationally feasible. However, when the number of dimensions is large, the procedure is expensive; it is for these situations that the GGM is useful. We fitted all possible logit models to establish a standard for evaluating the GGM. This step located 40 models with R² > .80.
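Criterion (b) amounts to a simple scan of the nested series. The sketch below is illustrative only; the function name is hypothetical, and the R² values are those listed for Models H0 through H12 in Table 3.

```python
def simplest_adequate(r2_values, target):
    """Index of the first model in the nested series whose R² exceeds the target."""
    for index, r2 in enumerate(r2_values):
        if r2 > target:
            return index
    return None  # no model in the series reaches the target


# R² for H0, H1, ..., H12 as listed in Table 3.
r2_series = [0.00, 0.09, 0.21, 0.34, 0.60, 0.75, 0.81,
             0.83, 0.84, 0.86, 0.93, 0.93, 0.95]

u = simplest_adequate(r2_series, target=0.80)
adequate = [i for i, r2 in enumerate(r2_series) if r2 > 0.80]
print(f"H_u = H{u}")
print("adequate models in the series:", [f"H{i}" for i in adequate])
```

Because the series is nested, every model after H_u is also adequate; the scan says nothing about adequate models outside the series.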


Table 3
X² and R² Statistics for a Nested Series of Models for the Five-Way Table

Model  Abbreviated list of β   Fitted marginals                             β parameter added   df    R²
       parameters included                                                  to previous
       in logit model                                                       logit model
H0                             (ASRY), (L)                                                      839   .00
H1     A                       (ASRY), (LA)                                 A                   836   .09
H2     AS                      (ASRY), (LAS)                                AS                  832   .21
H3     AS, Y                   (ASRY), (LAS), (LY)                          Y                   826   .34
H4     AS, Y, R                (ASRY), (LAS), (LY), (LR)                    R                   812   .60
H5     AS, RY                  (ASRY), (LAS), (LRY)                         RY                  728   .75
H6     AS, RY, AR              (ASRY), (LAS), (LRY), (LAR)                  AR                  686   .81
H7     AS, RY, AR, AY          (ASRY), (LAS), (LRY), (LAR), (LAY)           AY                  668   .83
H8     AS, RY, AR, AY, SR      (ASRY), (LAS), (LRY), (LAR), (LAY), (LSR)    SR                  654   .84
H9     RY, AY, ASR             (ASRY), (LRY), (LAY), (LASR)                 ASR                 612   .86
H10    ASR, ARY                (ASRY), (LASR), (LARY)                       ARY                 360   .93
H11    ASR, ARY, ASY           (ASRY), (LASR), (LARY), (LASY)               ASY                 336   .93
H12    ASR, ARY, ASY, SRY      (ASRY), (LASR), (LARY), (LASY), (LSRY)       SRY                 252   .95

Note. This series is derived from results in Table 2.

The fourth and final step of the GGM is to identify a minimal R²_a-adequate model. Model H will be termed minimal R²_a adequate (a) if R²(H) > R²_a and (b) if for any Model H' that can be constructed by deleting one or more effects from H, preserving the hierarchical nature of the model, R²(H') < R²_a. This criterion reduces the 40 models that are .80 adequate to the following three minimal .80-adequate models: (a) (ASRY), (ARL), (RYL), (AYL), (SYL); (b) (ASRY), (ARYL), (SL); (c) (ASRY), (ASL), (RYL), (ARL) (Model H6).

For illustration, the calculations are presented in Table 4 for Model H6 only, the model found through earlier steps of the GGM. These calculations indicate that the removal of any effect from Model H6 yields R² < .80; hence, Model H6 is minimal .80 adequate.
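The check carried out in Table 4 can be sketched as below. The function name is hypothetical, and the R² values after deletion (.67, .71, .75) are taken as read from Table 4, so they should be treated as illustrative. The sketch examines single deletions only, on the assumption that deleting further effects cannot raise R².

```python
def is_minimal_adequate(r2_model, r2_after_single_deletions, target):
    """True if the model is adequate and every single deletion falls below target."""
    if r2_model <= target:
        return False
    return all(r2 < target for r2 in r2_after_single_deletions.values())


# Model H6 (AS, AR, RY): R² after removing each β parameter in turn.
after_deletion = {"RY": 0.67, "AS": 0.71, "AR": 0.75}
result = is_minimal_adequate(r2_model=0.81,
                             r2_after_single_deletions=after_deletion,
                             target=0.80)
print(result)
```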

In summary, we note that the GGM located one of the three minimal .80-adequate models for this table. The selected model is

$$\text{Model } H_6\colon\ \Phi_{ijkl} = \beta + \beta^{A}_{i} + \beta^{S}_{j} + \beta^{R}_{k} + \beta^{Y}_{l} + \beta^{AS}_{ij} + \beta^{AR}_{ik} + \beta^{RY}_{kl}. \qquad (10)$$

At this point, the researcher knows that no three-way interactions are necessary to explain at least 80% of the variation in the logits; with reasonable accuracy the odds ratio of the number of informal to the number of formal admissions for any cell in the Age × Sex × Region × Year four-way table of logits can be predicted using only the parameters indicated in Model H6. Thus instead of having to contend with one four-dimensional table with 840 parameters, the researcher knows that three two-way tables, with 8, 60, and 105 parameters, summarize the most important interactions in the full four-way table. (For additional comments on interpretation, see Davis, 1974; and Goodman, 1972a, 1972b.)
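The parameter counts quoted above follow from the numbers of categories of the four explanatory variables. The sketch below assumes 4 age, 2 sex, 15 region, and 7 year categories; these level counts are not stated in this section and are an assumption chosen because they reproduce the quoted totals (8, 60, 105, and 840) as well as the df changes reported in Table 4.

```python
from math import prod

# Assumed numbers of categories (not given in the text; see the note above).
levels = {"A": 4, "S": 2, "R": 15, "Y": 7}


def n_parameters(effect):
    """Number of cells in the marginal table for an effect."""
    return prod(levels[f] for f in effect)


def n_independent(effect):
    """Number of linearly independent parameters for an effect."""
    return prod(levels[f] - 1 for f in effect)


counts = {e: n_parameters(e) for e in ("AS", "AR", "RY", "ASRY")}
independent = {e: n_independent(e) for e in ("AS", "AR", "RY")}
print(counts)
print(independent)
```

Under these assumed level counts, AS has 8 parameters of which 3 are linearly independent, matching the figures given earlier for that effect.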


Table 4 
Results of Deleting 


Evaluating the GGM


It is reasonable to judge a model selection algorithm by checking to see whether it finds a good model. This necessitates defining a good model. Two possible criteria come to mind here, one relating to R² and the other to chi-square. The best .80-adequate model could be defined as that model with the minimum number of parameters that yields R² > .80. Alternatively, the best model may be the one that yields an insignificant chi-square with the minimum number of parameters in the model. We have examined the performance of the GGM, relative to these two criteria, on the five-way table used throughout this article and on the 2 × 2 × 2 × 2 table from Goodman (1972b).

For the five-way table, the GGM selected the best R² = .80 model, and for the 2 × 2 × 2 × 2 table, it selected the best R² = .99 and R² = .999 models. Also, for the 2 × 2 × 2 × 2 table, the GGM selected the best model that used the chi-square criterion and significance level of .01. The calculations required to apply the GGM to the 2 × 2 × 2 × 2 table are presented in Tables 5 and 6. Comparing the analysis using the GGM with Goodman's analysis, we note that depending on the value of R²_a, the GGM will select either the model chosen by Goodman or a more parsimonious model that explains almost as much variation.

We emphasize that these investigations of 
the GGM are empirical investigations, and the 
results are based on the analyses of only two 
contingency tables. However, on the basis of 
these analyses, the GGM does appear to 
perform well. We emphasize that there will 
generally be more than one minimal adequate 
model for a table. Additional insight into the 
structure of the table may be gained by locat- 


Table 4
Results of Deleting Single β Parameters From Model H6

β parameter   Resulting                           df          X²
removed       logit model    df    X²             increase*   increase*   R²
RY            AS, AR, Y      770   1,938.51       84          845.48      .67
AS            AR, RY, S      689   1,680.93       3           587.90      .71
AR            AS, RY         728   1,479.77       42          386.74      .75

* Differences are measured with respect to corresponding statistics of Model H6.
Table 5
Statistics for Pairs of Models to Select Entry Order of Effects in Models for the Data Analyzed in Goodman (1972b)


Abbreviated 
list of 8. 
parameters 8 parameter S 
included in under con- А h е y 
logit model Fitted marginals sideration df x C(8) * order 
9 
COR), (POR) 4 690.8 OS. 
OR, c (COR). (POR), (PC) e 3 24.92 5 
5 
(COR), (PRC) 4 22853 ER i 
RC, 0 (COR), (PRO), (PO) 3 3 17.37 
(COR), (POC) 4 152.65 151.20 3 
oc, R (COR), (POC), (PR) R 3 145 
OR, RC (COR), (POR), (PRC) 2 17.29 1563 4 
OR, RC,CO (СОК), (POR), (РАС), (PCO) CO | 167 $ 
OR, OC (COR), (POR), (POC) 2 .68 m ^ 
OR,OC,CR (СОК), (POR), (POC), (PCR) CR 1 ‘67 
OC, RC (COR), (POC), (PRC) RM 22 1.32 UM . 
OC,RC,OR (СОК), (POC), (PRC), (POR) 1 67 


ing all minimal adequate models and compar- 
ing the alternative explanations offered by 
these models.


Discussion 


The GGM is a flexible model selection procedure; it is not intended to be an automatic procedure into which the experimenter inserts a contingency table and an R²_a value and from which a model is received. Rather, it is proposed as a reasonable procedure for searching among the multitude of possible hierarchical models for appropriate models that satisfy specified criteria. If the researcher has little or no prior information, the procedure will help to identify reasonable models.

If, however, the researcher has additional information on the contingency table, the GGM can be modified to incorporate that information. For example, if it is known from previous studies that certain effects almost surely must be included in any model that is to fit the contingency table, then the model that includes those effects could be used as the base model for beginning the search, rather than Model H0.

Table 6
X² and R² Statistics for a Nested Series of Models for the Table Analyzed in Goodman (1972b)

Model  Abbreviated list of β   Fitted marginals              β parameter added   df   X²         R²
       parameters included                                   to previous
       in logit model                                        logit model
H0                             (COR), (P)                                        7    3,111.47   .0000
H1     O                       (COR), (PO)                   O                   6    801.60     .7424
H2     O, C                    (COR), (PO), (PC)             C                   5    186.36     .9401
H3     O, C, R                 (COR), (PO), (PC), (PR)       R                   4    24.96      .9920
H4     CO, R                   (COR), (PCO), (PR)            CO                  3    1.45       .9995
H5     CO, OR                  (COR), (PCO), (POR)           OR                  2    .68        .9998
H6     CO, OR, CR              (COR), (PCO), (POR), (PCR)    CR                  1    .67        .9998

Note. This series of models is derived from results in Table 5.
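The R² values in Table 6 can be reproduced from the chi-square column. The sketch below assumes that R² is the proportional reduction in X² relative to the baseline Model H0, that is, R²(H) = 1 − X²(H)/X²(H0); this definition is not restated in this section, but it agrees with the tabled values to four decimals. The function name is hypothetical.

```python
def r_squared(x2_model, x2_baseline):
    """Proportional reduction in lack of fit relative to the baseline model."""
    return 1.0 - x2_model / x2_baseline


# X² values for the nested series in Table 6.
x2 = {"H0": 3111.47, "H1": 801.60, "H2": 186.36, "H3": 24.96,
      "H4": 1.45, "H5": 0.68, "H6": 0.67}

r2 = {name: round(r_squared(value, x2["H0"]), 4) for name, value in x2.items()}
for name, value in r2.items():
    print(name, f"{value:.4f}")
```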

Another way that the GGM can be modified 
to reflect substantive knowledge that the re- 
searcher may have is that the choice among 
minimal adequate models need not necessarily 
center on the model with the fewest param- 
eters. Given substantive considerations, it 
may be that one of the other minimal adequate 
models is far more interpretable or useful. 

If the researcher prefers to use chi-square-based statistics rather than the R² statistics, the GGM can be easily modified to accommodate this. For example, the effects can be ordered by X²(H*) − X²(H**) rather than by C(β).

The researcher may wish to restrict attention to models with no effect of order higher than some selected value, for example, three. If this is the case, then Step 1 in the GGM is easily modified to include an examination of only those effects that are candidates for inclusion in the model; no fourth or higher order effects would be examined in Step 1.

The GGM may also be based on an ordering of the effects by their contribution when each is entered as soon as possible into the model. In Step 1, instead of comparing H* and H** as defined earlier, the researcher would compare the Model H' that contains only the effect of interest and its lower order relatives, with the Model H'' containing all effects in H' except the effect of interest. For example, to obtain the contribution of AS, the Model H' [(ASRY), (AL), (SL), (ASL)] would be compared with H'' [(ASRY), (AL), (SL)]. We performed this procedure for the five-way table from Miller et al. (1974), which was used as an example in earlier sections, and for the table obtained from Goodman (1972b), which was analyzed in earlier sections. For the five-way table, the results were identical to those of the procedure in Step 1 of the GGM, which ordered the effects by their contribution when entered last into the model, except that the age and Age × Sex effects, which were close in value, were reversed. The two procedures yielded the same final model. In the analysis of the Goodman data, the two procedures yielded the same order of entry and the same final model. Thus the impact of a variable's order of entry on the ranking of its explanatory power appears to be much smaller here than is the case in multiple regression.
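The alternative pair of models, H' and H'', can be generated in the same set-based way. The sketch below is illustrative only; the function names are hypothetical, L is taken as the response variable, and (ASRY) is taken as the marginal fixed by design, as in the example above.

```python
from itertools import combinations

DESIGN_MARGINAL = "(ASRY)"  # explanatory-variable marginal fitted in every model


def h_prime(effect):
    """The effect of interest and all of its lower order relatives."""
    items = sorted(effect)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def h_double_prime(effect):
    """All effects in H' except the effect of interest."""
    target = frozenset(effect)
    return [e for e in h_prime(effect) if e != target]


def marginal(effect, response="L"):
    """Fitted marginal for an effect on the logit of the response variable."""
    return "(" + "".join(sorted(effect)) + response + ")"


prime = [DESIGN_MARGINAL] + [marginal(e) for e in h_prime("AS")]
double_prime = [DESIGN_MARGINAL] + [marginal(e) for e in h_double_prime("AS")]
print(prime)
print(double_prime)
```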

Only the class of nonhierarchical models is 
excluded from consideration by the GGM. 
Any hierarchical combination of main and 
higher order effects could be selected. Step 1 
in the algorithm assures that all higher order 
effects will be examined. Thus important higher 
order interactions would not be missed, as they 
might be if we fit only models of uniform 
order, that is, models with all two-factor 
interactions, then models with all three-factor 
interactions, and so forth continuing until the 
lack of fit of a model of uniform order is 
insignificant. 

The analyses described here are in the spirit 
of exploratory, rather than confirmatory, sta- 
tistical analyses (Tukey, 1977); many tests 
are being performed on the same set of data 
after the fact. Hence, the significance prob- 
abilities obtained should be viewed as guide- 
lines rather than as the results of formal tests. 

In concluding, we advise, as do Bishop et al. 
(1975), that before a final decision on a model 
is made, the fit of the model should be ex- 
amined cell by cell to check for outliers or 
any unusual patterns in the residuals. 


Reference Note


Based on a review of the literature, empirical research in the area of superior-subordinate communication is classified into nine topical categories and critically examined. Inspection of this literature suggests that researchers have focused the majority of their attention on studying (a) the effects of power and status on superior-subordinate communication, (b) trust as a moderator of superior-subordinate communication, and (c) semantic-information distance as a source of misunderstanding in superior-subordinate communication. It is concluded that future research should increasingly be developmental and longitudinal in nature and should take into greater consideration the effects situational variables have on communication in the superior-subordinate dyad.


Status hierarchy is inherent in the nature of purposeful organizations. As Redding (1972) observes, within organizations "there are 'superiors' and 'subordinates' -- even though these terms may not be expressly used, and even though there may exist fluid arrangements whereby superior and subordinates roles may be reversible" (p. 18). How superiors and subordinates interact and communicate to achieve both personal and organizational goals has been an object of investigation by social scientists for most of the 20th century. Empirical research examining superior-subordinate communication is diverse, is strewn across a multitude of disciplines, lacks coherent organization and classification, and in general, has not received sufficient review and interpretation as a body of literature. The present article attempts to alleviate this confusion by reviewing, classifying, interpreting, and providing directions for future research in the area of organizational communication that is loosely termed superior-subordinate communication.

This article focuses on empirical research solely in the domain of organizational communication. To avoid generalizations from communication research outside of the organizational environment, I do not review investigations exploring small group and interpersonal communication extraneous to purposeful organizations (with occasional exception). Since organizational communication is different, in a variety of ways, from communication in other settings (e.g., Redding, 1972; Rogers & Rogers, 1976) and given the difficulty of generalizing from social science research, regardless of area, limiting the setting (or scope) certainly adds to the validity of any knowledge claims. For example, it is difficult to generalize from small group communication research, which is external to organizational environments, to group communication within organizations, in which groups of groups are tied together in networks of networks. Hence, this review focuses on studies conducted within organizations or simulations of organizations.

Requests for reprints should be sent to Fredric M. Jablin, Department of Speech Communication, University of Texas at Austin, Austin, Texas 78712.

In addition, this collection and critique of superior-subordinate communication research has excluded studies related to interviewing, despite the fact that they may have involved superior-subordinate interaction. (See Daly, Note 1, for a complete review of this literature.) Moreover, the nucleus of this review is the examination of interpersonal dyadic interactions between superiors and subordinates. Specifically, an attempt was made to avoid examination of research concerned with the use of impersonal, media-related (e.g., house organ, bulletin board, suggestion box) superior-subordinate communication. However, both written and oral face-to-face communication transactions, when of an interpersonal dyadic nature, were reviewed.

Copyright 1979 by the American Psychological Association, Inc. 0033-2909/79/8606-1201$00.75

The article is organized into three sections; 
the first presents a basic definition of su- 
perior-subordinate communication. The second 
section reviews and organizes empirical re- 
search related to superior-subordinate com- 
munication into nine topical categories. The 
final section provides a discussion of the re- 
view and directions for future research. 


Superior-Subordinate Communication Defined 


The expressions superior and subordinate are derived from Latin roots, which when joined suggest that within an interpersonal relationship one individual is of subrank or is situated below another. In purposeful organizations, both formal and informal superior-subordinate relations usually exist.
Moreover, most research evidence indicates 
that informal (i.e., not prescribed by organi- 
zational directives) superior-subordinate af- 
filiations may be as important as formal veri- 
table relations in determining communicative 
behavior. However, for the purposes of the 
present review, the definition of superior- 
subordinate communication is limited to those 
exchanges of information and influence be- 
tween organizational members, at least one of 
whom has formal (as defined by official or- 
ganizational sources) authority to direct and 
evaluate the activities of other organizational 
members. 

Katz and Kahn (1966) provide probably the most parsimonious yet complete description of the types of communication that are typically exchanged in superior-subordinate interactions. These theorists suggest that downward communications from superior to subordinate are of five basic types: (a) job instructions, (b) job rationale, (c) organizational procedures and practices, (d) feedback about subordinate performance, and (e) indoctrination of goals (pp. 239-241). On the other hand, communication upward from subordinate to superior is reported to take four primary forms: (a) information about the subordinate himself/herself, (b) information about co-workers and their problems, (c) information about organizational practices and policies, and (d) information about what needs to be done and how it can be done (p. 245). More specific and detailed taxonomies of messages exchanged in superior-subordinate communication are available in the literature (e.g., Eilon, 1968; Melcher & Beller, 1967; Yoder, 1970).


Review of Literature 


The empirical literature on superior-subordinate communication has been divided into nine topical categories.¹ Each of these categories represents a series of investigations that appear to be researching similar constructs from analogous theoretical foundations.² Many of the studies reviewed share more than one category but were classified into conceptually distinguishable groupings for purposes of clarification and parsimony.


Interaction Patterns and Related Attitudes 


Researchers have investigated a variety of issues related to interaction patterns between superiors and subordinates. For example, numerous studies report that between one third and two thirds of a supervisor's time is spent in communicating with subordinates and that face-to-face discussion is the dominant mode of interaction (e.g., Berkowitz & Bennis, 1961; Brenner & Sigband, 1973; Dubin & Spray, 1964; Hinrichs, 1964; Kelly, 1964; Lawler, Porter, & Tenenbaum, 1968; Penfield, 1974). Moreover, results from a number of investigations indicate that the majority of superior-subordinate interaction concerns task issues (e.g., Baird, 1974; Richetto, 1969; Zima, 1969; Walton, Note 2) and that superiors and subordinates talk more about impersonal (focus of topics external to self) than about personal (directly related to self) topics (Baird, 1974). Further, research suggests that superiors are more likely to initiate interactions with subordinates than the other way around (e.g., Berkowitz & Bennis, 1961; Dubin & Spray, 1964). Yet, it is of interest to observe that superiors are less positive toward and less satisfied with interactions with their subordinates than they are with contacts with their bosses (e.g., Clement, 1974; Lawler et al., 1968; Tenenbaum, 1971). This finding is even more ironic when considered in light of Baird and Diebolt's (1976) discovery that a subordinate's job satisfaction is positively correlated with estimates of communication contact with superiors.

¹ The reader will note that this review contains no single category of research related to downward communication in superior-subordinate interaction. Since Redding (1972, see especially pp. 388-404) provides an extensive review of this literature prior to 1970 and given that less research has been pursued in this area subsequent to 1970, the present review has not directly focused attention on this area. Rather, research related to downward communication is discussed within the confines of the other categories.

² The reader will also note that this review does not consider research that could be classified as relating to participative decision making. Since exhaustive reviews of this literature already exist (e.g., Redding, 1972, pp. 154-250; Vroom, 1970, pp. 227-240; Vroom, 1976, pp. 1538-1546) further critique would be redundant.

Several other studies present findings relevant to interaction patterns between superiors and subordinates. The results of these investigations suggest the following conclusions: (a) Superiors perceive that they communicate more with subordinates than subordinates perceive, whereas subordinates feel they send more messages to their superiors than the latter perceive (Webber, 1970); (b) superiors who lack self-confidence in their leadership abilities are less willing to hold face-to-face discussions with their subordinates than are superiors who are confident in their leadership abilities (Kipnis & Lane, 1962); (c) role conflict and role ambiguity are strongly correlated with "leader behavior indicative of direct as opposed to indirect interactions with subordinates" (Rizzo, House, & Lirtzmann, 1970, p. 162); (d) when a subordinate needs informal help in the work setting, he/she is more likely to seek assistance from his/her superior than peers or subordinates (as reported by helpees) (Burke, Weir, & Duncan, 1976); and (e) supervisors are more likely to serve as "production" communication liaisons than as "maintenance" or "innovation" liaisons (MacDonald, 1976).

In summary, studies that explored interaction patterns between superiors and subordinates suggest frequent task-oriented communication within the dyad but differential attitudes and perceptions of those interactions. Moreover, personal characteristics and needs of the interactants seem to mediate their desire for and perceptions of superior-subordinate communication.


Openness in Communication 


Two basic dimensions of openness in su- 
perior-subordinate communication can be 
distinguished: openness in message sending 
and openness in message receiving. Redding 
(1972) describes openness in message sending 
as the “candid disclosure of feelings, or ‘bad 
news,’ and important company facts” (p. 
330), whereas openness in message receiving 
involves “encouraging, or at least permitting, 
the frank expression of views divergent from 
one’s own; the willingness to listen to ‘bad 
news’ or discomforting information” (p. 330). 
Baird (1974) adds that it is essential that re- 
searchers clearly specify whether they are 
referring to task-relevant openness or non- 
task-relevant openness when investigating each 
of the above dimensions. 

Much of the impetus for studying openness in superior-subordinate communication has been provided by management theorists who have suggested that openness is an essential element for an effective organizational climate (e.g., Haney, 1967; Likert, 1967). Support for this proposition is furnished in studies by Burke and Wilcox (1969), Baird (1974), and Jablin (1978a), who have found that employees are more satisfied with their jobs when openness of communication exists between subordinate and superior. Furthermore, several inquiries report that openness of communication is directly correlated with organizational performance (e.g., Indik, Georgopoulos, & Seashore, 1961; Willits, 1967). However, it should be noted that the results of one investigation suggest that managerial effectiveness is unrelated to openness of communication between superior and subordinate (Rubin & Goldman, 1968).

A series of doctoral dissertations completed at Purdue University has attempted to explore in detail the communication characteristics of openness in superior-subordinate relationships. The first of these studies (Baird, 1974) examined subordinates' "upward communication freedom" with superiors. Results of the study revealed that willingness of superiors and subordinates to talk as well as actual talk about a topic is a function of each interactant's perception of the other's willingness to listen. Extrapolating on Baird's study, Stull (1975) investigated superior and subordinate attitudes toward various types of supervisory responses to task-relevant and non-task-relevant open messages sent by subordinates. Analyses disclosed that for task and nontask topics, subordinates and superiors preferred supervisory responses that were accepting (encouraging) or reciprocating ("owning-up" to one's feelings, ideas, etc.) rather than neutral-negative (unfeeling, cold, or nonaccepting). Finally, Jablin (1978a, 1978b), attempting to determine the types of communicative responses that characterize open and closed relationships between superiors and subordinates, experimentally studied the attitudes of subordinates toward five basic types of message responses occurring in a dyad: confirmation (a response that provides a speaker with positive content and positive relational feedback), disagreement (a response that provides a speaker with negative content feedback but positive relational feedback), accedence (a response that provides a speaker with positive content feedback but negative relational feedback), repudiation (a response that provides a speaker with both negative content and negative relational feedback), and disconfirmation (a response that provides a speaker with irrelevant content and equally irrelevant relational feedback). Results from the investigation indicated (a) that disconfirming responses are not acceptable in superior-subordinate communication; (b) that subordinates prefer message responses from superiors that provide positive relational feedback; (c) that regardless of perceived openness or closedness of the communication relationship with their superior, subordinates expected the same types of responses from a superior but evaluated the appropriateness of these responses differently; (d) that a substantial degree of reciprocity exists for confirming messages, regardless of the openness or closedness of the superior-subordinate relationship; and (e) that subordinates who perceive a closed relationship with their superior are prepared to respond to a superior's message that contains negative relational feedback toward the subordinate with a response transmitting negative relational feedback toward the superior; however, this is not true for subordinates who perceive an open relationship with their superior.

In summary, these studies suggest that in 
an open communication relationship between 
superior and subordinate, both parties per- 
ceive the other interactant as a willing and 
receptive listener and refrain from responses 
that might be perceived as providing negative 
relational or disconfirming feedback. Moreover, these inquiries suggest that what distinguishes an open from a closed superior-subordinate relationship may not be the types of messages exchanged but how the interactants evaluate the appropriateness of these communications. Finally, these studies provide
strong evidence for the proposition that em- 
ployees are more satisfied with their jobs when 
openness of communication exists between 
superior and subordinate than when the rela- 
tionship is closed. 


Upward Distortion 


Closely related to research examining openness in superior-subordinate relationships is a group of investigations exploring message distortion in subordinate upward communication to superiors. Mellinger (1956), who collected questionnaire data from 330 scientists in a medical laboratory, is generally credited with the initiation of this research tradition. (For a discussion of related research antecedent to Mellinger's investigation, see Guetzkow, 1965, pp. 553-555.) Results of this early inquiry into message distortion revealed that when Individual A does not trust Individual B, Individual A will conceal his/her feelings when communicating to B about a particular issue. Moreover, concealment of Individual A's true feelings was found to be often associated with evasive, compliant, or aggressive communicative behavior on his/her part and with under- or overestimation of agreement on the issue by Individual B. Cohen's (1958) replication and clarification of Kelly's (1951) investigation of upward communication in experimentally created hierarchies also inspired a tradition of research in the area of upward communication distortion. Results of this study suggested that within a hierarchy, if an individual has power over the advancement of persons of lower rank, those of lower rank will omit critical comments in their communication with the person of higher rank. Thus with these seminal studies, Mellinger (1956) and Cohen (1958) initiated a sphere of research that examined the moderating effects of trust and mobility aspirations on upward communication distortion.

Building on the previously described re- 
search, Read and his associates (Maier, Hoff- 
man, & Read, 1963; Read, 1962) explored 

! the relationships among upward mobility 
aspirations, trust, and the accuracy with 
which managers communicate information up- 

¿ward in organizational hierarchies. Data 
analyses supported the earlier findings of Mel- 
linger and Cohen, which indicated that mo- 
bility aspirations (i.e., desire for advancement 
and status seeking proclivity) and low trust 
in one’s superior are negatively related to 
accuracy of upward communication. Moreover, 
results suggested that even when a subordi- 
nate trusts his/her superior, high mobility 
aspirations "strongly militate against accurate
communication of potentially threatening in- 
formation" (Read, 1962, p. 13). In addition, 
it was discovered that “subordinates feel less 
free to communicate with superiors who previ- 
ously have held their position than with 
those who have not" (Maier et al., 1963, p. 
9). 

More recently, research by Roberts and 
O'Reilly (1974) and O'Reilly and Roberts 
(1974) has supported the notion that a subordinate's
trust in his/her superior is a facilitator
of distortion-free upward communication.
However, these researchers did not find strong
correlations between subjects' mobility aspirations
and propensity toward upward communication
distortion. Finally, Sussman
(1974), investigating upward communication
distortion from the perspective of the superior
(i.e., the recipient of distorted messages),
failed to find that superiors perceive
greater accuracy in messages from subordi- 
nates who are perceived as trusting the 
superior than in messages from subordinates 
perceived as nontrusting. 

In contrast to these studies, several re- 
searchers have investigated the origins and 
concomitants of upward communication distor- 
tion from slightly different perspectives. 
Athanassiades (1973, 1974) argues that 
ascendency and security needs, risk-taking 
propensity, and organizational climate, when 
perceived as instrumental to a subordinate’s 
goals, will produce upward communication
distortion. Results of his research indicate 
that for both male and female subordinates, 
upward distortion is need motivated, with 
distortion being positively related to achieve- 
ment needs and negatively related to level of 
security. Furthermore, Athanassiades’s find- 
ings suggest that distortion of upward com- 
munication is negatively related to an autono- 
mous organizational climate and positively 
related to a heteronomous climate. It is of 
interest to observe that his findings (1974) 
also show that “women in managerial posi- 
tions feel more suppressed—less autonomous, 
less independent—than men do in similar po- 
sitions” (p. 208). 

A recent study by Young (1978) supports
Athanassiades’s finding that organizational 
climate is related to distortion of upward 
communication. Specifically, results suggest 
that in organic as compared with mechanistic 
organizational environments, subordinates per- 
ceive greater appropriateness, expect fewer 
harmful consequences, and evidence greater 
willingness to disclose important yet person- 
ally threatening information to superiors. 
However, his data also disclose that the up- 
ward communication behavior of female sub- 
ordinates follows more closely the behavior of 
subordinates in an organic work setting than 
does the upward communication of male sub- 
ordinates. 

Extrapolating on Athanassiades’s research, 
Level and Johnson (1978) have recently 
found that upward distortion is most likely 
to be associated with messages in which information
about the following personality factors
is communicated: ascendency, responsibility,
emotional stability, cautiousness, and
original thinking. Their data also suggest that 
in certain areas subordinate tendencies to dis- 
tort upward communication can be reduced 
by increasing the superior's "consideration" 
leadership style, or increasing the accuracy 
with which the superior transmits downward 
information. In addition, Krivonos (1976) 
examined the role of motivation theory in 
upward communication distortion and found 
that superiors perceive that intrinsically 
motivated subordinates distort messages less 
than do extrinsically motivated subordinates.

Research that investigates types of messages
that tend to be distorted in upward communication
indicates that subordinates will be
less reluctant to communicate information that 
is positive-favorable than negative-unfavora- 
ble (O'Reilly & Roberts, 1974; Rosen & Tes- 
ser, 1970) and that superiors view messages 
that are favorable to subordinates as less ac- 
curate than messages that are unfavorable to 
subordinates (Sussman, 1974). Moreover, 
O'Reilly and Roberts (1974) report that when 
information is both favorable and important, 
subordinates do not hesitate to communicate
it upward to their superiors. In addition, 
Housel and Davis (1977) have discovered 
that subordinate satisfaction with upward 
communication tends to vary as a function of 
the channel used; face-to-face channels are 
most satisfactory, followed by telephone and 
written channels. Finally, a recent study by
Rosen and Adams (1974), which examined 
the severity of discipline administered to sub- 
ordinates who distorted upward communica- 
tion, reveals that "recommended disciplinary 
measures were relatively mild when the sub- 
ordinate's motives were altruistic and when 
his superior was dependent on him for exper- 
tise” (p. 382). 

In summary, research that explored upward 
distortion in superior-subordinate communica- 
tion has examined numerous variables that 
may moderate the occurrence of upward dis- 
tortion. These variables include trust, mobility 
aspirations, ascendency and security needs, 
organizational climate, sex differences, motivation,
message characteristics, and a variety of
upward communication channels. Although
the effects of subordinate trust in superior,
subordinate mobility aspirations/ascendency
needs, and the contingent role of organizational
climate on upward distortion seem best
supported, evaluation of the research in total 
suggests that probably no one variable can 
sufficiently explain the phenomena and that 
additional multivariate research is required 
before we can place confidence in any one of 
these explanations. 


Upward Influence


Influence processes are a central feature of 
superior-subordinate communication. And, as 
Walter (1966) notes, 


to study influence, one must first study communi- 
cation, for influence without communication is as 
wildly implausible as action at a distance. Influence
is always accompanied by some form of communica- 
tion, blunt or subtle, overt or tacit: Advertising, 
lobbying, arguing a case before a jury or on a 
suitor’s knee. (p. 190)


In studies of superior-subordinate communi- 
cation, researchers have focused attention on 
two basic dimensions of influence: (a) the 
effects a superior’s influence in the organiza- 
tional hierarchy has on his/her relationships 
with subordinates and (b) the transmission of 
influence by subordinates to superiors. Re- 
search in this latter category is diffused and 
varied, and since it is represented in other 
sections of this article (e.g., upward distor- 
tion, feedback), it is not directly discussed 
here.

Due to its effects on superior-subordinate 
relations, the upward influence of a subordi- 
nate’s superior with his/her boss has received 
considerable attention in the research litera- 
ture. Probably best known is 
“Pelz effect.” 


supervisory 
the Detroit Edison Company, discovered that


ployee-centered supervisors are associated with 
higher levels of employee satisfaction only 
when the supervisor apparently exercises in-
(Redding, 1972, p. 438). More recently
Wager (1965), who explored leadership behaviors
and influence in one organization,
reported findings similar to those of Pelz but 
also observed that the magnitude of the mod- 
erating effect of influence varied positively 
with the organizational status of the respond- 
ent. At this point it is important to note that 
Pelz’s and Wager's influence measures were 
concerned only with supervisory influence with 
respect to personnel management of subordinates
and did not assess influence in such
areas as resource allocation, organizational
changes, policy formation, or objective set- 
ting (Wager, 1965). Moreover, this typifies 
most of the research in this area. 

In recent years investigators have once 
again begun to explore the relationship be- 
tween superior's upward influence and com- 
munication with subordinates. For example, 
House, Filley, and Gujarati (1971) report
that the interaction between superior’s hierarchical
influence and consideration behavior
with subordinates varies from company to
company (i.e., it is situational). Perhaps of
even greater importance is their finding that 
when a superior is too high in upward influ- 
ence, dysfunctional consequences may emerge 
in relation to subordinate willingness to openly 
communicate with the superior. They argue 
that 


where supervisors are seen to have such high influ- 
ence, it is likely that there will be greater status 
separation between them and their subordinates, and
that such status differentiation will result in a 
restriction of upward information flow, less willing- 
ness on the part of subordinates to approach su- 
periors, and less satisfaction with the social climate 
of the work unit. (p. 429) 


In a related study, Roberts and O'Reilly 
(1974) report that subordinates who per- 
ceive their superior as having high upward 
influence also have a high desire for interaction
with the superior, high trust in the
superior, and a high estimation of accuracy of 
information received from the superior. Simi- 
larly, Jones, James, and Bruni (1975) have 
found that subordinate confidence and trust 
in a superior is positively related to the su- 
perior's success in interactions with higher 
levels of management. Finally, O'Reilly and 
Roberts (1974) and Roberts and O'Reilly
(1974), who examined the association between
superior upward influence and subordinate 
upward communication distortion, have dis- 
covered only weak correlations between these 
variables. 

In summary, results from studies that ex- 
plored the relationship between superior's up- 
ward influence and communication with sub- 
ordinates suggest the following conclusion: 
Subordinates who perceive their superior as 
having substantial but not excessive upward 
influence with their bosses will be more satis- 
fied with their superior and will interact and 
trust him/her more than will subordinates who 
perceive their superior as low in upward in- 
fluence. However, since several of the previ- 
ously described investigations indicate that 
this conclusion may be situation bound and 
may be contingent on factors other than those 
already studied, the conclusion warrants only 
tentative acceptance. 


Semantic-Information Distance 


Originally coined by Tompkins (1962), the 
term semantic-information distance describes 
the gap in information and understanding 
that exists between superiors and subordi- 
nates (or other groups within an organiza- 
tion) on specified issues. This concept is 
analogous to the concept of “disparity” ad- 
vanced by Browne and Neitzel (1952), to 
Weaver's (1958) construct of “semantic bar- 
rier,” to “categorical and syndectic similarity” 
as proposed by Triandis (1959a, 1959b, 
1959c, 1960), to "semantic agreement" as 
discussed in Maier, Hoffman, Hooven, and 
Read (1961) and Maier et al. (1963), and to 
“congruence” as explored in research by 
Minter (1969). Studies that examined the 
nature and definitional qualities of semantic- 
information distance through 1970 are dis- 
cussed in detail in Redding's (1972) review of 
organizational communication literature. 

The basic conclusions that can be drawn 
from the early research on semantic-informa- 
tion distance can be briefly described as fol- 
lows: (a) The larger the semantic distance 
between superior and subordinate, the lower 
will be the subordinate's morale (Browne & 
Neitzel, 1952); (b) superiors tend to overestimate
the amount of knowledge subordinates
possess on given topics (Odiorne,
1954); (c) management personnel tend to 
describe themselves by traits that are dif- 
ferent from those that subordinates use to 
describe themselves (Porter, 1958); (d) 
managers and workers differ in the criteria 
that they use in making judgments about 
people (Triandis, 1959a, 1959b, 1959c,
1960); (e) significant gaps in semantic dis- 
tance exist between union and management 
(Schwartz, Stark, & Schiffman, 1970; 
Weaver, 1958) and between union leadership 
and their members (Tompkins, 1962); (f) 
superiors and subordinates have difficulty 
agreeing on the basic job duties and demands 
facing subordinates (Maier et al., 1961; 
Rosen, 1961); (g) whether a superior has 
previously held his/her subordinate’s job has 
little effect on reducing the semantic-informa- 
tion distance between them (Maier et al., 
1963); (h) a superior’s perception of the attitudes
of subordinates toward him/her is often
unrelated to their actual attitudes (Bowers, 
1963; White, 1976); (i) serious semantic 
differences between superior and subordinate 
are frequent (e.g., Minter, 1969, reports that 
they occur over 60% of the time); and (j) 
there is some evidence that indicates that su- 
periors “find it easier to communicate with 
subordinate managers whose attitudes are 
similar [rather than dissimilar] to their own” 
(Miles, 1964, p. 324). 

In general, most research related to se- 
mantic-information distance conducted since 
1970 is supportive of the aforementioned 
studies. Greene (1972) has found that the
more accurately a subordinate complies with 
his/her superior’s expectations of subordinate 
behavior, the higher the subordinate’s job 
satisfaction and the better his/her perform- 
ance evaluation by the superior. Supportive of 
the results of Greene’s research is a study by 
Pfeffer and Salancik (1975) that suggests 
that the behavior of subordinates and su- 
periors is constrained by the expectations of 
other members of the role set. Several recent 
investigations contribute to the list of areas in 
which significant superior-subordinate semantic-information
distance exists. Examining
superior-subordinate dyads, Boyd and Jensen 
(1972) found that first-line managers and
their superiors experience difficulty in agreeing
on the authority of the first-line manager,
whereas Moore (1974) reports that a new
manager's superior and his/her subordinates
tend to disagree on how long it will take the 
manager to learn the new position. Assuming 
that empathic ability is negatively correlated 
to semantic-information distance, Northouse's 
(1977) study would strongly indicate that 
one means of reducing semantic distance is 
by increasing trust between superior and
subordinate. Finally, a study by Baird and
Diebolt (1976) found no relationships between
superior-subordinate role congruence
and several communication variables; how- 
ever, limitations within the study restrict the 
generalizability of its findings. 

In summary, results of empirical research 
in the area of superior-subordinate semantic- 
information distance probably provide some 
of the most consistent conclusions of any topic 
of study in organizational communication. In- 
cessantly, we find the existence of semantic- 
information distance in superior-subordinate 
relations, often at levels that would appear to 
seriously obstruct organizational effectiveness. 
The catalogue of topical areas in which se- 
mantic differences between superiors and sub- 
ordinates tend to occur is expanding and 
would strongly suggest that future research 
should pursue the development of valid and 
reliable techniques to reduce this semantic 
gap. 


Effective Versus Ineffective Superiors 


Interest in identifying the communicative 
behaviors of effective leaders probably has 
existed since the earliest days of civilization, 
when humankind became proficient at organizing
for battlefield warfare and thus required
an expendable supply of effective leaders.
Hence, over the years the identification
of effective as compared to ineffective communication
behaviors of superiors has received
more investigation than any other area
of organizational communication. 

From the period of 1950 to the mid-1960s, 
a series of doctoral dissertations completed at 
Purdue University attempted to determine 
the communication correlates of “good” supervisors
(Funk, 1956; Kelly, 1963; Minter,
1969; Miraglia, 1964; Pyron, 1964; Richetto,
1969; Simons, 1962; Sincoff, 1970; Smith,
1968; Zima, 1969). For the majority of these
studies, good supervision as compared to poor 
supervision was determined by higher man- 
agement evaluation of supervisors. Redding 
(1972, pp. 436-446) succinctly summarizes 
the results of these researchers and suggests 
the following general conclusions: 


1. The better supervisors tend to be more “communication-minded”;
e.g., they enjoy talking and
speaking up in meetings; they are able to explain
instructions and policies; they enjoy conversing with
subordinates. (See especially Funk, 1956; Pyron,
1964.)

2. The better supervisors tend to be willing, empathic
listeners; they respond understandingly to
so-called “silly” questions from employees; they are
approachable; they will listen to suggestions and
complaints, with an attitude of fair consideration
and willingness to take appropriate action. (See
especially Funk, 1956; Simons, 1962; Kelly, 1963;
Zima, 1969.)

3. The better supervisors tend (with some notable
exceptions) to “ask” or “persuade,” in preference to
“telling” or “demanding.” (See especially Simons,
1962; Pyron, 1964.)

4. The better supervisors tend to be sensitive to
the feelings and ego-defense needs of their subordinates;
e.g., they are careful to reprimand in private
rather than in public. (See, e.g., Simons, 1962.)

5. The better supervisors tend to be more open 
in their passing along of information; they are in 
favor of giving advance notice of impending 
changes, and of explaining the “reasons why” be- 
hind policies and regulations. (See especially Funk, 
1956; Simons, 1962.) (Redding, 1972, p. 443) 


Other research on superior-subordinate communication
contemporary to that of the Purdue
group generally supports the thrust of
the above conclusions (Brown, 1964; Jain,
1971; Ponder, 1959; Sadler, 1970; Tacey,
1959; Walker, Turner, & Guest, 1956).

For example, Ponder (1959) reports that 
as compared to ineffective foremen 


more time with employees carrying out the 
job, providing general supervision, and han- 
dling personnel matters. Moreover, more recent 
research provides additional testimony to the 
validity of the claims of these investigators 
(Duffy, 1975; Heizer, 1972; Sank, 1974; 
White, 1972) or has attempted to further 
elucidate the communication behaviors asso- 
ciated with various managerial interaction 
styles (e.g., Bradley & Baird, 1977).
Despite the strong evidence that characterizes
the communication profile of effective
superiors, other research suggests that effective 
supervisory communication behaviors are sit- 
uational and contingent on a variety of fac- 
tors (e.g., Downs & Pickett, 1977). As Red- 
ding (1972) notes in his own review of the 
Purdue studies, “The precise combination of
behaviors or attitudes which ‘works’ in one
company is likely to be different from what
‘works’ in another company or organization”
(p. 445). The importance of viewing effective 
as compared to ineffective superior communi- 
cation behavior from a contingency perspec- 
tive is demonstrated by developments in three 
areas of leadership research: (a) the tradi- 
tional study of leadership that employs the 
“consideration-initiating structure” framework
(e.g., Fleishman & Harris, 1962), (b) Fiedler's
contingency approach to leadership (e.g.,
Fiedler, 1967), and (c) a more recent view of
leadership that uses a dyadic linkage–role-making
model (e.g., Dansereau, Graen, &
Haga, 1975).

As a result of extensive leadership research 
conducted at Ohio State University during 
the 1950s and early 1960s, two basic dimen- 
sions of leadership behavior were identified: 
(a) “consideration” and (b) “initiating struc- 
ture.” (See Stogdill, 1974, pp. 128-141, for 
a complete review of these studies.) Leader 
consideration was found to be typified by 
friendship and warmth, mutual trust, rapport 
and tolerance, and two-way communication 
between a leader and his/her work group 
(Fleishman, Harris, & Burtt, 1955). Initiat- 
ing structure includes “behaviors in which 
the supervisor organizes and redefines group 
activities and his relation to the group. . . .
This dimension seems to emphasize overt 
attempts to achieve organizational goals" 
(Fleishman & Harris, 1962, p. 43). These 
two basic dimensions of leader behavior are 
analogous to those denoted as “employee 
orientation” and “production orientation” in 
the University of Michigan Institute for 
Social Research leadership studies (e.g., Katz, 
Maccoby, & Morse, 1950). The importance 
of the above investigations to the identifica- 
tion of the effective communication behaviors 
of supervisors rests on the similarity between 
the constructs of consideration and employee
orientation, and communication. For example,
Miraglia (1964), who studied the parallels 
between consideration and communication 
ability, discovered that consideration “is 
largely a matter of communication behavior” 
(Redding, 1972, p. 148). This conclusion has 
been supported in research by Jain (1973) 
and more recently in a study by Dennis 
(1974). 

The general conclusion drawn from the 
early consideration and initiating structure 
research was that superiors are “rated as 
more effective when they score high in both 
consideration and leadership structure” (Stog- 
dill, 1974, p. 140). Perhaps of even greater 
significance for communication researchers is 
the general finding that leaders high in con- 
sideration (good communicators) can increase 
structure within their work groups and still be 
rated as effective leaders (e.g., Fleishman & 
Harris, 1962). However, current inquiries sug- 
gest that numerous situational variables im- 
pinge on the validity and reliability of 
consideration-initiating structure (C-IS) to 
predict leader effectiveness. Reviewing this 
literature through 1973, Kerr, Schriesheim, 
Murphy, and Stogdill (1974) identify the 
following as situational variables that mod- 
erate the C-IS ability to predict leader be- 
havior and performance: 


subordinate need for information, job level, subordinate
expectations of leader behavior, perceived organizational
independence, leader's similarity of
attitudes and behavior to managerial style of man- 
agement, leader upward influence; and character- 
istics of the task, including pressure and provision 
of intrinsic satisfaction. (p. 62) 


More recent investigations also suggest that 
the following situational variables may mod- 
erate the C-IS predictive capability: sex 
(Day & Stogdill, 1972), task type (Hill & 
Hughs, 1974), length of employment and 
organizational climate (Kavanagh, 1975), and 
work-unit size (Schriesheim & Murphy, 
1976). 

Fiedler’s work on leadership likewise indi- 
cates that researchers should be examining 
the communication attributes of effective superiors
from a contingency perspective (e.g.,
Fiedler, 1964, 1967, 1970, 1971a, 1971b, 
1972a, 1972b). Emphasizing the role of 
leader personality and style of interaction on
work-group performance, Fiedler argues that
three dimensions of task situations primarily 
determine leader effectiveness: leader-member 
relations (which obviously are dominated by 
communication behavior), task structure and 
leader-position power. In essence, this re- 
search suggests that supervisors have pre- 
dominant styles of interacting with subordi- 
nates and that their effectiveness will vary, 
depending on whether the situation (as previ- 
ously defined) is best suited to that style. 
Finally, recent research that views leadership
from a dyadic linkage–role-making
perspective shows that supervisors do not
develop the same kinds of relationships with
all subordinates and superiors, and thus the
communicative behavior that may be effective 
in one type of relationship may not be effec- 
tive in another (Cashman, Dansereau, Graen, 
& Haga, 1976; Dansereau, Cashman, & 
Graen, 1973; Dansereau et al., 1975; Graen, 
Cashman, Ginsburgh, & Schiemann, 1977; 
Haga, Graen, & Dansereau, 1974). Specifi- 
cally, results of this research indicate that 
supervisors tend to develop one of two types 
of exchange patterns with subordinates, that 
is, either a pattern of leadership exchange 
(characterized by "influence over a member 
without resort to authority" [Cashman et al., 1976,
p. 281]) or supervision exchange (in which
"influence over a member is based primarily
upon authority" [Cashman et al., 1976, p.
281]). Moreover, it has been found that su- 
pervisors in turn develop either leadership 
exchanges or supervision exchanges with 
their bosses. In addition, findings suggest that 


subordinate members of the upper dyad who de- 
velop leadership exchanges with their bosses have 
greater influence with their bosses and receive more 
latitude, support and attention from their bosses than 
their colleagues who fail to develop leadership ex- 
changes. (Graen et al., 1977, p. 502) 


In summary, investigations that explored a 
dyadic linkage–role-making model of leadership
suggest that superior-subordinate communication
patterns are not stable across all
superior-subordinate interactions and may
vary as a function of organizational under- 
structure. 

As noted in the opening of this section, we 
are rich in research studies that have attempted
to identify the effective and ineffective
communication correlates of supervision.
Clear evidence has been presented that sug- 
gests a certain profile that characterizes the 
communication behaviors of effective super- 
visors. On the other hand, several other 
research traditions were reviewed which indi- 
cate that the qualities of effective leadership 
vary from situation to situation and are con- 
tingent on numerous factors. Which set of 
findings and conclusions are we to believe? 
The answer would appear to be both. Data 
that comprise the Purdue and other related 
effectiveness studies have been collected from 
a myriad of organizations and supervision 
situations—the pattern of results is too con- 
sistent to reject without further research. It 
may be that for 60% of the superior-subordi- 
nate communication situations, these findings 
are applicable, yet they may only apply to 
10% of the cases. Obviously, the only way 
we will be able to resolve this question is by 
research that investigates the effects situa- 
tional variables have on superior-subordinate 
communication. 


Personal Characteristics 


In the process of studying superior-sub- 
ordinate communication, researchers have at- 
tempted to discover the personal character- 
istics of members of that dyad that mediate 
their communication behavior. Since the 
variables examined in these investigations are 
and at present lack co- 
herent organization, this section endeavors to 
provide such structure. 

Several investigators have explored the ef- 
fects interactant tendencies for internal as 
compared to external locus of control have on 
superior-subordinate interaction (Durand & 
Nord, 1976; Mitchell, Smyser, & Weed, 1975). 
Findings from these studies suggest (a) that 
internal subordinates see their supervisors as
more considerate than do externals; (b) that
internals are most satisfied with participative
superiors, whereas externals are most satisfied 
with directive superiors; and (c) that inter- 
nal superiors tend to use persuasion to obtain 
subordinate cooperation, whereas externals 
rely more on coercive power. In addition, research
related to the study of locus of control
indicates that subordinates and superiors
with passive personalities tend to exaggerate
the volume of their interaction with others, 
whereas active persons tend to underestimate 
their interaction (Webber, 1970). 

Studies that examined the characteristics 
of supervisors and their communication-re- 
lated behavior with subordinates suggest the 
following interesting conclusions: (a) High- 
least-preferred-co-worker (LPC) leaders un- 
der conditions of threat tend to engage in 
considerate behavior, whereas low-LPC lead- 
ers tend to increase initiating structure (Green, 
Nebeker, & Boni, 1976); (b) lower level su- 
pervisors tend to be more dogmatic than 
upper-middle and top-level managers (Close, 
1975); (c) young managers (20-29 years) 
tend to be more autocratic and low in human 
relations skills than are middle-aged (30-40 
years) or late middle-aged (40-55 years) 
managers (Pinder & Pinto, 1974); and (d) 
superiors tend to rate subordinates as compe- 
tent when they have values similar to those 
of the superior (Senger, 1971). 

On the other hand, studies exploring subordinates'
perceptions of superiors' communication
behavior and personality indicate (a)
that superiors who are apprehensive com- 
municators are not particularly liked by sub- 
ordinates (Daly, McCroskey, & Falcione, 
Note 3); (b) that subordinates' satisfaction
with superiors can be predicted from several 
dimensions of homophily-heterophily (Daly, 
McCroskey, & Falcione, Note 4); (c) that 
authoritarian subordinates are most satisfied 
when they work for directive superiors (Bass, 
Valenzi, Farrow, & Solomon, 1975; Tosi, 
1973); (d) that subordinate satisfaction with 
immediate supervision is related to subordi- 
nate perception of superior’s credibility (Fal- 
cione, 1974); (e) that confirmation of sub- 
ordinate’s needs for affection and dominance 
results in greater perceived frequency of 
interaction between superior and subordinate 
(Hawkins, 1976); (f) that subordinates in 
small work groups, who require high interac- 
tion with co-workers and superiors and high 
interdependence, have negative attitudes 
toward authoritarian supervisors, whereas sub- 
ordinates in large work groups, with re- 
stricted interaction and highly independent 
work, have more positive attitudes toward
authoritarian supervision (Vroom & Mann,
1960); and (g) that subordinates, regardless 
of their personality, tend to be most satisfied 
with superiors high in human relations orien- 
tation (Weed, Mitchell, & Moffitt, 1976). In 
addition, it should be noted that Hall (1974, 
1975) has conducted a series of investiga- 
tions that examine, in part, personality corre- 
lates of organizational members that affect 
superior-subordinate communication, the re- 
sults of which are too voluminous to report 
here. 

An area of superior-subordinate research 
that has received considerable attention of 
late is concerned with differences between 
male and female supervisory behaviors. The 
general results of these inquiries indicate that 
subordinates do not describe the behaviors of 
male and female leaders differently (e.g.,
Bartol & Wortman, 1975; Day & Stogdill, 
1972) but do agree on the existence of leader 
sex role stereotypes (e.g., Rosen & Jerdee,
1973; Schein, 1973, 1975). Specifically, there 
is strong evidence which suggests that sub- 
ordinates of both sexes are more satisfied with 
consideration behavior from a female superior 
than a male and are more satisfied with 
exhibition of initiating structure by male su- 
periors than with similar behaviors from fe- 
male superiors (Bartol & Butterfield, 1976; 
Petty & Lee, 1975; Petty & Miles, 1976). 
Moreover, Sussman, Pickett, Berzinski, and 
Pearce (in press) report that the sexual 
composition of superior-subordinate dyads 
does impose “norms and restrictions” on up- 
ward communication in the dyad. 

In summary, studies that examined the 
effects of personal characteristics on superior- 
subordinate communication have tended to 
focus on three basic areas: (a) single char- 
acteristics of superiors or subordinates that 
affect their communication behavior, (b) char- 
acteristics of superiors and subordinates, taken 
together, and their effect on superior-subordi- 
nate communication, and (c) differences in 
superior-subordinate communication as a 
function of the sex of the interactants. Al- 
though the findings of most of this research 
are interesting and important, on the whole 
they tend to be isolated and to lack theoreti- 
cal foundations. Future investigations should 
endeavor to remedy this situation by relating 
such studies to the larger scope of organiza- 
tional communication theory. 


Feedback 


Probably one of the most common com- 
plaints aired by superiors and subordinates 
about their communication relationship is that 
one of the interactants does not provide the 
other with sufficient and relevant feedback. 
Both upward and downward feedback appear 
to be essential for effective superior-subordi- 
nate relations, since such feedback provides 
information that denotes the success or failure 
of policies and objectives, that suggests the 
need for corrective actions and controlling 
mechanisms, and that provides the members 
of the dyad with knowledge of the other 
party's sentiments about formal and in- 
formal organizational activities. In his collec- 
tion of theory and research in organizational 
communication, Redding (1972) provides an 
extensive review of empirical inquiries that 
explore feedback in superior-subordinate com- 
munication through about 1970 (see espe- 
cially pp. 39-62). Now classic studies such as 
Leavitt and Mueller (1951), Smith and Kight 
(1959), Gibb (1961), Zajonc (1962), Bow- 
man (1963), Haney (1964), Meyer, Kay, 
and French (1965), Cook (1968), and Minter 
(1969) are summarized in Redding’s an- 
thology. Hence, the following review examines 
only empirical research conducted in the area 
of feedback in superior-subordinate com- 
munication after 1970. 

Studies that examined feedback in superior- 
subordinate communication since 1970 can be 
grouped into one of two categories: (a) in- 
vestigations that explored the effects of sub- 
ordinate's feedback to superiors on superior's 
behavior and (b) research analyzing the 
effects of superior’s feedback to subordinates 
on subordinate’s behavior. Investigations in 
the former category report a number of 
interesting findings. Brenner and Sigband 
(1973), who surveyed over 700 managers in 
a major aerospace firm, found that sub- 
ordinate’s feedback to superiors is greater 
when 


(a) subordinates were told what was to be done 
with completed assignments, (b) the superior 
formerly held the subordinate’s position, (c) the 
superior made the largest proportion of assignments 
to the subordinate, and (d) the subordinate felt 
that he could secure clarification of assignments 
from his immediate superior. (p. 325) 


Attempting to determine if a leader's verbal 
behavior could be altered by manipulating 
feedback to him/her, Butler and Jaffee 
(1974) report that positive feedback to a 
leader made him/her more task oriented, 
whereas negative feedback increased negative 
social-emotional behavior (as classified by 
Bales's Interaction Process Analysis category 
system). These researchers argue that their 

results indicate that in a production-oriented organization, 
positive feedback is to be preferred to negative 
feedback, and negative feedback might have very 
little to offer if no specific suggestions for changing 
one’s behavior are given. (p. 335) 


In a related study, Fodor (1974) explored 
the effects of a subordinate’s disparagement of 
a superior’s competence on the superior’s dis- 
tributions of rewards to subordinates. Results 
indicated that the superior tended to favor 
a compliant subordinate who was not an in- 
gratiator. Finally, several studies have also 
attempted to determine whether the subordi- 
nate’s feedback to superiors elicits changes in 
the superior’s behavior and, concomitantly, 
changes in the subordinate’s attitudes toward 
the superior. For example, Hegarty (1974), 
who used survey feedback methods, found that 
supervisory performance improved subse- 
quent to subordinate feedback, whereas Bur- 
naska (1976) relates findings which suggest 
that feedback and subsequent training can 
quickly change a supervisor’s behavior but 
that worker perceptions of the superior 
change only with time. 

A number of investigations have examined 
the effects of a superior’s feedback to sub- 
ordinates on the behavior of those subordi- 
nates. Harvey and Boettger (1971), in an 
experiment designed to improve communica- 
tion in a managerial work group, reported a 
norm among subordinates against asking 
superiors for clarification of memos that are 
unclear or contain double messages. In a 
preliminary investigation that explored 
sources of feedback, Greller and Herold 
(1975) suggest that intrinsic (i.e., psycho- 
logically close to the individual) sources of 
information are seen by workers as providing 
more feedback than sources that are seen as 
external (i.e., psychologically distant). More- 
over, in a related study, Kim and Hamner 
(1976) provide evidence that evaluative 
supervisory feedback to subordinate perform- 
ance (i.e., extrinsic feedback) and nonevalua- 
tive feedback (i.e., subordinate self-generated 
or intrinsic feedback), when combined with a 
goal-setting production program, increase sub- 
ordinate performance significantly beyond that 
of groups involved in just goal setting. Recent 
research also suggests (a) that superiors’ 
feedback to a subordinate, which shows a lack 
of trust in the subordinate, results in subordi- 
nate dissatisfaction and aggressive feelings 
(Brenenstuhl, 1976); (b) that superiors per- 
ceived as expressive (high in human rela- 
tions) are more likely to provide subordi- 
nates with social approval than those su- 
periors perceived as instrumental (a Weberian 
orientation) (Marcus & House, 1973); (c) 
that in conflict situations supervisory re- 
sponses that relate acceptance and encourage- 
ment of subordinate disagreement are associ- 
ated with high subordinate satisfaction 
(Burke, 1970; Renwick, 1975); and (d) 
that under low surveillance (infrequent need 
to report to superior) positive feedback from 
superior to subordinate leads to greater sub- 
ordinate compliance than when the subordi- 
nate receives no direct feedback, whereas 
under high surveillance conditions subordi- 
nates who receive positive feedback from 
their superiors comply less than when they 
receive no direct feedback from their su- 
periors (Organ, 1974). 

In addition, a variety of research has ex- 
plored the effects of superior reward or rein- 
forcement behavior on subordinate perform- 
ance and satisfaction. These studies suggest 
the following conclusions: (a) Leader positive 
reward behaviors (e.g., recognition of sub- 
ordinate performance) are generally associated 
with subordinate satisfaction, but the rela- 
tionship between leader punitive rewards 
(e.g., corrective actions) and subordinate 
satisfaction varies as a function of the nature 
of the task performed by each work group 
(Sims & Szilagyi, 1975); (b) superiors who 
frequently criticize their subordinates for 
poor work are generally rated as less effective 
than those who criticize less frequently (Old- 
ham, 1976); and (c) a superior tends to posi- 
tively reinforce a subordinate when he/she is 
positively reinforced by the subordinate's 
performance and to negatively reinforce a 
subordinate when he/she is negatively rein- 
forced as a result of the subordinate's per- 
formance (Barrow, 1976; Greene, 1975; Hin- 
ton & Barrow, 1975; Lowin & Craig, 1968). 

It should also be noted that research dis- 
cussed earlier by Stull (1975) and Jablin 
(1978a) indicates that both superiors and 
subordinates prefer message responses from 
one another that provide positive relational 
feedback to the source of the message. In ad- 
dition, Hill's (1973) findings suggest a signifi- 
cant tendency for subordinates to perceive 
their bosses as using one style of response to 
"handle interpersonal problems and another, 
different style to tackle technical problems" 
(p. 45). 

In summary, results of investigations since 
1970 that inquired into the nature of su- 
perior-subordinate feedback processes are gen- 
erally consistent with conclusions from earlier 
research. Feedback from superiors to sub- 
ordinates appears to be related to subordinate 
performance and satisfaction. However, at the 
same time, findings suggest that the subordi- 
nate's performance to a large extent controls 
the nature of his/her superior's feedback. 
Thus, present evidence indicates that future 
research should continue to explore the recip- 
rocal character of feedback in superior-sub- 
ordinate relationships, with particular empha- 
sis on specific influence mechanisms. 


Systemic Variables 


Researchers have long been concerned with 
the effects that systemic organizational varia- 
bles (e.g., technology, control structure, hier- 
archy, environment) have on the quality and 
nature of superior-subordinate communica- 
tion. However, far less empirical than 
theoretical research has been conducted in 
this area. 

The exact role of technology in determining 
communication in organizations has puzzled 
researchers for at least two decades. For ex- 
ample, in the late 1950s Simpson (1959) 
found what he believed was the critical varia- 
ble mediating the flow of vertical and hori- 
zontal communication in organizations: the 
degree of “mechanization” of work processes. 
Trist and Bamforth (1951), Gouldner 
(1954), Woodward (1958, 1965, 1970), 
Lawrence and Lorsch (1967), Perrow (1967), 
Pugh, Hickson, Hinings, and Turner (1969), 
and Peterson (1975), among others, all report 
findings which suggest that organizational 
members’ perceptions of organizational and 
communication climate are linked to techno- 
logical processes within organizations. Of even 
greater significance is Dubin’s (1965) discov- 
ery that what is considered effective super- 
vision may, in part, be a function of an 
organization’s or work group’s technology. 
Moreover, a recent doctoral study by Derry 
(1973) directly examined the effects tech- 
nology and hierarchical position have on 
supervisory style, “communication respon- 
siveness,” and social interaction strategy. Re- 
sults from two technologically diverse units 
within a manufacturing company (i.e., a man- 
ufacturing group and a research and develop- 
ment unit) indicated different patterns of 
superior-subordinate communication, depend- 
ing on the interaction between technology and 
hierarchy. 

Several investigations have also examined 
the relationship between organizational struc- 
ture and participant communication behavior 
(e.g., Bass, 1976; Blankenship & Miles, 
1968; Ghiselli & Siegel, 1972; Porter & Law- 
ler, 1964). Findings from these studies suggest 
the following conclusions that are relevant to 
superior-subordinate communication: (a) Up- 
per level managers tend to involve their sub- 
ordinates more in decision making than do 
lower level managers, whereas lower level 
managers tend to have decisions initiated for 
them by their superiors (Blankenship & 
Miles, 1968; Jago & Vroom, 1977); (b) “as 
compared with firms which have tall organiza- 
tional structures, those which have flat struc- 
tures reward with more rapid advancement 
those managers who favor sharing informa- 
tion and objectives with subordinates” (Ghi- 
selli & Siegel, 1972, p. 622); and (c) in situ- 
ations that are "regular, clear and struc- 
tured," authoritative managerial direction is 
frequent, but subordinates often perceive such 
direction “to be more effective under reverse 
conditions” (Bass, 1976, p. 215). 

In summary, empirical research that has 
examined the relationship between systemic 
organizational variables and communication 
between superiors and subordinates has 
tended to focus on one of two areas: (a) 
technology or (b) organizational structure. 
However, the major portion of this research 
has often been simplistic and/or of limited 
generalizability. Future research in this area 
is sorely needed, for if we are to ever under- 
stand the microsystem of superior-subordi- 
nate communication, we must first explicate 
its relationship with variables in the organiza- 
tion's macrosystem. 


Discussion 


This literature review has attempted to 
organize into nine topical categories empiri- 
cal research on superior-subordinate com- 
munication. A close examination of this re- 
search suggests a number of conclusions. 
First, inspection of the variables explored 
within each category indicates that several 
basic constructs appear in more than one 
grouping. Specifically, we find at least three 
items that are consistently being studied: (a) 
the effects of power and status on superior- 
subordinate communication, (b) trust as a 
moderator of superior-subordinate communi- 
cation, and (c) semantic-information distance 
as a source of misunderstanding in superior- 
subordinate communication. Moreover, these 
three variables tend to be explored concur- 
rently rather than in isolation from each 
other. Power and status differentials are an 
inevitable result of organizational develop- 
ment and in part serve as the impetus for the 
semantic-information distance between su- 
periors and subordinates. However, the rela- 
tionship of interpersonal trust to these varia- 
bles is not perfectly clear, since in some 
situations it facilitates openness and under- 
standing between superior and subordinates, 
whereas in other circumstances it appears to 
have no effect whatsoever. Moreover, there is 
some empirical and theoretical evidence (e.g., 
Sussman, 1975) which suggests that superior- 
subordinate semantic-information distance is a 
valuable and important feature of organiza- 
tion and that too large an attenuation of this 
gap may have dysfunctional consequences for 
organizational effectiveness. In short, the rela- 
tionship between superior-subordinate seman- 
tic-information distance and organizational 
effectiveness may be a curvilinear association 
and may be one that is differentially mod- 
erated in various situations by interpersonal 
trust and perceived power and status differen- 
tials between the interactants. Obviously, ad- 
ditional multivariate research is required be- 
fore we can fully understand the relationships 
between interpersonal trust, power and status, 
semantic-information distance, and superior- 
subordinate communication. 

The second major conclusion to be drawn 
from the preceding review of literature is that 
a contingency/situational approach to the 
study of superior-subordinate communication 
is necessary. Repeatedly, we find that situa- 
tional variables moderate the type and qual- 
ity of communication exchanged between su- 
periors and subordinates. Furthermore, if 
future research confirms the above proposi- 
tion, much of the existing literature relating 
to superior-subordinate communication will 
become suspect and will require replication 
and clarification within and between organi- 
zations. In addition, it is essential that re- 
searchers start to explore the effects systemic 
organizational variables have on superior-sub- 
ordinate communication, for as Graham and 
Roberts (1972) observe, 


we will only increase our understanding of organi- 
zational behavior as more researchers simultane- 
ously investigate individual, group and organiza- 
tional variables within organizations, and between 
organizations and the effects of environmental fac- 
tors on those components. (p. 130) 


Moreover, it is likely that such an approach 
will be more conducive to theory building in 
the area of superior-subordinate communica- 
tion and organizational communication in gen- 
eral. 

Finally, this review and interpretation of 
the literature identifies a need for some 
changes in the research questions we are ask- 
ing about superior-subordinate communica- 
tion and in the methods that are employed to 
answer those questions. At present, the ma- 
jority of investigations exploring superior- 
subordinate communication have focused on 
describing the various problematic states of 
superior-subordinate relations. For example, 
we can describe with a fair degree of confi- 
dence an open and closed superior-subordi- 
nate relationship, the communication qualities 
and characteristics of effective supervisors 
(for at least a limited number of situations), 
and the types of messages that tend to be 
distorted in upward communication. However, 
a much smaller amount of research has been 
directed towards discovering the antecedents 
to these conditions. For instance, we need to 
start asking questions such as, How do open 
and closed superior-subordinate communica- 
tion relationships develop? How do initial 
attributions and expectations of new superior- 
subordinate relations affect subsequent com- 
munication behavior? What stages of growth 
are characteristic of superior-subordinate 
communication relationships? How do su- 
perior-subordinate communication patterns 
change over time, and what causes these 
changes? And, of course, while these ques- 
tions are being explored, a contingency orien- 
tation within our research investigations 
should be maintained. As Redding (1972) 
has observed, “the contingency nature of most 
generalizations about organizational phenom- 
ena should be kept in mind when interpret- 
ing the findings of studies dealing with super- 
visory communication” (p. 439). 

The approach that Graen and his associates 
have employed to study dyadic linkages in 
superior-subordinate leadership exchanges 
provides an excellent model for the general 
study of superior-subordinate communica- 
tion. Specifically, these researchers have at- 
tempted to trace the development of superior- 
subordinate relations from their initiation 
through the emergence of stable interaction 
patterns. In other words, they have used a 
developmental and longitudinal research de- 
sign in exploring leadership behavior. Such an 
approach also appears to be ideal for the study 
of superior-subordinate communication, since 
it can provide descriptive, analytical, and 
often quasi-experimental data about superior- 
subordinate relationships. It should also be 
noted that a growing body of research is 
emerging that explores the relationships be- 
tween applicant job expectations and em- 
ployee attitudes, satisfaction, and so forth 
subsequent to employment in an organization 
(e.g., Ilgen & Seely, 1974; Katzell, 1968; 
Wanous, 1973). Information from these stud- 
ies that relates to potential characteristics of 
superior-subordinate communication can serve 
as the initial point to begin our inquiries into 
the development of superior-subordinate 
communication patterns. In summary, combin- 
ing our current knowledge of superior-sub- 
ordinate communication with future studies 
that examine the dyad from a developmental, 
longitudinal perspective appears to provide the 
most promise for increased understanding of 
the phenomena we call superior-subordinate 
communication. 


Three Psychosomatic Disorders 

Studies of hypnosis in the treatment of skin disorders, headaches, and asthma 
were reviewed in terms of outcomes and methodological soundness. Some studies 
focused on changing physiological functions, others on increasing insight in 
their patients, and still others on altering patients’ perceptions of their symp- 
toms. Methodological weaknesses included lack of control groups, nonrandom 
assignment of patients to treatment conditions, and confounding of treatment 
effects or lack of control for placebo effects. Additional weaknesses centered 
around the use of single outcome measures and the failure to assess the specific 
roles of mediating variables. Most of the studies reviewed showed positive 
treatment effects. However, there is equivocal evidence that hypnosis can di- 
rectly influence autonomic functioning. Hypnosis may be valuable in facilitating 
one’s capacity to gain insight into how one’s symptoms developed and are main- 
tained. In addition, hypnotic procedures have resulted in some success when 
used to indirectly alleviate symptoms by altering how individuals perceive their 
disorders and how these disorders affect their lives. 


Hypnosis has become an accepted mode of 
treatment in both medicine and clinical psy- 
chology. Its judicious use was endorsed by 
the American Medical Association in 1958 and 
by the British Medical Association in 1955. 
In 1960, the American Psychological Associ- 
ation sanctioned its use in clinical practice 
(Hilgard, 1968). Several societies are de- 
voted to the professional study of hypnosis, 
and both psychology and medical training 
programs offer courses in it. 

This article reviews the published reports 
of the clinical application of hypnosis to skin 
disorders, headaches, and asthma. The review 
covers the period from 1967 to 1977 compre- 
hensively, as well as reports prior to 1967 
that had a substantial impact on the field. 

Hypnosis is most frequently associated 
with suggestibility. Barber (1974) and his 
collaborators have, in essence, equated the 
two by arguing that subjects, when given 
direct suggestions without a hypnotic induc- 
tion, displayed behavior similar to that of 
subjects given a formal induction. Other in- 
vestigators have adhered to the conceptualiza- 
tion of hypnosis as an altered state of con- 
sciousness (Fromm, 1977; Hilgard, 1968). 
In either case, its investigation as a treatment 
adjunct for psychosomatic disorders would be 
justifiable. However, expectations for success 
might vary with the definition that the 
investigator accepts. 

Although there is disagreement about 
which specific ailments are psychosomatic 
and which are primarily organic, it is widely 
accepted that psychosomatic disorders are 
etiologically related to or are exacerbated by 
psychological factors, even though physical 
symptoms are present and medical interven- 
tion is often necessary. Some practitioners 
argue that illness is always a psychosomatic 
event. Others merely label a disorder psycho- 
somatic when physical causes are not readily 
identifiable. Obviously, an investigator’s view 
of a disorder as psychosomatic would influ- 
ence the decision to use hypnosis in a particu- 
lar case, which might, in turn, affect the 
outcome of treatment. 

Most investigators do not present theoreti- 
cal arguments for using hypnosis with psycho- 
somatic disorders. Therefore, much must be 
inferred. From the theories that are discussed 
in the literature, three rationales seem to 
emerge. First, there are those who contend 
that through hypnosis and hypnotic sugges- 
tions, certain autonomic nervous system func- 
tions, which are not readily controlled by the 
individual, come under voluntary control and 
mediate symptom changes. For example, Claw- 
son and Swade (1975) contend that blood 
flow to internal organs and limbs can be 
controlled. 

Next, there are investigators who use hyp- 
nosis for its assumed capacity to assist pa- 
tients in gaining insight into how their 
symptoms develop and are maintained. Once 
this insight is achieved, it is further assumed 
that individuals will be able to solve their 
problems, thus eliminating the symptoms. 

Finally, there are investigators who use 
hypnosis to alter the way in which one's dis- 
order is perceived. For example, suggestions 
are given that the symptoms will no longer 
concern the patient. This altered perception 
has two potential effects. First, the patient 
can become less debilitated by these symptoms 
and thereby can function more effectively, 
and second, the emotional component of the 
disorder, which may exacerbate the condi- 
tion, can be reduced. 

Research articles have occasionally ap- 
peared that investigate the use of hypnosis 
in treatment of other psychosomatic disorders, 
such as Raynaud's disease, arthritis, seizures, 
stomach disorders, Gilles de la Tourette’s 
syndrome, hypertension, and so forth. Hyp- 
nosis research has been reported most fre- 
quently in the treatment of skin disorders, 
headaches, and asthma. Therefore, the present 
review has been limited to these three dis- 
orders. 


Methodological Considerations 


As is true of much clinical research, studies 
of hypnosis in the treatment of psychoso- 
matic disorders reflect many methodological 
shortcomings. Understandably, in a clinical 
setting, it is frequently extremely difficult to 
establish proper controls. Considerably more 
than half of the studies reviewed were sin- 
gle- or multiple-case reports. A few used a 
treatment group procedure but did not in- 
clude a comparison group. Of the remaining 
controlled studies, some did not randomly 
assign subjects to treatment conditions. 

A glaring weakness across the majority of 
studies was the lack of specification of the 
hypnotic procedure used. Few studies dis- 
cussed the induction technique in detail, and 
even fewer reported the use of a reliable 
instrument for assessing whether the patient 
was hypnotized. An additional widespread 
shortcoming was the lack of control for ex- 
perimenter bias, which could have been ac- 
complished by keeping the experimenter blind 
to patient treatment group. 

Methodological weaknesses were also noted 
in terms of the outcome measures employed. 
Self-report and physicians’ impressions were 
most frequently used. Physiological change 
was assessed on occasion. A combination of 
both objective and subjective measures, how- 
ever, was infrequently reported. 

Finally, none of the studies that were re- 
viewed employed an acceptable control for 
placebo effects. 

A summary of the studies can be found in 
Table 1, which includes an indication of 
whether the treatment was successful (+) 
or unsuccessful (-), time interval of follow- 
up, type of design (case, treatment group, 
treatment and control condition), whether or 
not subjects were randomly assigned, and the 
number of patients/subjects studied. 


Skin Disorders 


Hypnosis is commonly used as one form of 
treatment for skin disorders because of the 
generally assumed association of skin re- 
sponses and the autonomic nervous system. 
Both the skin and the autonomic nervous 
system develop embryologically from the 
ectoderm, and correlations between self-report 
of the normal responses to embarrassment 
with blushing and to fear with blanching and 
goose pimples are well accepted (Jabush, 
1969). The types of disorders treated include 
psoriasis (a condition characterized by red- 
dish and silvery scales), dermatitis (an in- 
flammation of an area of the skin), boils (pyo- 
genic infections originating in hair follicles), 
and warts (thickenings of the Malpighian and 
granular layers of the epidermis). 

Table 1 
Experimental Designs and Treatment Outcomes 

Study | Results (P, I, S) | Follow-up | Design | Random assignment | n 

Skin Disorders 
Physiological change 
Twerski & Naar (1974) | [illegible] | 6 months | Case report | na | 1 
Ewin (1974) | [illegible] | 6 years | Case report | na | 1 
Jabush (1969) | [illegible] | 21 years | Case report | na | 1 
Tasini & Hackett (1977) | [illegible] | 4-8 months | Case report | na | 3 
Clawson & Swade (1975) | [illegible] + | 0-4 years | Case report | na | 3 
Vollmer (1946) | [illegible] | Not reported | Multiple case | na | 7 
Surman, Gottlieb, Hackett, & Silverberg (1973) | [illegible] | 3 months | Treatment group | No | 24 
Peters & Stern (1971) | - | na | Treatment and control conditions | Yes | 20 
Johnson & Barber (1976) | - - | na | Treatment and control conditions | Yes | 48 
Insight 
Ewin (1974) | [illegible] | 11-5 years | Multiple case | na | 3 
French (1973) | [illegible] | 5 weeks | Case report | na | 1 
Frankel & Misch (1973) | [illegible] | 13 months | Case report | na | 1 
Altered perception 
Ament & Milgrom (1967) | [illegible] | 2 months | Case report | na | 1 
Klinge (1971) | [illegible] | Not reported | Multiple case | na | 3 

Headache 
Physiological change 
Ansel (1977) | + + | Not reported | Case report | na | 1 
Todd & Kelly (1970) | [illegible] | 1 month | Case report | na | 1 
Graham (1975) | [illegible] | 9-12 months | Case report | na | 2 
Harding (1961) | + + | 6-30 months | Treatment group | No | 25 
Cedercreutz, Lahteenmaki, & Tulikoura (1976) | + + | 22 months | Treatment group | No | 155 
Andreychuk & Skriver (1975) | [illegible] | None | Treatment and control condition | Yes | 33 
Anderson, Basker, & Dalton (1975) | + [illegible] | 1 year | Treatment and control condition | Yes | 47 
Insight 
Blumenthal (1963) | [illegible] | 1 year | Case report | na | 2 
Altered perception 
Kroger (1963) | [illegible] | Not reported | Multiple case | na | Not reported 

Asthma 
Physiological change 
Dennis (1965) | + + | na | Treatment group | No | 5 
Weiss, Martin, & Riley (1970) | [illegible] | na | Treatment and control condition | Yes | 16 
Smith & Burns (1960) | - - | 4 weeks | Treatment group | No | 25 
Mun (1969) | + + + | Not reported | Treatment group | No | 10 
Aronoff, Aronoff, & Peck (1975) | [illegible] | Not reported | Treatment group | No | 17 
[illegible] & Day (1972) | [illegible] | na | Treatment group | No | 20 
Fry, Mason, & Pearson (1964) | [illegible] | na | Treatment and control condition | Yes | 47 
Altered perception 
Hanley (1974) | + | 6-30 months | Case report | na | 2 
Smith (1970) | [illegible] + + | na | Case report | na | 2 
Edwards (1960) | [illegible] | 1 year | Treatment group | na | 6 
Mun (1969) | + [illegible] | 6 months | Treatment group | No | 36 
Moorefield (1971) | [illegible] | 17 months | Treatment group | No | 9 
Collison (1975) | + [illegible] | Not reported | Treatment group | No | 121 
White (1961) | - [illegible] | 12-17 months | Treatment group | No | 10 
Maher-Laughan (1970) 
Study A | + | 6 months | Treatment and control condition | Yes | 55 
Study B | [illegible] | 1 year | [illegible] | Yes | 252 
Study C | [illegible] | 6 years | Treatment group | No | 173 

Note. P = physiological or objective change associated with symptom removal; I = physician's or investigator's impression of symptom removal; S = self-report of symptom remission; na = not applicable. 
In the Results column, + indicates that treatment was successful; - indicates that treatment was unsuccessful. 

Physiological changes. A number of sin- 
gle- and multiple-case studies have reported 
the successful application of hypnosis to 
various skin disorders, for example, Twerski 
and Naar (1974), refractory dermatitis; Ja- 
bush (1969), boils; and Tasini and Hackett 
(1977), Clawson and Swade (1975), Vollmer 
(1946), and Ewin (1974), warts. These in- 
vestigations did not take into account baseline 
rates of spontaneous remission, and in addi- 
tion, they often confounded the effects of 
hypnosis with other treatment effects. Al- 
though several investigators hypothesized that 
autonomic changes would occur through hyp- 
nosis, no physiological measurements were 
made. 

Surman, Gottlieb, Hackett, and Silverberg 
(1973) conducted a controlled investigation 
of the effects of hypnotic treatment of warts. 
Seventeen patients were told, during each 
weekly hypnotic session, that they would ex- 
perience a tingling sensation in the warts on 
one side of their body and that only those 
warts would subsequently disappear. Treat- 
ment continued for 5 consecutive weeks. Seven 
patients served as a waiting list control 
group. The authors concluded that warts re- 
spond to hypnosis but that hypnotic sug- 
gestion does not affect them selectively. There 
were, however, critical differences between the 
treatment and comparison groups. The aver- 
age duration since the appearance of the 
warts had been 2.8 and 3.9 years, respectively, 
for these groups. As reported by the authors,
warts spontaneously disappeared, on the aver- 
age, 2.28 years after onset. It may well be 
that the warts of the patients in the com- 
parison group were resistant to remission over 
time, whereas those in the treatment group 
were less resistant. The conclusion that hypnotic
suggestion can eradicate warts becomes even more
suspect given that, although none of the 7 control
subjects lost any warts during the 3-month treatment
period, one of them, who was scheduled for
postexperiment hypnotherapy, spontaneously lost all
but one wart before the first treatment session.

The few well-controlled studies that investigated
physiological changes were not performed on clinical
populations. In a laboratory study conducted by Peters
and Stern (1971), it was hypothesized that blood
pressure, skin temperature, and pulse rate would
increase in subjects given suggestions of hives and
would decrease in those given the suggestion that they
would show symptoms (peripheral vasoconstriction)
associated with Raynaud's disease. These effects had
been observed during the natural occurrence of these
disorders. Subjects were well screened for hypnotic
susceptibility before entry into the experiment. There
were four treatment groups, each of which received two
sessions. Group 1 received suggestions for hives while
hypnotized in Session 1 and suggestions for symptoms of
Raynaud's disease with method acting instructions in
Session 2. Group 2 received the same treatment as
Group 1, with sessions given in reverse order. Group 3
received suggestions for symptoms of Raynaud's disease
while hypnotized, then hives suggestions with method
acting instructions. Group 4 was treated identically to
Group 3, with sessions administered in reverse order.
Although Peters and Stern did not elaborate on what
they meant by method acting, it appears to be similar
to role playing. Results did not support the contention
that specific changes would occur depending on
suggestions given during hypnosis. It was observed,
however, that hypnotized subjects, regardless of the
suggestions given, did show a decrease in skin
temperature and blood volume in the fingers.

In a second well-controlled nonclinical experiment,
conducted by Johnson and Barber (1976), the effects of
hypnotic suggestion on blister formation were examined.
This study was especially noteworthy for its attempt to
control many of the factors that possibly influenced
results of earlier studies, including (a) direct
observation of subjects, to eliminate the possibility
that skin changes might be a function of self-injurious
behavior; (b) procedures to ensure that the
experimenter remained blind, to reduce the possibility
of experimenter bias; (c) an evaluation of hypnotic
susceptibility; and (d) an evaluation of skin
sensitivity, to assess its influence on skin change. An
additional strong point of this study was the
assessment of skin temperature changes, which were
assumed to mediate blister formation, in addition to
the direct observation of external changes of the skin.

None of the experimental subjects displayed any true
blisters. In addition, only 2 of the 48 subjects
displayed visible skin changes. In both of these
subjects, changes could be attributed to factors other
than hypnotic suggestion. No differences in temperature
were found between the test hand and control hand.
Comparison of susceptible versus nonsusceptible
subjects revealed no differential change in hand
temperature. The actual data were not presented,
however, making it impossible to determine whether this
nonsignificant finding was due to the lack of a
cause-effect relationship or to a lack of statistical
power resulting from the small number of subjects who
were considered susceptible. Although the study
indicated that the general population may not show
physical changes in response to suggestion, it does not
preclude the possibility that subjects who
spontaneously develop blisters (and other skin
disorders) might respond positively to suggestion.
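
The concern about statistical power can be made concrete with a rough sketch. The effect size and group sizes below are hypothetical, since the number of susceptible subjects is not given here; the function uses a normal approximation for a two-sided, two-sample comparison and is meant only to show how quickly power falls as the susceptible subgroup shrinks.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test.

    effect_size is a standardized mean difference (a hypothetical value
    here). The negligible contribution of the opposite tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_crit)


# Hypothetical subgroup sizes: the number of susceptible subjects among
# the 48 participants is an assumption made purely for illustration.
for n in (5, 10, 24):
    print(n, round(approx_power(0.8, n), 2))
```

Under these assumptions, even a large true effect would often go undetected in a subgroup of five subjects, which is why a nonsignificant comparison without the underlying data is hard to interpret.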

Insight. Several case studies illustrate how
hypnosis was used to help patients gain in- 
sight into the development and maintenance 
of their symptoms. 

Ewin (1974) reported on three cases of successful
treatment of condyloma acuminatum (warts on the
genitalia or perineal region). The first case met with
initial failure when suggestions for direct
physiological changes were made. After this failure,
the assumption was made that the warts were serving
some purpose not consciously perceived by the patient.
While hypnotized, the patient revealed that he had been
engaging in extramarital intercourse, which he thought
might be having a deleterious effect on his marriage.
The warts were restricting this extramarital activity,
thereby bringing the patient closer to his wife. During
hypnosis, it was suggested that this type of problem
solving was inefficient. Within 3 weeks of this single
session, all the warts disappeared. In a second similar
case, a male patient revealed, while hypnotized, that
his warts kept him from engaging in extramarital
relationships. He was told that his behavior could be
consciously controlled and that the warts were
unnecessary. Two-and-one-half months later, the warts
were two thirds gone, with complete remission occurring
1 month later. In a third case, suggestions without
hypnosis were given to a male homosexual with a rosette
of perineal warts. It was suggested to the patient that
he feared anal rape, and that he should forget about
protecting his anus with this ring of warts. One month
later, the author reported 50% remission, and after one
additional session with hypnosis, complete remission
was claimed. Across these cases, the length of time
between initial treatment and eventual cure ranged from
2 to 34 months. Spontaneous remission of symptoms might
have occurred during that time.

French (1973) described the successful 
treatment of a woman who suffered from 
venereal warts. While hypnotized, the pa- 
tient recalled that the warts first appeared 
during a gratifying sexual affair. During this 
session the patient was told that she had to 
choose between keeping both the affair and 
the warts or relinquishing the affair and 
thereby the warts. The woman chose to give
up the affair and the warts. After 3 weeks it 
was reported that she was greatly improved, 
with complete recovery noted after 5 weeks. 

Frankel and Misch (1973) reported on a 
case of long-standing psoriasis in which hyp- 
nosis was used to help a patient gain insight. 
After several sessions that used hypnotic 
suggestion to influence symptoms directly, 
the patient was requested to discuss his re- 
sistance to relinquishing his psoriasis. He 
stated that he realized that maintenance of 
the symptom allowed him the comfort of 
avoiding threatening social interactions. After 
8 weeks, the patient had improved consider- 
ably. The authors indicated that in addition 
to the self-report and observations of improvement,
there was also a 2.1 °F (1.17 °C) increase in skin
temperature from the beginning of the trance to its
completion. It was
impossible to determine, however, whether 
this change was a function of hypnosis, sug- 
gestion alone, emotional changes, or a by- 
product of a spontaneous remission of the 
condition. 

Altered perception. This final pair of investigations
employed hypnosis with skin disorders to alter the
subjective perception and/or reduce the discomfort
created by the symptoms.

Ament and Milgrom (1967) found hypnotic 
suggestion to be successful in the treatment of
a case of pruritus. After the first session,
decreased pruritus was reported with addi- 
tional improvement noted after each subse- 
quent session. After five sessions, however, a 
previously suspected diagnosis of myelocytic 
leukemia was made. For the following 2 
months, the patient's skin remained generally 
in good condition. It cannot be known with 
certainty, however, what psychological effect 
the development of leukemia may have had 
on the patient and consequently on the 
pruritus condition. 

In a second series of cases presented by 
Klinge (1971), three dermatitis patients were 
successfully treated with hypnotic sugges- 
tions and subsequently perceived their symp- 
toms as less severe. Unfortunately, this treat- 
ment effect was confounded by simultaneous 
treatment with vitamin A, hydrocortisone, 
and bandaging. 

It appears that well-controlled studies of 
skin disorders have not substantiated the case 
study findings of physical change associated 
with the hypnotic treatment. Evidence for 
autonomic changes, which are often assumed 
to mediate external skin changes, was typi- 
cally absent, or the autonomic changes were 
not measured. Although inconclusive because 
of the lack of control for spontaneous remission,
there were several interesting instances
of symptom improvement as a result of the 
use of hypnosis to help patients gain insight 
into the maintenance of their symptoms. Some 
success, although limited to a small number 
of cases, was also reported as a result of
attempts to alter perceptions and reduce the 
discomfort created by these symptoms. 


Headaches 


A second disorder that is commonly treated with
hypnosis and hypnotic suggestion is headache. Migraine
and tension headaches are most commonly treated.
Migraine headaches are thought to be initiated by
constriction of the blood vessels to the head and
brain, which is followed by overcompensation, causing
the vessels to become enlarged. This expansion is
credited with causing the unilateral pain, dizziness,
nausea, and blurred vision associated with migraine
(Wolff, 1963). Treatment consists of two interventions,
one aimed at inhibiting the initial constriction of the
vessels, the other aimed at reducing the dilation of
blood vessels after onset of the headache (Graham,
1975). Tension headaches, on the other hand, are a
result of prolonged muscular contractions (usually
associated with stress) about the face, scalp, and
neck. Behavioral treatment generally consists of deep
muscle relaxation training or biofeedback.

Physiological changes. Successful treatment of
individual cases of migraine headache has been reported
by Ansel (1977) and Graham (1975) and of tension
headaches by Todd and Kelly (1970). Treatment of 25
cases of intractable migraine was reported by Harding
(1961). All of the patients' symptoms had become
progressively worse despite previous drug treatments.
Hypnosis was recommended as a last resort. Treatment
consisted of four sessions that included history
taking, explanation of migraine, and hypnotic
suggestions that the patients picture the blood vessels
in their heads growing smaller and returning to normal.
Improvement was assessed in two ways: through
self-report and through the hypnotist's impressions.
Five had complete remission, 10 showed substantial
improvement, and 5 experienced some relief. Five
patients showed no improvement at all. Three of the 5
failures were reported not to be hypnotizable. This
study is noteworthy for the effort made to
systematically treat a number of patients with a
standard approach. However, no report of the successful
patients' hypnotizability was made, thus preventing an
assessment of the relationship between hypnotizability
and treatment success.

Cedercreutz, Lahteenmaki, and Tulikoura (1976) treated
155 consecutive skull-injured patients who had been
experiencing headache and vertigo for 1 week or longer
after their injury. Treatment consisted of between 1
and 10 weekly sessions. During hypnosis, therapeutic
suggestions were repeated several times. Trance
capability was assessed by the investigators on a
4-point rating scale. The authors reported that
although 58% of the 120 headache patients who were
rated as achieving a light trance or above were symptom
free 22 months later, none of the nonhypnotizable
patients were improved. It was further pointed out that
a strong relationship existed between the duration of
the symptoms and the likelihood of therapeutic benefit.
The likelihood of obtaining relief was markedly greater
if therapy began shortly after the original trauma. The
conclusion of Cedercreutz and his collaborators, that
hypnosis is an appropriate treatment procedure for
those patients with posttraumatic skull injuries who
complain of headache and vertigo, does not necessarily
follow, however. Earlier investigators, including
Cedercreutz and Kampman (1972), found that in 3- to
4-year follow-ups on skull-injured untreated patients,
30%-34% still suffered from headaches (conversely,
66%-70% spontaneously recovered). The present study,
although it reported that 50% of the headache patients
completely recovered and 20% partially recovered, did
not take into account the base rate of spontaneous
remission. Similarly, their finding that successful
treatment was negatively correlated with the length of
time between injury and the onset of treatment may be
misleading. The symptoms of those patients who were
treated long after their injury were evidently
persistent and not likely to disappear over time,
whereas patients who were seen immediately after an
accident would be more likely candidates for
spontaneous remission.
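
The base-rate argument can be checked with simple arithmetic. The sketch below uses only the percentages quoted above; the function name is illustrative, and because the follow-up intervals of the treated and untreated samples differ, the comparison is rough rather than exact.

```python
# Percentages quoted in the text above.
UNTREATED_STILL_SYMPTOMATIC = (30, 34)  # untreated, 3- to 4-year follow-up
TREATED_COMPLETE = 50                   # treated, completely recovered
TREATED_PARTIAL = 20                    # treated, partially recovered

# Spontaneous recovery is the complement of "still symptomatic".
BASE_LOW = 100 - max(UNTREATED_STILL_SYMPTOMATIC)   # 66
BASE_HIGH = 100 - min(UNTREATED_STILL_SYMPTOMATIC)  # 70


def excess_over_base_rate(treated_pct, base_low=BASE_LOW, base_high=BASE_HIGH):
    """Return the (least, most) favorable percentage-point difference
    between a treated recovery rate and the spontaneous-recovery range."""
    return treated_pct - base_high, treated_pct - base_low


complete_only = excess_over_base_rate(TREATED_COMPLETE)
complete_or_partial = excess_over_base_rate(TREATED_COMPLETE + TREATED_PARTIAL)
print("complete recovery only:", complete_only)
print("complete or partial recovery:", complete_or_partial)
```

On these figures, complete recovery among treated patients falls below the spontaneous range, and even counting partial recoveries the treated rate only reaches it.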

In a controlled study conducted by Andreychuk and
Skriver (1975), biofeedback was compared to hypnosis as
a treatment for migraine. In addition, the role of
suggestibility in treatment was examined. Thirty-three
established migraine sufferers were randomly placed in
one of three groups: hypnosis treatment, hand warming
training, or alpha enhancement training. All groups
showed significant improvement from the baseline to the
completion of 5 weeks of treatment. No treatment was
found to be superior. Highly susceptible patients
responded better than less susceptible ones, regardless
of the type of treatment received. The initial goals of
the biofeedback and hypnotic treatments differed,
however. Biofeedback had as its intermediate goal the
increase of blood flow and temperature in the hand,
whereas the hypnosis treatment was geared exclusively
to influence the headaches. It would have been valuable
to set as an intermediate goal for both groups an
increase in hand temperature and blood volume and to
assess these goals prior to evaluating the
effectiveness of each treatment in terms of symptom
removal.

Anderson, Basker, and Dalton (1975) com- 
pared the effects of hypnotherapy and auto- 
hypnosis with a drug treatment program of 
Stemetil. The authors reported a difference 
in both number and intensity of attacks 
between the 6-month baseline and the first 6 
months of hypnotic treatment. Stemetil 
treatment did not result in significant change. 
There was, however, no control for the added 
personal contact between physician and pa- 
tient in the hypnosis treatment group, which 
could have accounted for the greater im- 
provement obtained. 

Insight. Blumenthal (1963) reported the 
successful treatment of two headache pa- 
tients. The first patient had suffered with 
persistent pain over the left side of her neck 
and occiput. Through hypnosis, the patient 
recalled that her stepfather, who on several 
occasions had attempted to seduce her, had 
commonly placed his hand on her face and 
neck to comfort her and show her affection. 
In the second case, a Catholic father of five 
suffered from migraine headaches and com- 
plete sexual impotence. While hypnotized, he 
revealed that he was concerned about making 
his wife pregnant. Blumenthal hypothesized 
that the headaches served as a subconscious 
"displacement for his sexual feelings of erec- 
tion and engorgement into the full engorged 
throbbing vascular headache" (p. 201). An 
exact report of the therapeutic procedures 
was not presented. 

Altered perceptions. Kroger (1963) told 
headache patients, while they were hypno- 
tized, to imagine their hand in ice water. They 
were then told that their hand was numb and 
anesthetized and that they could eliminate 
the pain in their head by placing this hand 
next to their head and face. Posthypnotic 
suggestions that future headaches could be
reduced in the same fashion were given.
Kroger claimed that this procedure was successful
but presented no data to corroborate the claim.

The case studies on headache fail to meet 
many of the previously mentioned methodo- 
logical requirements. The treatment groups 
and controlled studies, in spite of the meth- 
odological problems mentioned, seemed to 
indicate that hypnosis may be a useful treat- 
ment adjunct for headaches. Again, as was 
found in the treatment of skin disorders, 
there were several interesting reports of using 
insight and altered perceptions to alleviate 
symptoms, but these too were methodologi- 
cally unsophisticated. 


Asthma 


The third major psychosomatic disorder that was treated
with hypnosis is asthma. Although there is no consensus
as to the precise etiological factors involved in
asthma, most authors agree that there is some
interaction among psychological, infective, and
allergic factors. Emotional factors are thought to be
aggravating or triggering influences that induce the
asthma attack in some patients (Peshkin, 1967). It is
believed that anxiety that is associated with an
anticipation of the symptoms themselves often results
in an exacerbation of the symptoms (Hanley, 1974).
Symptoms remain consistent even though causes may vary.
During an attack, the air passages, including the
trachea, major bronchi, and peripheral bronchioles,
constrict. Bronchiospasms and increased secretion of
mucus are also present. These physiological changes are
responsible for the wheezing and difficulties in
breathing seen in most asthmatic attacks.

Physiological changes. In an investigation by Dennis
(1965), five good hypnotic subjects who had chronic
asthmatic symptoms were exposed repeatedly to a true
allergen and a placebo. Each exposure was given with
suggestions either for an allergic reaction or for no
allergic reaction. This was done while the subject was
either hypnotized or in the waking state. Patients were
given a suggestion to experience a feeling of coolness.
This presumably would induce vasoconstriction and
thereby reduce asthmatic symptoms. Reactions to placebo
could not be evoked in either the hypnotic or
nonhypnotic conditions. However, allergic reaction to
true allergen was successfully inhibited in the
hypnotic condition. No attempt at monitoring the
physiological mechanisms (i.e., vasoconstriction),
which were assumed to mediate change in symptoms, was
reported.

Weiss, Martin, and Riley (1970) found that suggestions
of an asthma attack alone failed to produce allergic
reactions in all but 1 of 16 asthmatic children, when
they were presented with a saline solution. The 1
subject who reacted was thought to be allergic to the
saline solution that was used in the bronchial
challenge test.

Smith and Burns (1960) treated 25 chronic asthmatic
children, ages 8-15. Hypnosis was achieved in all
patients, and suggestions were given for symptomatic
relief from asthma. Only objective measures of
respiration were obtained, such as vital capacity and
forced expiratory capacity. After four weekly sessions,
no improvement was noted. These negative findings may
be attributed to the unusually brief treatment and the
sole use of physiological assessment.

Mun (1969) provided 10 asthmatic children with a series
of treatments including one based on hypnotic
suggestion. Each child was trained in hypnosis and was
easily capable of entering into a trance. Each child,
on the onset of an attack, was requested to report to
the investigator for treatment. After the child
reported, peak flow rate (a laboratory technique for
assessing lung functioning) was measured, followed by
one of four treatments. After 15 minutes of treatment,
peak flow as well as self-report measures were
obtained. Treatments consisted of Tedral medication
after the first attack, Isoprenaline nebulizer during
the second, hypnosis and hypnotic suggestions for
symptom removal during the third attack, and Tedral
medication and hypnotic suggestions during the fourth
attack. Improvement was reported across all treatments,
with hypnotic suggestion treatment showing the greatest
effects, followed by the combined treatment of Tedral
and hypnotic suggestion. No statistical analysis for
any of the treatment comparisons was reported, and
treatment effects were obviously confounded.

Aronoff, Aronoff, and Peck (1975) reported on 17
asthmatic children who chose hypnotic treatment over
traditional medical treatment. Hypnotic suggestions for
symptom reduction were given during an actual asthmatic
attack. Improvement was assessed in terms of
physiological change and self-report. However, positive
expectations for treatment and an expected decline in
symptoms following the peak of an attack confounded the
effects of hypnotic suggestion.

Philipp, Wilde, and Day (1972) looked at the effects of
nonhypnotic suggestion on two types of asthma patients,
those who reacted to skin testing (allergic) and those
who did not (emotional). Both groups were subjected to
two treatments: suggestions of a reaction when exposed
to a neutral drug and suggestions of no reaction when
an active drug was presented. Outcome measures included
assessment of vital capacity and forced expiratory
volume. It was found that emotionals reacted more to
suggestions for symptoms on exposure to a neutral drug
than did allergics. Another reported finding was that
relaxation training across both groups tended to
improve respiratory efficiency.

In a well-controlled study conducted by Fry, Mason, and
Pearson (1964), the effects of hypnotic suggestion on
allergic reactions to known allergens were assessed.
Forty-seven asthmatic patients, who had displayed
positive skin reactions to extracts of pollen or house
dust and who were hypnotically susceptible, served as
subjects. In the first part of the investigation, 18
subjects were randomly placed in either a control
group, in which two skin tests were conducted, or in a
hypnosis group, in which hypnotic suggestions for no
skin reactions were given. Each subject was
administered four solutions of varying strengths of an
allergen. Results indicated that controls showed mixed
reactions in terms of size of wheals after an exposure
to allergens, whereas hypnosis subjects consistently
showed a decrease in wheal size. In the second part of
the investigation, 29 subjects were randomly divided
among three groups: One received hypnotic suggestions
that the right arm would not react to skin tests,
another received suggestions that neither arm would
react to testing, and the third received hypnosis with
no suggestions. All groups showed marked decreases in
wheal size. No differences were noted between the
treatment groups, with hypnosis alone yielding as great
a change as when it had been combined with suggestions.

Insight. Surprisingly, none of the asthma 
studies that were reviewed used hypnosis as 
a means of assisting patients in gaining in- 
sight into the maintenance of their symptoms. 

Altered perceptions. The goal of the fol- 
lowing investigations was to change the pa- 
tients’ perception of their symptoms so as to 
reduce the stress related to them and thereby 
reduce the intensity of the symptoms them- 
selves. 

Hanley (1974) described two cases in which 
hypnotherapy was used as a treatment for 
asthma. Therapy consisted of hypnotic sug- 
gestions to increase the individual's self- 
confidence, especially in controlling the asth- 
matic symptoms. Both patients reported some 
improvement in symptoms and in day-to-day 
living. It is not known, however, what effect 
the physician’s supportive statements may 
have had without hypnosis. 

Smith (1970) found, in two female sub- 
jects, an increase in pulmonary resistance fol- 
lowing direct hypnotic suggestions of cough- 
ing, fear, anger, or of an imminent asthmatic 
attack and a decrease in resistance following 
suggestion of relaxation. Recall of previous 
asthma attacks could have been sufficient to 
create the change in pulmonary resistance. 

Edwards (1960) used hypnotic suggestions 
for symptom reduction and a decrease in the 
psychological effects caused by an attack in 
six chronic asthma patients. Three assess- 
ment measures were employed: the patient’s 
own testimony, stethoscope monitoring (phy- 
sician’s impression), and ventilatory function 
tests (vital capacity and forced expiratory 
volume). During and after treatment, pa- 
tient reports revealed that their conditions 
improved. Physiological changes were not 
found, however. Although the author dis- 
cussed the difficulty in making claims about 
treatment from a small uncontrolled study 
such as the one that he presented, this was 
not the most serious methodological flaw. 
Each of the cases was treated at the peak of
an acute asthmatic attack. A regression
toward the mean would be expected on the
assessment following the crisis, thus casting
doubt on any claims of self-rated improvement
due to treatment.
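
A small simulation can illustrate why assessment at the peak of an attack invites regression toward the mean. Everything in the sketch is hypothetical (the severity scale, the threshold, and the assumption that day-to-day fluctuation is as large as the variation between patients); no treatment is simulated at all, yet the follow-up scores of patients selected at a peak still look better.

```python
import random
from statistics import mean


def simulate_untreated_followup(n_patients=10_000, threshold=1.5, seed=0):
    """Select patients whose observed severity exceeds a threshold and
    re-measure them later with no intervention in between."""
    rng = random.Random(seed)
    at_selection, at_followup = [], []
    for _ in range(n_patients):
        usual = rng.gauss(0.0, 1.0)             # patient's typical severity
        observed = usual + rng.gauss(0.0, 1.0)  # severity on the day seen
        if observed > threshold:                # seen only at a peak
            at_selection.append(observed)
            # Later assessment: same typical severity, new fluctuation.
            at_followup.append(usual + rng.gauss(0.0, 1.0))
    return mean(at_selection), mean(at_followup)


peak, later = simulate_untreated_followup()
print(f"mean severity when selected: {peak:.2f}")
print(f"mean severity at follow-up:  {later:.2f}")
```

The apparent improvement here is produced entirely by the selection rule, which is the reason an uncontrolled study that begins treatment during a crisis cannot attribute later improvement to the treatment.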

In a second investigation by Mun (1969), 
children were hypnotically age regressed to 
the time of their first attack and were told 
that the cause of the first attack was no 
longer operative and that they should no 
longer have fear and tension about the constantly
recurring need to breathe. Mun reported improvement
in self-ratings and objective measures, but, again,
he presented no data to support these claims.

Moorefield (1971) described the successful 
treatment of nine patients with chronic 
asthma. Hypnotic procedures included sug- 
gestions for reduced tension and anxiety, for 
the ability to breathe easier, and for increased 
confidence during stress. Patients were seen 
for an average of 13.4 1-hour sessions. Complete
recovery in all but one patient was reported. The
effects of the hypnotic suggestions were confounded,
however, since the patients were simultaneously
treated with systematic desensitization.

The data on 121 asthmatic patients who 
were treated with hypnotherapy were ana- 
lyzed retrospectively by Collison (1975). 
Treatment consisted of hypnotic suggestions 
that were intended to improve the patients’ 
ability to cope with their environment and to 
assist them in obtaining both physical and 
mental relaxation. Posthypnotic suggestions 
for continued relaxation as well as training in 
autohypnosis were provided at each session. 
Patients were also classified according to the 
depth of trance typically achieved. Results 
indicated that 21% of the patients remained 
completely free of asthma attacks through 
the follow-up. Thirty-three percent showed 
improvement but continued to have mild at- 
tacks. Twenty-two percent improved, but 
symptoms were still moderately severe. Finally,
24% showed no changes from pre- to
posttreatment. Patients who were rated as 
good hypnotic subjects received the most 
therapeutic benefit, whereas none of the pa- 
tients who were poor hypnotic subjects be- 
came symptom free. Collison acknowledged 
that a retrospective analysis only allows for
limited inferences. In addition, all patients
had volunteered for hypnotherapy, which suggests
that they may have had preexisting positive
expectations for hypnosis. The care that
Collison took in compiling his data was, 
nevertheless, commendable. 

White (1961) hypnotically treated 10 
asthmatic patients and obtained mixed re- 
sults. Sessions consisted of hypnotic sug- 
gestions for easier breathing, lessening of ten- 
sion and bronchiospasms, and an increase in 
self-confidence. Drug consumption and respiratory
function as well as self-report were all assessed.
Patients were rated in terms of depth of hypnosis
achieved. Results varied depending on the particular
measure used. Physiological measures indicated that
improvement followed 24% of the treatment sessions,
whereas self-report of improvement followed 62% of the
trials. Subject improvement was independent of the
depth of hypnosis achieved.

Maher-Laughan (1970) reported three investigations that
included autohypnosis training and hypnotic suggestions
of a release in tension, an increase in
self-confidence, and a belief in the possibility of
recovery. In the first study, 55 patients were randomly
placed in either a relaxation and breathing exercise
control group or in a hypnosis and autohypnosis
training treatment group. Change was assessed in terms
of a computed wheezing score and in the number of times
bronchodilators were used. Greater improvement was
noted in the hypnosis treatment group. The second study
included a larger number of patients who were treated
by a greater number of centers and added a
physiological assessment of respiratory function.
Whereas males reported similar improvement in both the
hypnosis group and the control group, females reported
less improvement in the control group. The hypnosis
group showed greater improvement on the physiological
assessments of forced expiratory volume. Self-report
and physiological indices were independent of the type
of asthma treated (psychological, infectious, or
allergic). The third study was similar to the first two
except that patients were preselected for treatment,
and a control group was not employed. In this
investigation, 82% of the patients improved. Duration
of treatment was 6 months, 1 year, and 6 years,
respectively, in the three studies, with few or no
treatment effects recognized earlier than 1 month into
treatment.

The greatest number of controlled studies 
on the largest number of subjects has been 
carried out with asthmatic patients. The 
methodological sophistication of these studies 
far exceeds the studies of skin disorders or 
headaches. Effectiveness of hypnotic treatment 
appears to be a function of which outcome 
measures are employed. When self-report is 
used, most investigators report an improvement
in the patient's symptoms. However,
physiological measures show more equivocal 
results. 


Discussion and Conclusions 


Of the 38 studies reviewed, 31 (81.6%) reported
overall positive results, 5 (13.1%) reported
negative outcomes, and 2 (5.3%) reported
mixed results. Eighteen (47.4%) were
single- or multiple-case studies, 13 (34.2%)
used a single treatment group, and 7 (18.4%)
instituted necessary control conditions. All
of the single- and multiple-case studies re- 
sulted in a positive outcome. Ten single 
treatment group studies yielded a positive 
outcome, 1 yielded a negative result, and 2 
yielded a mixed outcome. Three of the 7 
controlled studies resulted in a negative out- 
come. Apparently when methodological short- 
comings were remediated, the effects of hyp- 
nosis were reduced. 
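
The tallies in the preceding paragraph can be rechecked with a few lines of arithmetic. The following is a minimal sketch in Python (a language chosen here only for illustration); the counts come from the text, while the variable and function names are illustrative assumptions. Rounding to one decimal place gives 13.2% for negative outcomes where the text reports 13.1%, a small difference that may reflect truncation.

```python
# Counts taken from the paragraph above; names are illustrative only.
TOTAL_STUDIES = 38
outcomes = {"positive": 31, "negative": 5, "mixed": 2}
designs = {"case": 18, "single_group": 13, "controlled": 7}
# Negative outcomes per design, as stated in the text.
negative_by_design = {"case": 0, "single_group": 1, "controlled": 3}


def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


outcome_pct = {k: share(n, TOTAL_STUDIES) for k, n in outcomes.items()}
design_pct = {k: share(n, TOTAL_STUDIES) for k, n in designs.items()}
negative_rate = {k: share(negative_by_design[k], designs[k]) for k in designs}

if __name__ == "__main__":
    print(outcome_pct)
    print(design_pct)
    print(negative_rate)
```

Under these counts the share of negative outcomes rises from 0% for case studies to 7.7% for single treatment group studies and 42.9% for controlled studies, which is the pattern the paragraph describes.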

There were 1,189 subjects/patients in the
37 studies that reported sample size. (Kroger,
1963, did not report the number of patients
treated.) Single- and multiple-case
studies involved only 35 patients (2.9% of
the total) even though case studies constituted
almost half of the studies reviewed.
Studies obtaining positive results involved
1,037 (87.2%) subjects/patients. Only 152
(12.8%) subjects/patients were involved in
studies that yielded negative or mixed results.

A comparison of the three disorders shows
that 67.3% of the patients were asthmatic,
23.4% were headache patients, and 9.2%
had skin disorders. Inspection of Table 1
reveals that hypnosis was most successful for
asthmatics; all but two studies in Table 1
used at least a treatment group. Results were 
positive in 11, negative in 2, and mixed in 2 
studies. For headaches, 5 out of 9 were case 
studies, and 1 of the only 2 controlled studies 
yielded negative results. Only 3 out of the 
14 studies of skin disorders used at least a 
treatment group. The 2 controlled studies 
yielded negative results. 

Changes in autonomic nervous system functioning
(i.e., blood flow, skin temperature,
forced air capacity) are thought to serve as 
mediators of symptomatic changes. Twenty- 
three (60.5%) studies used hypnosis in an 
attempt to directly affect physiological change 
(i.e., the blood flow to the skin will decrease). 
In only 8 of these 23 studies was there an 
attempt to assess these changes. Four of the 
8 reported negative results. 

Four (10.5%) of the studies used hyp- 
nosis to help the patient gain insight. All re- 
ported success, but this was based on self- 
reports or physicians’ impressions, and no 
objective measure of success was reported. 

Of 10 (26.3%) studies in which hypnosis
was used to alter patients’ perceptions of 
their disorders, 8 reported successful out- 
comes, and 2 reported mixed results. 

Physiological or objective indices were used 
in only two studies of skin disorders, with 
negative results. Understandably physiologi- 
cal or objective measures were not used in 
the headache studies because headache pain 
is almost exclusively a subjective experience. 
Vasoconstriction and muscular tension have 
been found to correlate with headache pain 
and could have been used. In the asthma
studies, 11 used a physiological or objective 
measure. Positive outcomes were reported in 
7, and negative results were reported in 4 
studies. A much higher rate of success was 
found for the 32 studies using self-report as 
a measure. All outcomes but 1 were positive. 

It appears that attempts to alleviate symptoms
by suggesting direct and specific physiological
changes through hypnosis have met
with some success, as measured by self-report,
but results that used objective or
autonomic measures have been equivocal.

None of the investigations of headache at- 
tempted to assess muscular or vascular 
changes. The studies of skin disorders assessed
the physical changes in the skin.
However, mediating physiological changes 
such as blood flow and skin temperature were 
generally not monitored. The change in 
symptoms (disappearance of warts, changes in 
pruritus, etc.) cannot be taken as evidence 
for direct physiological change due to hyp- 
nosis. Controls were not sufficiently applied to 
show a cause and effect relationship between 
the applications of the hypnotic procedures 
and the alleviation of symptoms. Evidence 
from the asthma studies supports the conten- 
tion that hypnosis does not produce physio- 
logical changes as consistently as it produces 
self-reported and observed changes. This is 
in agreement with arguments presented by 
Barber (1974) who contends that "the data 
available at present do not support the notion 
that hypnotic trance is a critical factor in 
producing such physiological effects" (p. 70). 

There seems to be some anecdotal evidence 
that alleviation of symptoms may be aided 
by using hypnosis to help patients to gain 
insight into the development and maintenance 
of their symptoms. A difficulty with these 
investigations is a lack of an established cause 
and effect relationship between insight and 
symptom remission. The possibility of spon- 
taneous remission or placebo effects was not 
controlled for in these case studies.

Evidence exists for the claim that when 
hypnosis is used to alter the perceptions of 
psychosomatic symptoms, headache and
asthma patients report symptom improve- 
ment. This appears to be especially useful 
either in disorders in which the perception of 
the symptoms themselves is the primary concern
(i.e., tension headaches) or when the
disorder is being treated by an additional 
treatment regime and symptomatic relief is 
desirable. 

Future researchers, in addition to stating
the ultimate goal of the hypnotic intervention,
also should clearly specify the mediating
physiological mechanisms for the expected
change. The precise role of these mechanisms
could then be systematically assessed.
Controls should be instituted in this research
for suggestions alone, hypnosis alone,
and physician and patient interaction. In this
way, the role of these variables could be separately
assessed. In addition, the possibilities
of spontaneous remission of symptoms and 
placebo effects, which seem to occur with 
great frequency in psychosomatic disorders, 
should be controlled. 

Researchers should assess hypnotic suscepti- 
bility using a well-established scale. Although 
some researchers have argued for the place- 
ment of subjects into groups based on their 
susceptibility, this procedure would limit the 
extent to which a cause-effect relationship 
could be inferred between treatment and out- 
come. A more useful procedure would be to 
randomly place subjects, regardless of their 
susceptibility scores, into treatment groups 
while simultaneously monitoring the effects 
of susceptibility on outcome. 
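
The recommended procedure, random placement into groups with susceptibility recorded rather than used for grouping, can be sketched as follows. This is only an illustration under stated assumptions: the subject identifiers, the group labels, and the susceptibility scores are hypothetical, and the round-robin deal after shuffling is one simple way to obtain balanced groups, not a procedure described in the text.

```python
import random
from statistics import mean


def assign_randomly(subject_ids, groups=("hypnosis", "control"), seed=None):
    """Shuffle subjects and deal them into groups in turn (balanced sizes)."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


def mean_score_by_group(scores, assignment):
    """Average susceptibility score within each assigned group."""
    by_group = {}
    for sid, group in assignment.items():
        by_group.setdefault(group, []).append(scores[sid])
    return {group: mean(vals) for group, vals in by_group.items()}


# Hypothetical susceptibility scores (illustrative values, not study data).
scores = {"s1": 2, "s2": 9, "s3": 5, "s4": 7, "s5": 4, "s6": 10}
assignment = assign_randomly(scores.keys(), seed=1)
print(assignment)
print(mean_score_by_group(scores, assignment))
```

Because the scores play no part in the assignment, susceptibility remains available as a monitored variable whose relation to outcome can be examined afterward.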

In spite of the many methodological prob- 
lems, there was some indication of the useful- 
ness of hypnosis as a treatment adjunct for 
psychosomatic disorders. This is particularly 
true for asthma, the disorder on which the 
best research has been done and for which 
the most promising results have been obtained.


Infantile autism is a severe form of psychopathology characterized by profound
behavioral deficits. This article reviews a series of investigations which suggest 
that autistic children show "stimulus overselectivity," a response to only a
limited number of cues in their environment, and discusses how such overselec- 
tivity may relate to several of the behavioral deficits in autism. These include 
failure to develop normal language or social behavior, failure to generalize 
newly acquired behavior to new stimulus situations, failure to learn from tradi- 
tional teaching techniques that use prompts, and a general difficulty in learning 
new behaviors. This discussion is followed by the presentation of several studies 
that suggest possible remedial procedures. Finally, the concept of stimulus overselectivity
is related to the literature on other theories of attentional or response
deficits in adult schizophrenia, mental retardation, learning disabilities,
and autism.


Infantile autism, first described by Kanner 
(1943), is a severe form of psychopathology 
in children that is characterized by extreme 
social and emotional detachment. Such children
typically do not seek or readily accept
affection and do not play with peers. They 
engage in great amounts of stereotyped, ritu- 
alistic, and repetitive motor behaviors and 
are generally unresponsive to their physical 
environment. They are inconsistent in their 
response to sensory input, they typically do
not show a startle reflex, and their parents
have suspected them to be blind or deaf.
Language development is either absent or
abnormal, and those autistic children who do
speak usually parrot meaninglessly what they 
hear (echolalia). Another characteristic feature 
of autism is the child's insistence on order 
and sameness in his or her environment. 
When one considers the behavioral impover- 
ishment of these children, it is understandable 
that autism is also characterized by a poor 
prognosis. 

In recent years, numerous studies have appeared
that somewhat alter the prediction of
a poor prognosis. For example, autistic children
who are treated within a learning theory
framework (with “behavior modification”)
have shown measurable improvement in
speech and language (e.g., Hewett, 1965;
Lovaas, 1966, 1977; Risley & Wolf, 1967),
generalized imitation (Lovaas, Freitas, Nelson,
& Whelan, 1967; Metz, 1965), and appropriate
play (Koegel, Firestone, Kramme, &
Dunlap, 1974), as well as reduction of inappropriate
behaviors (e.g., Carr, Newsom,
& Binkoff, 1976). A comprehensive summary
of these and other studies has been provided
by Lovaas and Newsom (1976). However, in
spite of considerable successes, there are still
a number of problems associated with this
form of intervention. Two important problems
are that the treatment effects may be situation
specific and reversible, and the child's
progress in treatment is slow (Lovaas, Koegel,
Simmons, & Long, 1973).

Problems in situation specificity of the
treatment change and in reversibility of treatment
effects can be attenuated somewhat by
extending the treatment across time and
situations, by, for example, using parents and
teachers as therapists (e.g., Schreibman &
Koegel, 1975). However, there has been no
immediate solution to the slow rate of improvement
that these children show in treatment.
It may be most helpful, therefore, to
discuss certain mechanisms that may be basic
to this problem of attenuated change, with
a view toward the development of faster and
more general changes. Towards this end, this
article will review a set of studies that provide
substantial evidence which suggests that autistic
children have a problem in responding
to multiple cues, and that this problem may
be at least part of the cause of their slow
improvement in treatment.

Situations in which several cues impinge 
simultaneously on children are typical of those 
they encounter in their everyday teaching 
environments. Operationally, our data show 
that when autistic children are presented with 
multiple stimulus inputs, their behavior comes 
under the control of a range of input that is 
too restricted. This problem was referred to
as "stimulus overselectivity" (Lovaas, Schreibman,
Koegel, & Rehm, 1971) because the
children overselected a limited set of stimuli
from those available in their environment.
Note that the term stimulus overselectivity does
not imply that the children scan their environment
and select for relevant cues. Rather,
the data suggest that the children respond
to only part of a relevant cue, or even to a
minor, often irrelevant feature of the environment,
without learning about the other relevant
portions of that environment.

Our approach to this analysis in autistic 
children owes much to the conceptual and 
methodological advances of basic operant 
research, as first reviewed by Terrace (1966) 
and later extended by Ray (1972), Ray and 
Sidman (1970), Sidman and Stoddard (1966),
and Touchette (1968, 1971). Basically, the
studies we report adhere to a discrimination
learning paradigm, permitting us to simply 
relate aspects of the manipulated stimulus 
input directly to the child's behavior. Ex- 
cellent reviews of this type of research have 
also been provided by Fellows (1968), Suther- 
land and MacKintosh (1971), and Trabasso 
and Bower (1968). 

This article first reviews a set of studies 
demonstrating stimulus overselectivity, and 
then presents data that show how such 
overselectivity may interfere with the autistic 
child's learning and pose problems for the 
generalization of learned material. This will 
be followed by suggestions on how this prob- 
lem might be remedied. Finally, the concept of 
stimulus overselectivity will be related to 
other theories of attentional deficits in autism 
and related disorders. 


Studies Demonstrating Stimulus 
Overselectivity 


Overselective Response to Multiple Cues 


In the Lovaas et al. (1971) study, autistic, 
retarded, and normal children were taught to 
respond to a complex stimulus display (S^D)
containing three elements: (a) a moderately 
bright visual stimulus (consisting of a 160-W 
red floodlight), (b) an auditory stimulus con- 
sisting of white noise at a moderately high 
(65-dB level) intensity, and (c) a tactile 
stimulus on the child’s leg delivered by a 
pressure cuff at 20 mm of mercury. These 
stimuli appeared noticeable to the children, 
since they often oriented to them (e.g., turned 
around to look at the light, touched their legs 
when the cuff was inflated). This complex S^D
was presented to the children, and they were 
reinforced for responding (bar pressing) in the 
presence of the display and not reinforced for 
responding in its absence. After training had 
established this stimulus display as functional 
for the children’s response, single-cue test 
trials were presented in which each component 
(auditory, visual, tactile) was presented sepa- 
rately for a total of 70 presentations over 10 
test sessions. 

The results showed that the normal children 
responded to each of the components equally. 
In other words, each of the separate cues 
became equally functional in controlling the
child's behavior. The performance of the
autistic children was different. Each child 
responded primarily to only one of the com- 
ponent cues. (The retarded children responded 
at a level between these two extremes.) Three 
of the autistic children responded primarily 
to the auditory component, whereas two of 
the children responded primarily to the visual 
cue. None of the autistic children responded 
to the tactile stimulus. It was striking to 
observe the autistic children attentively re- 
spond to one of the component cues (e.g., the 
sound), only to remain motionless in the 
presence of the other (e.g., the light) even 
though that stimulus had been presented as 
discriminative for reinforcement. 
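
One way to picture how single-cue test trials of this kind might be scored is sketched below. It is a rough illustration only: the counts are hypothetical (not data from the study), the function names are invented for the example, and the 0.5 response-rate criterion for calling a cue "functional" is an assumption rather than a rule stated in the text.

```python
def response_rates(responses, presentations):
    """Proportion of single-cue test presentations that drew a response."""
    return {cue: responses[cue] / presentations[cue] for cue in presentations}


def functional_cues(rates, criterion=0.5):
    """Cues whose response rate meets the (assumed) criterion."""
    return sorted(cue for cue, rate in rates.items() if rate >= criterion)


# Hypothetical counts for two illustrative children (not data from the study).
presentations = {"auditory": 20, "visual": 20, "tactile": 20}
child_a = {"auditory": 19, "visual": 2, "tactile": 0}
child_b = {"auditory": 18, "visual": 19, "tactile": 17}

rates_a = response_rates(child_a, presentations)
rates_b = response_rates(child_b, presentations)
print(functional_cues(rates_a))
print(functional_cues(rates_b))
```

In this sketch the first child's responding is controlled by a single component, the pattern described above for the autistic children, whereas the second child's responding is controlled by all three, as described for the normal children.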

Subsequently, two of the autistic children 
were trained to respond to the component 
that had remained least functional for them 
during test sessions. Both children quickly 
learned to respond to the previously nonfunctional
component when that component
was presented alone. This helped to ensure 
that the problem was not one of some rela- 
tively "simple" sensory deficit (as in the case 
of being blind or deaf) but was rather a prob- 
lem in responding to the cue in the context 
of other cues. 

It was concluded from this first study that 
the data could best be understood as repre- 
senting the autistic child's difficulty in re- 
sponding to stimuli in context, a problem 
pertaining to the quantity rather than to the 
quality of stimulus control. The data failed 
to support any notion that a particular sense 
modality was impaired in autistic children 
or that any particular sense modality was a 
"preferred" modality. 


Overselective Response to Two Cues 


Since the autistic child may have been 
“flooded” or “overloaded” with stimulation 
in the first study, a second study (Lovaas 
& Schreibman, 1971) was conducted in which 
the stimulus input was simplified: The child 
was presented with only two cues. The two 
cues were the same red floodlight and white 
noise used in the previous study. The ex- 
perimental paradigm was also the same as in 
the previous study.

It may be of interest to examine the data
from this study in some detail, since they 
show certain unpredictable peculiarities. The 
six normal children tested gave no signs of 
stimulus overselectivity; they responded 
equally to the two components when these 
were presented on separate occasions during 
test trials. It was different with the autistic 
children. The data are presented in Figure 1.
Four of the children showed the stimulus over- 
selectivity most clearly. For each of these 
children only one of the cues was functional 
in controlling their behavior. Two of these 
children (Kevin and Michael) displayed some 
response to both component cues during early 
parts of the testing but eventually lost rather 
than acquired the response to the nondominant 
cue. The fifth and sixth children (Janet and 
John T.) showed stimulus overselectivity in 
the early test trials but gradually began re- 
sponding to the initially weak component cue 
as testing progressed. They may have over- 
come their overselectivity with testing. 
Another child, Jimmy, showed control by 
one of the stimuli at first (the visual one);
but as testing progressed, for some reason 
the visual stimulus lost its control, whereas 
the auditory stimulus gradually assumed con- 
trol. Bobby showed only a slight case of 
stimulus overselectivity, if any. Finally, the 
last child, John P., responded equally to both 
components and showed no evidence of stimu- 
lus overselectivity. 

Thus stimulus overselectivity was not observed
for all the autistic children in this
study, whereas all of the children gave evi- 
dence of stimulus overselectivity in the pre- 
vious study, which involved three cues. It 
may be that stimulus overselectivity is most 
clearly observed with a relatively larger quan- 
tity of stimulus inputs. 


Discrimination of a Complex Stimulus From 
Its Components


Several questions can be raised as to why
the autistic children responded the way they
did on these tests. One possibility is that
autistic children have a genuine difficulty in
responding to the separate components of a
complex input. Another possibility is that
autistic children do respond to the components
but that they are “superefficient,” so that
they immediately know (discriminate) that 
it is enough to respond to only one com- 
ponent of a complex input to be reinforced. 
They may exert minimum effort for maximum
payoff.

Koegel and Schreibman (1977) conducted
a study to help answer these questions. The 
procedures and apparatus were similar to 
those employed in the first two studies. Subjects
were first trained to respond to the
separate presentations of an auditory and
then a visual stimulus. That is, the components
were presented first. At the completion
of the training phase the child was
presented with either the separate cues or
these cues combined in a complex. Thus,
three types of trials were presented: visual
only, auditory only, and visual and auditory
combined. All responses to the complex auditory-visual
input were reinforced, whereas responses
to either of the single cues were not
reinforced.


Figure 1. Test sessions for the autistic subjects. (Percentages of correct responses to stimuli are plotted
on the ordinate, and test sessions are plotted on the abscissa.)


The most general conclusion to be drawn
from the results of this study is that in this
conditional discrimination task, autistic children
experienced great difficulties discriminating
the complex stimulus from its components.
This conclusion is based on the fact
that the autistic children continued to respond
to one of the components, for hundreds of
trials, even though they were not reinforced
for doing so; concurrently, their response to 
the other (also nonreinforced) cue extinguished 
relatively rapidly. In contrast, the normal 
children extinguished on both of the single 
nonreinforced components quickly and simul- 
taneously. Since the autistic children con- 
tinued to respond to one of the single com- 
ponents, even though such responding was 
not reinforced, it does not seem likely that 
stimulus overselectivity shown by autistic 
children is based on some variable such as 
efficient responding. To respond consistently 
to one, and only one, of the nonreinforced 
components as well as to the reinforced 
stimulus complex seems like a fairly com- 
plicated strategy, unless one postulates that 
the complex and one of the components were 
difficult to discriminate from each other. It 
may also be important to note that the 
autistic children did eventually extinguish 
responding to both component cues. Thus, 
it appears that the autistic children were 
attempting to learn the discrimination but 
had difficulty doing so. 

Another major point of this experiment is 
that with a completely different experimental 
paradigm, the same type of result was ob- 
tained as in the other studies. That is, the 
consistency in the results across all of these 
experiments (regardless of the specific experimental
paradigm) provides compelling
evidence that autistic children use the same 
specific (and abnormal) response strategy 
when they are taught with multiple cues. 
This strategy of responding on the basis of 
fewer cues than normal children use appears 
to be a reliable phenomenon and may provide 
a basis for understanding many of the children's
abnormal behaviors (see section on
Implications, below). 


Visual and Auditory Overselectivity 


Initially, we had thought that the autistic 
children had difficulty in attending to multiple 
cues when these were presented across mo- 
dalities. Hintgten and Churchill (1971), for 
example, speculated that the autistic child 
may have an attentional problem based on 
some difficulty with “integrating” information 
along more than one modality. Subsequent 
studies in our laboratory suggest that autistic 
children have difficulty with multiple cues even 
when these cues are presented within the same 
modality.

Koegel and Wilhelm (1973) trained 15
autistic children and 15 normal children of
approximately the same chronological age on 
a visual discrimination task. Their procedure 
followed the relevant redundant cue paradigm. 
The children were trained to discriminate 
between two cards, each card containing two 
visual cues. After the children had mastered 
this discrimination (consistently responding 
to one of the cards), test trials were conducted
in which the visual cues on the cards
were split so that one of the components of
the correct card was presented versus one of
the components on the incorrect card. The 
autistic children gave evidence of overselective 
responding in this situation by reliably choos- 
ing a card with one of the components of the 
original complex correct stimulus and re- 
sponding at chance level to the other com- 
ponent. The normal children, on the other 
hand, responded predominantly to both the 
components during testing. At an even finer 
level of analysis, Koegel and Schreibman (1977) 
found that low-functioning autistic children 
show visual overselectivity even between the 
components of a single visual stimulus, such 
as selectively responding to the color, form, 
or shape of a triangle.

In another study, Reynolds, Newsom, and
Lovaas (1974) employed a successive discrimination
paradigm to test autistic and
normal children for overselectivity within the
auditory modality. Two auditory compounds
were randomly presented through a speaker
located above the child's head. The S^D compound
consisted of a continuous high tone,
with periodic relay clicks. The negative stimulus
(S^Δ) compound consisted of a low tone
with periodic bursts of the sound of a motor.
After bar pressing consistently occurred during
the S^D compound, test sessions were administered
in which the separate S^D components
were interspersed among presentations
of the S^Δ components. The normal
children responded reliably to both S^D components
of the auditory input. However, the
autistic children responded to one S^D component,
giving evidence for stimulus overselectivity
within the auditory modality.

The studies reviewed above are consistent 
with studies from other laboratories which 
suggest that autistic children have particular 
difficulties when they are expected to associate
multiple stimuli. For example, Cowan,
Hoddinott, and Wright (1965) found that
only 2 of 12 autistic children were able to
associate simple shape or color words with
corresponding visual stimuli. Bryson (1970)
noted that autistic children performing match- 
to-sample tasks disregarded extra visual 
stimuli when they made vocal responses. 
Similarly, Frith and Hermelin (1969) found 
that if normal children were blindfolded during 
maze training tasks, they were relatively more 
handicapped than when autistic children were 
blindfolded. Apparently, the autistic children 
were less affected by additional cues than were 
the normal children. 


Stimulus Overselectivity, IQ, and Mental and
Chronological Age


Although the preceding studies suggest that
autistic children show stimulus overselectivity,
one must remain skeptical about the role of
such overselectivity in the etiology of autistic
behavior for at least two reasons. First, since
these studies provide correlational data only,
stimulus overselectivity could just as well be
the effect of autistic behavior as its cause.
Second, in the studies we have described so
far, a few autistic children showed little or
no evidence of overselectivity, whereas some
children who were not autistic did overselect.
For example, in the Lovaas et al. (1971) study
discussed previously, retarded (but nonautistic)
children did overselect.

A relationship between IQ level and stimu- 
lus overselectivity was demonstrated in a 
study by Wilhelm and Lovaas (1976) that 
reported on three groups of children with 
different IQ levels. On a discrimination task 
that could be solved by the child attending 
to either one, two, or all of three component 
cues, the low IQ (20) group responded on the
average to 1.6 cues, the higher IQ (40) group
responded to 2.1 cues, and the normal IQ
children responded to all three cues. 

A similar finding has also been reported on 
the relationship between chronological age 
and stimulus overselectivity. Schover and 
Newsom (1976) found that the younger the 
children, the more likely they were to show 
overselectivity. The Schover and Newsom
findings are in accord with other studies that
show a positive correlation between number of
cues responded to in the visual discrimination
task and mental age in normal and retarded
children (Eimas, 1969; Fischer & Zeaman,
1973; Hale & Morgan, 1973; Olson, 1971).

It remains to be determined whether this
general finding that relates mental age to
overselectivity will hold up with multimodal
stimuli, as in the first studies presented here
(Lovaas & Schreibman, 1971; Lovaas et al.,
1971). Until studies employing such stimulus
compounds are carried out, the status of
overselectivity as an etiological factor in
autism remains in doubt; as Schover and
Newsom (1976) and Sivertsen (1976) have
pointed out, young normals showing visual
overselectivity seem intuitively to be far ahead
of their autistic counterparts in linguistic,
intellectual, affective, and social behaviors.
Nevertheless, autistic or retarded children
who remain overselective long after their
normal peers have moved beyond this mode
of functioning are clearly at a disadvantage
in acquiring new behavior, and overselectivity
undoubtedly contributes to the maintenance
of such children's behavioral retardation (Ross,
1976; Wilhelm & Lovaas, 1976). That is, the
more limited the aspects of the environment
become in controlling the children's behavior,
become in controlling the children's behavior, 
the more retarded will be their behavioral 
development. This conclusion will become 
apparent when we discuss the relationship of 
overselectivity to learning situations that re- 
quire shifts in stimulus control and the 
generalization of acquired behaviors across 
environments. 


Implications 


Some of the implications of these findings 
for understanding autistic development seem 
obvious and were pointed out in an earlier 
article (Lovaas et al., 1971). We argued that 
many learning situations in life necessitate 
responding to multiple cues. Speech, for ex- 
ample, is a complex stimulus input for which 
adequate responding necessitates the child's 
attention to a number of stimulus dimensions 
(e.g., voiced vs. voiceless, tense vs. lax,
volume vs. pitch). If a child responds to only 
one or two of these dimensions, he or she 
will not understand what is said. Reynolds 
et al. (1974) speculated that autistic children's 
deficiency in language may be based on their 
failure to respond to multiple inputs. Simi- 
larly, much of the meaning in language in- 
volves associating multiple inputs, as in 
associating language cues to a variety of 
sensory inputs such as sight and feel. We also 
speculated that stimulus overselectivity may 
underlie autistic children's deficiency in emo- 
tional behavior. Thus we pointed out that 
classical conditioning is considered by many 
to underlie the acquisition of emotional be- 
havior. The latter requires attention to two 
or more contiguous or nearly contiguous 
stimuli (the conditioned and the uncondi- 
tioned stimulus). In such a situation autistic 
children may well overselect, responding to one 
or the other of these stimuli, but not both. 
If they fail to respond to both, they may fail
to condition (Maltzman & Raskin, 1965). 

In the following section we try to support 
with data some of these speculations on how 
stimulus overselectivity interferes with autistic 
children's learning and with the generalization 
of that learning to new environments.


Observational Learning


Undoubtedly, many of the complex and 
subtle behaviors that are shown by normal 
children are learned by watching social inter- 
actions of various kinds (Bandura, 1969). 
Children who do not naturally learn in this 
way can be expected to show considerable 
behavioral retardation. Observational learning 
seems to be an area in which autistic children 
are particularly handicapped, and their failure 
to learn through observation may well be basic 
to one's understanding of their behavioral 
deficiencies (Ross, 1976). Typically, observa- 
tional learning necessitates attention to mul- 
tiple cues, in observing both a model's behavior 
and the consequences of that behavior. 
Varni, Lovaas, Koegel, and Everett (1979) 
obtained data which suggest that stimulus 
overselectivity may prevent observational 
learning in autistic children. In this study, 
the autistic children sat at a table across 
from two adults (a model and a teacher). 
On the table in front of them rested two 
objects, each of which was used in the execu- 
tion of an associated response. The investi- 
gators assessed the extent to which the children 
could learn how to behave in this situation 
by merely observing the model handle the 
objects in accordance with the teacher's in- 
structions. The teacher gave the model a 
command (e.g., "phone"), and the model then 
behaved accordingly (e.g., picked up the 
handset of the phone) and was rewarded by 
the teacher. A sequence of 20 observation 
trials was followed by one test trial to see if 
the children had acquired the task. During 
the test trials the children took the model's 
seat and were given the same command by 
the teacher. Correct performance on a test 
trial was taken to indicate that observational 
learning had occurred, since the children were 
not directly taught by the teacher to respond 
correctly. The observation trials continued 
until the children responded correctly on a 
test trial, or until they had experienced 1,000 
trials without showing evidence of learning. 
If this first task was mastered, a second 
response that involved a second object was 
presented (e.g., placing a ball in a toy dump 
truck when the teacher said "dump"); and 
if tests showed that the children mastered 
both responses, they were given 10 additional 
test trials with the two verbal commands 
presented in random order. 

Data from this study showed that the 
autistic children usually learned only part of 
the response they observed. For example, 
a child might touch rather than pick up the 
phone or merely touch or move the dump 
truck. In some cases the children learned the 
complete behavioral topography but did not 
associate it with the teacher's verbal com- 
mand. For example, a child might pick up 
the phone regardless of whether the teacher 
said “phone” or “dump.” In summary, the 
children’s failures could be related to their 
responses to only restricted portions of the 
complex stimulus situation that they had 
observed. 


Prompt Studies 


Another kind of learning in which stimulus 
overselectivity seems particularly handicapping 
is in the area of prompting and prompt fading. 
Prompts are usually extra stimuli added to 
the learning situation to ensure correct re- 
sponding. In this kind of learning a teacher 
may help the children associate a response to 
a particular stimulus by first prompting the 
right response. Many learning situations in- 
volve such prompts or "guidance" because 
the teacher may not be able to wait for the 
children to give the correct response on their 
own. For example, in teaching children to 
read a new word, the teacher may prompt 
them by also presenting a picture of the 
referent. If such learning is to be successful, 
such prompting or guidance must eventually 
be removed or “faded” so that the children 
behave on their own. Technically, such learn- 
ing is referred to as the acquisition of stimulus 
control, and involves shifts in stimulus control 
from prompt stimuli (e.g., picture) to training 
stimuli (e.g., written word). 

The stimulus overselectivity hypothesis sug- 
gests that the provision of additional stimuli 
in prompt procedures may prevent the 
children from learning. That is, most prompt 
procedures require the children to respond 
to multiple cues (the prompt and training 
stimuli occur together) and should create 
situations in which stimulus overselectivity 
is likely to occur. 

Koegel and Rincover (1976) provided data 
that show the deleterious effects on learning 
when extra cues are used as prompts. They 
pretrained autistic and normal subjects to 
respond differentially to two easily discrimi- 
nable colors (red and green). Once this dis- 
crimination was mastered, the colors were used 
as prompts and presented simultaneously with 
more difficult discriminations (e.g., a low- 
pitched tone was presented concurrently with 
the color red and a high-pitched tone with 
the color green). The colors were then elimi- 
nated gradually, the auditory stimuli alone 
remaining. The normal children learned the 
new (e.g., tone) discrimination this way, 
whereas the autistic children usually failed 
to transfer from the prompt to the training 
stimuli. Instead they continued to respond to 
the color cues even when they were faded to 
a barely recognizable level. However, the 
autistic children did acquire these new dis- 
criminations when prompts were not used in 
training. The Koegel and Rincover data sug- 
gest that the use of extra cues may make it 
more difficult for the autistic children to learn. 
Schreibman (1975) observed the same prob- 
lem. The autistic children responded selec- 
tively to the prompt as long as it was available 
but reverted to chance performance when the 
prompt was removed. 

Such problems with prompts may happen 
more often when the task involves 
difficult rather than easy discriminations, as 
shown in a study by Russo (Note 1). He 
reported that autistic children shift from a 
prompt (finger pointing) to training stimuli 
more readily when the task is easy (e.g., black 
vs. white) than when it is difficult (e.g., 
vertical line vs. a slightly tilted line). 


Generalization Studies 


Restrictions on the number of stimuli that 
acquire control over behavior could cause 
serious problems in stimulus generalization, 
that is, the extent to which a behavior learned 
in one environment transfers to other new 
environments. This relates to the familiar 
problem of “undergeneralization” of thera- 
peutic gains: the failure of a behavior, ac- 
quired in a therapeutic setting, to transfer to 
a new "outside" environment. 

One can consider generalization to take 
place to the extent that there are common 
stimulus elements between the teaching situ- 
ation and outside situations. Amount of gen- 
eralization, then, may vary proportionately 
with the number of stimulus elements that 
controlled the behavior initially. The fewer 
the stimuli that had become functional 
in the original situation, the fewer the stimulus 
elements that would control the behavior 
in the new environment; hence limited 
generalization occurs. This latter problem, 
that of limited generalization, has been clini- 
cally observed in all our work with autistic 
children. It is shown in a study by Rincover 
and Koegel (1975), in which stimulus over- 
selectivity seemed to directly limit general- 
ization. In this experiment, one teacher taught 
autistic children to perform a simple behavior 
on request (e.g., “touch your nose”). Im- 
mediately after each child had learned this 
behavior, a second teacher took the child into 
another environment and made the same 
request. Four of the 10 autistic children did 
not perform the relatively simple behavior in 
the new environment. Extensive single-subject 
analyses showed that those four children had 
failed to generalize the learned behavior be- 
cause they had selectively responded to ir- 
relevant stimuli during the original training 
and had not learned the response on the basis 
of the relevant cue. In one case, for example, 
the child's responding was controlled by inci- 
dental movements of the teacher's hand, and 
not by the relevant verbal cue (i.e., without 
the incidental hand movement the child would 
not respond to the verbal cue, whereas the 
child would respond to the hand movement 
either alone or with the verbal cue). 

Note that this result is different from saying 
that the autistic children responded to too 
many cues (i.e., central plus incidental cues; 
cf. Tarver, Hallahan, Kauffman, & Ball, 1976). 
Rather, the autistic children responded to too 
few cues so that in some instances, they 
responded only on the basis of an incidental 
cue and not on the basis of the central cue; 
thus it appeared as if they failed to generalize 
when they were in environments in which the 
incidental cue was absent. Subsequently, 
when the second teacher simply introduced 
the incidental cue (e.g., raised a hand in a 
similar way) in the outside setting, the children 
did generalize appropriately. That is, gen- 
eralized responding occurred only after the 
systematic exploration and isolation of the 
controlling stimuli in the treatment environ- 
ment and the introduction of these specific 
stimuli to new (outside) environments. 

The results of another study, designed to help 
understand the autistic child's problem with 
generalizing social stimuli, can also be under- 
stood in terms of stimulus overselectivity. 
Specifically, Schreibman and Lovaas (1973) 
taught normal and autistic children to dis- 
criminate between lifelike male and female 
figures (dolls). Subsequent tests showed that 
normal children distinguished between the 
figures on the basis of a number of cues, 
including the figures’ heads; but autistic 
children used only some minor and unreliable 
feature, such as the figures’ shoes, and did not 
respond to reliable cues such as the figures’ 
heads. For example, when the investigators 
removed the shoes from the figures after a 
child had learned to tell the figures apart, 
the child suddenly lost the discrimination, 
only to regain it in additional training by 
using an equally unreliable feature. Parents 
of autistic children often report that even 
a minor change in their (the parents’) ap- 
pearance, such as a mother cutting her hair 
or the father removing his glasses, may cause 
an abrupt change in their child's behavior, 
such as treating the parents as strangers or 
responding with major emotional upheaval. 
Perhaps the problem autistic children have 
with stimulus overselectivity contributes to 
their social aloofness in the presence of the 
complex, multidimensional stimuli provided 
by human beings. 


Remedial Treatment 
Within-Stimulus Prompts 


The studies described above may help to 
explain why autistic children fail to develop 
adequately. The main strength of these studies, 
however, lies in the guidance they provide in 
construction of more therapeutic environments 
for such children. For example, the autistic 
child's problems with situations that involve 
prompts may lead to the design of therapeutic 
learning environments that prevent the dele- 
terious effect of such prompts. The therapist 
or teacher may avoid situations in which the 
autistic child would overselect a nondistinc- 
tive (i.e., irrelevant) stimulus element and 
thereby fail to learn the discrimination. 
Recent studies investigated the possibility of 
preventing the development of such un- 
intended stimulus-response relationships. They 
employed procedures designed to establish 
control by the distinctive feature of the cor- 
rect stimulus at the start of training and to 
maintain this control while irrelevant features 
were gradually faded in. 

In the first study, Schreibman (1975) wanted 
to teach autistic children the difference be- 
tween form stimuli that were difficult to tell 
apart. A child's everyday environment abounds 
with difficult form discriminations, such as 
the difference between people who smile and 
frown, a heater dial being turned on and off, 
letters like b and d, and so on. Normal de- 
velopment seems to require the acquisition 
of such discriminations. In one of the dis- 
criminations employed by Schreibman, the 
children were required to identify one of two 
very similar stick figures, shown in Figure 2. 
The figures were identical except that one 
had an arm raised, whereas the other had 
[...] The relevant 
component of the discrimination was the 
orientation of the arms; all other components 
were redundant. The child was asked to point 
to the “correct” figure. For the purposes of 
this experiment, for some of the children the 
figure with the raised arm was designated 
correct; for others the other figure was correct. 
The autistic children failed to learn this dis- 
crimination even though the teacher went 
through extensive and elaborate prompt 
procedures, such as carefully pointing to the 
correct stimulus and then gradually removing 
the pointing cues. The autistic children did 
learn to discriminate between increasingly faint 
“finger points,” but did not transfer their 
response from the teacher's finger to the stick 
figures. Schreibman then used a different 
prompt procedure that involved altering and 
exaggerating the relevant component of the 
discrimination (arm orientation) and prompt- 
ing within the same stimulus dimension as 
the training stimulus. Figure 2 illustrates this 
procedure. The children were initially pre- 
sented with two cards. One card was blank, 
and the other card had a heavy diagonal line 
across its surface. Step 1 of Figure 2 shows 
this discrimination. As the children learned 
this discrimination, size cues were gradually 
faded out (see Figure 2) until the children 
were responding to the line orientation (rele- 
vant cue). Then, the redundant components 
of the discrimination (head, body) were slowly 
faded in. Using this within-stimulus prompt 
fading procedure, the autistic children learned 
the discrimination that they had failed to 
learn with traditional prompting. The main 
difference in procedure was that the within- 
stimulus prompt did not require the children 
to respond to multiple cues, since the prompt 
was within the training stimulus. 

Rincover (1978) extended Schreibman’s 
work with an analysis of four visual fading 
procedures and their effectiveness in teaching 
autistic children discriminations between three- 
letter words. Two variables were assessed: 
(a) distinctive versus nondistinctive feature 
fading, which signified whether a prompt was 
a feature contained only in the S^D or con- 
tained in both the S^D and S^Δ, and (b) within- 
versus extrastimulus fading, which signified 
whether the prompt was superimposed on the 
S^D during fading or presented spatially separate 
from the S^D. The four fading procedures were 
all combinations of these two variables. As 
expected, the combination of distinctive- 
feature and within-stimulus prompting was 
most effective in teaching the discriminations. 
It was suggested that this success was due 
to the fact that response to the stimulus 
component first encountered in training was 
all that was required of the child; this com- 
ponent was never removed during the fading 
progression and was still available in the 
criterion discrimination. Thus, neither response 
to multiple cues nor a shift in stimulus control 
was necessary, as they were in the other three 
training procedures. 


Multiple-Cue Training 


Techniques such as those suggested by 
Schreibman (1975) and Rincover (1978) work 
around the overselectivity problem by con- 
structing learning environments that allow 
the child to remain overselective, yet learn. 
However, it may also be possible to work 
directly on the overselectivity problem, as- 
suming it is not permanent but itself amenable 
to learning. In our clinical work, as in the 
work of others (e.g., Risley & Wolf, 1967), 
autistic children eventually learned to “use” 
extrastimulus prompts rather than always be 
hindered by them. Although the exact reasons 
for this were not evident in the clinical work 
per se, the results of recent research suggest 
that overselectivity may be modifiable. 

Figure 2. The within-stimulus prompt and fading steps used to teach discrimination between two stick 
figures. (Adapted from “Effects of Within-Stimulus and Extra-Stimulus Prompting on Discrimination 
Learning in Autistic Children” by L. Schreibman, Journal of Applied Behavior Analysis, 1975, 8, 91-112. 
Copyright 1975 by the Society for the Experimental Analysis of Behavior, Incorporated. Reprinted 
by permission.) 

Schover and Newsom (1976) directly at- 
tacked the problem of overselectivity by 
training autistic children to broaden their 
responding to include multiple cues. They 
examined the effects of overtraining an already 
learned discrimination between simple figures 
(such as a large green square and a small 
orange triangle). Their data showed that 
overtraining increased the number of cues 
responded to by autistic children. They con- 
cluded that the overselectivity shown by 
autistic children is probably not the result 
of a permanent disability, or that it may be 
permanent but treatable, like diabetes. 

A study by Schreibman, Koegel, and Craig 
(1977) further strengthens this inference. 
Using discrimination tasks, they found that 
simple overtraining (just exposure) did not 
help the autistic child to respond to more 
cues. However, prolonged testing with un- 
reinforced probe trials interspersed among 
reinforced training trials eliminated over- 
selective responding in 13 of 16 autistic 
children who were initially overselective. 
That is, children who selectively responded 
to one of the two available cues early in 
testing eventually responded to both cues as 
test trials continued. Since all of the stimuli 
in this study were in the visual modality, 
it is possible that the rapid reduction in 
overselectivity was a function of the stimuli 
used. 

In perhaps the most direct attempt to test 
the hypothesis that overselectivity could be 
modified, Koegel and Schreibman (1977) 
taught five autistic children a conditional 
discrimination requiring response to multiple 
cues. The results showed that although the 
autistic children appeared to have difficulty 
learning and did not learn in the same manner 
as normal children, they nevertheless did 
acquire the discrimination. Further, one child 
who was taught a series of nine successive 
conditional discriminations eventually ap- 
peared to form a set to respond to new dis- 
criminations on the basis of multiple cues. 
Although these results were preliminary in 
the sense that they were obtained with only 
certain types of stimuli, they nevertheless 
suggest optimism regarding the possibility of 
correcting overselectivity per se. This pos- 
sibility has numerous implications. Perhaps 
the most important are that (a) if over- 
selectivity were eliminated, autistic children 
might be more likely to benefit from more 
traditional teaching procedures that typically 
require such responding, and (b) elimination 
of overselectivity may enable the children to 
respond to their environment in a more 
normal manner and thus facilitate large-scope, 
rapid changes in behavior. 

These studies on eliminating the deleterious 
effects of stimulus overselectivity suggest that 
such overselectivity may be overcome or taken 
advantage of, as the situation requires, if 
sufficient effort and imagination are used in 
teaching autistic children. The findings pro- 
vide a basis for optimism that an experimentally 
derived technology for successfully teaching 
autistic children could be forthcoming. 


Relation to Adult Schizophrenia, Learning 
Disabilities, and Mental Retardation 


The discussion of overselective responding 
in autism shows a striking similarity to certain 
research on adult schizophrenics. Researchers 
in the area of adult schizophrenia typically 
distinguish between acute and chronic schizo- 
phrenics on the basis of breadth of cue utili- 
zation. Acute schizophrenics are described as 
responding to too many cues in their environ- 
ment. They are not able to select out the 
relevant stimulation to the exclusion of the 
irrelevant. In contrast, chronic schizophrenics 
are more like autistic children in that they 
respond to a limited amount of available 
stimulation. The research methodologies used 
to establish these findings are diverse, but 
sometimes bear a striking resemblance to those 
employed in the studies we have reported. 
This is the case with Feeny's (1972) work, as 
reported in Broen (1973), in which the sub- 
ject responds (with a microswitch) to multiple 
inputs (tones and lights) and in which the 
effects of single versus multiple stimulus 
presentations are compared. Feeny’s study 
showed evidence of stimulus overselectivity 
in chronic schizophrenics. Some studies have 
been reported in which patients were trained 
to pay attention to multiple inputs (e.g., 
Meiselman, 1971), further adding to the 
similarity in our approaches. Broen’s (1973) 
description of the findings on limited or nar- 
rowed attention in chronic schizophrenics 
seems relevant to the research we have 
reviewed on autism: 


Chronic schizophrenics do indeed show a narrower 
range of cue utilization than normals or acute schizo- 
phrenics. Their ability to note and respond to relevant 
cues is especially impaired when the cues are located 
in more than one sensory modality. (p. 207) 


In the same article, Broen speculates on the 
effects of such restricted responding: 


A life style dedicated to limiting stimulation . . . in- 
creases the likelihood of a chronic inability to monitor 
the environment and adjust to its changing demands. . . 
[it] maintains the potential for disorganization in the 
face of complexity. (p. 193) 

If we consider the fact that many autistic 
children become diagnosed as chronic schizo- 
phrenics as adults, then our data may help 
to better understand the process of chronic 
schizophrenia. This seems particularly true if 
one assumes that children with a less com- 
plex history would show such problems as 
underlie schizophrenia in a more “pure” form. 
In any case, it seems striking as well as en- 
couraging that two independent areas of in- 
vestigation have produced such similar findings 
and reached such similar conclusions; one 
must, however, exercise considerable caution 
in drawing inferences common to adult chronic 
schizophrenia and autism, considering the 
problems in diagnosis and the diversity in 
research methodology. Other reviews of the 
literature on cue utilization and schizophrenia 
have been provided by Lang and Buss (1965), 
McGhie and Chapman (1961), Silverman 
(1964), and Venables (1964). 

The concepts discussed here also appear 
similar to those discussed in the literature on 
learning disabilities in children. People in this 
field discuss auditory dominance (e.g., Senf 
& Treundl, 1971), visual dominance (Baker 
& Raskin, 1973; Gaines & Raskin, 1970), and 
sensory integration (e.g., Baker & Raskin, 
1973; Birch & Belmont, 1964; Chalfant & 
Scheffelin, 1969). Indeed a great deal of re- 
search in this area points to defective atten- 
tion as central to learning disabilities (e.g., 
Dykman, Ackerman, Clements, & Peters, 
1971; Luria, 1961; Ross, 1976; Senf & Treundl, 
1971; Strauss & Lehtinen, 1947; Trabasso & 
Bower, 1968; Zeaman & House, 1963). The 
possibility exists, however, that the results 
applicable to learning disabled children may 
not be directly comparable to the results with 
autistic children because of the many defini- 
tions of "learning disabled" and because of 
different methods of investigation. For exam- 
ple, Tarver et al. (1976) reported that some 
learning disabled children may be under- 
selective and respond to too many cues. 
However, Ross (1976) has pointed out several 
similarities between the difficulties manifested 
by autistic children and certain learning dis- 
abled children when required to respond to 
cross-modality multiple cues. He reviews 
several studies supporting this interpretation 
(e.g., Vande Voort, Senf, & Benton, 1972). 

In the area of mental retardation, investi- 
gators such as Zeaman and House (1963) 
have carried out extensive studies on various 
aspects of discrimination learning and have 
concluded that deficits in response to multiple 
cues may be a crucial factor in the poor per- 
formance of such children. When these in- 
vestigators compared children of various levels 
of retardation, they found that the lower level 
retarded children took longest to learn. How- 
ever, there were no differences between chil- 
dren in learning rates once improvement 
began. That is, rather than very slowly and 
gradually acquiring a discrimination, the more 
severely retarded children showed no increases 
in correct responding for many trials and then 
suddenly showed large increases in correct 
responding, at the same rate as normal chil- 
dren. The difference between the high- and 
low-level children was in how many trials 
they took before they started to show in- 
creases in correct responding. These investi- 
gators also found that they could reduce the 
number of trials it took the low-level children 
(i.e., decrease the differences between high- 
and low-level children) by directing their 
attention to a relevant cue in the task. The 
general conclusion drawn from experiments 
such as these has been that retarded children 
take a long time to learn because their at- 
tention is not focusing on a relevant aspect 
of the discrimination but that once they 
attend to a relevant cue, they can learn as 
fast as normal children. Relating this con- 
clusion to our results, we would speculate 
that the reason retarded children take so long 
to attend to a relevant cue is that they are 
sampling (responding to) fewer cues at a time 
than normal children, and therefore the 
probability is lower that a relevant cue will 
be included in any given sample. 
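
The sampling argument above can be made concrete with a small calculation. The following is only an illustrative sketch, not a model taken from Zeaman and House (1963) or from our own data: it assumes that a child samples a fixed number of cues at random, without replacement, from all cues present, that every cue is equally likely to be sampled, and that a fresh sample is drawn independently on each trial. The function names and the numbers used are hypothetical.

```python
from math import comb


def p_relevant_in_sample(n_cues: int, n_relevant: int, sample_size: int) -> float:
    """Probability that a random sample of `sample_size` cues, drawn without
    replacement from `n_cues` cues, contains at least one of the
    `n_relevant` relevant cues (hypergeometric complement)."""
    if not (0 <= n_relevant <= n_cues and 1 <= sample_size <= n_cues):
        raise ValueError("need 0 <= n_relevant <= n_cues and 1 <= sample_size <= n_cues")
    # P(no relevant cue in the sample) = C(n - r, k) / C(n, k)
    p_none = comb(n_cues - n_relevant, sample_size) / comb(n_cues, sample_size)
    return 1.0 - p_none


def expected_trials_until_relevant(n_cues: int, n_relevant: int, sample_size: int) -> float:
    """Mean number of trials before a relevant cue is first sampled, assuming
    an independent fresh sample on every trial (geometric waiting time)."""
    p = p_relevant_in_sample(n_cues, n_relevant, sample_size)
    if p == 0.0:
        return float("inf")  # no relevant cue exists, so it is never sampled
    return 1.0 / p


if __name__ == "__main__":
    # Hypothetical task: 10 cues present, only 1 of them relevant.
    for k in (1, 2, 3, 5):
        p = p_relevant_in_sample(10, 1, k)
        t = expected_trials_until_relevant(10, 1, k)
        print(f"sample size {k}: P(relevant sampled) = {p:.2f}, expected trials = {t:.1f}")
```

Under these assumptions, narrowing the sample from three cues to one lowers the per-trial probability of sampling the relevant cue and lengthens the expected wait before it is first attended to, while saying nothing about the rate of learning once the cue is attended to. This is the pattern the speculation is meant to capture; the simplifying assumptions, equal sampling probability in particular, are unlikely to hold exactly for real stimuli.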


Relation to Other Work on Attentional 
Problems in Autism 


Attentional abnormalities have been sus- 
pected by many authors on autism as the 
basic problem in the etiology of that disorder. 
The attentional problem is well illustrated 
in Kanner’s (1944) description of one of his 
patients: 

When spoken to, he went on with what he was doing 
as if nothing had been said. Yet one never had the 
feeling that he was willingly disobedient or contrary. 
He was obviously so remote that the remarks didn’t 
reach him. (p. 212) 


The clinical literature on autism abounds 
with similar descriptions, as when the parents 
suspect the child may be blind and deaf only 
to be puzzled at observing that sometimes 
their child can see and hear quite well. For 
example, the child may not respond to his 
or her name being called or startle at a loud 
sound like a slamming door, but may respond 
to a barely audible siren or the sound of 
candy being unwrapped. Koegel and Schreib- 
man (1976) tell of a child who showed this 
inconsistency in responding to a remarkable 
degree. It may be this inconsistency or vari- 
ability in responding that has led people to 
infer that the autistic child has a deficiency 
in perceptual or attentional mechanisms rather 
than a sensory deficit. 

The following is a brief discussion of the 
major theories that have been advanced to 
account for the attentional deficits in autism. 
It is surprising how many of these theories 
postulate some restriction in attention, analo- 
gous to the stimulus overselectivity hypothe- 
sis that we have proposed, as basic to autistic 
development. 


Psychodynamic Theories 


Psychodynamic theorists presented the first 
attempt to account for the autistic child’s 
unresponsivity, seeing such unresponsiveness 
as a defense against a hostile, threatening 
world. For example, Bettelheim (1967) pro- 
posed that autistic children narrow the range 
of their attention because they feel it neces- 
sary to shut out of awareness their disappoint- 
ing, unresponsive, and destructive parents. It 
seems clear that the psychodynamic orienta- 
tions talk of restricted or narrowed attention 
in the autistic child and that this is seen as 
a defense against interpersonal trauma. 
Empirical research designed to investigate 
attentional deficits based on social trauma 
has provided little support for the psycho- 
dynamic formulations. For example, Hermelin 
and O'Connor (1963) studied autistic and 
retarded children (matched on IQ) in a free- 
field situation designed to assess the amount 
of attention to visual, auditory, manipulative, 
or social stimuli and found no significant dif- 
ference between the groups in the amount of 
attention to the different stimuli. In a later 
study O'Connor and Hermelin (1963) found 
no evidence of autistic withdrawal from social 
stimuli. These same investigators (O'Connor 
& Hermelin, 1967) compared visual fixations 
of normal, severely retarded, and psychotic 
children to two simultaneous card displays 
depicting social (faces) and nonsocial stimuli. 
They found that (a) compared to the other 
groups, the psychotic children spent less time 
looking at any display cards and more time 
in nondirected gazing, and (b) all children, 
including psychotics, looked more at a picture 
of a face than at the scrambled pictures of 
the same face. Similar findings are reported 
by Young (1970). Thus, the experimental 
evidence does not lend support to the notion 
that autistic children display a particular 
attentional deficit in relation to social stimuli. 


Developmental Theories 


Developmental theories of perceptual devia- 
tions in autism are based on observations of 
animal and normal human development 
(Sherrington, 1906; Zaporozhets, 1961), and 
hold that there is a normal transition from 
preference and dependence on the near re- 
ceptors (tactile, kinesthetic, gustatory) in 
early life to the dominance of input from the 
far receptors (visual, auditory) in later life. 
It has been postulated that the autistic child's 
unresponsiveness to auditory and visual stimu- 
lation is due to a failure in development 
beyond the near-receptor stage (Goldfarb, 
1956; Schopler, 1965). 

Some data are consistent with this model. 
For example, Pollack and Goldfarb's (1957) 
data suggested that autistic children were 
not making good use of added visual cues. 
Frith and Hermelin (1969) presented addi- 
tional evidence consistent with these findings. 
They studied normal, autistic, and retarded 
children to compare the relative use of visual 
and tactile cues. For example, in one experi- 
ment the children were tested on three tasks. 
One task could be solved on the basis of 
visual cues only. The second task could be 
solved by either visual or tactile cues or both. 
The third task could only be solved using 
tactile cues. They found that all children did 
better with the visual cues except the de- 
velopmentally backward autistic children. For 
them, providing visual information had no 
facilitatory effect. Goldfarb and Braunstein 
(1958) presented similar data, as in their 
report that normal children’s speech was dis- 
turbed while listening to delayed auditory 
feedback and the speech of childhood schizo- 
phrenics was unaffected. 

Schopler (1966) found that normal and 
retarded subjects showed a higher preference 
for visual stimulation than did the schizophrenic
subjects. (Normals and schizophrenics
were matched on chronological age [CA],
retardates and schizophrenics on mental age
[MA].) He also found that normal subjects
increased in their visual preference with age.
Similarly, Hermelin and O'Connor (1964)
presented psychotic and normal children of
the same MA and CA with stimuli in different
modalities and found that the psychotic
children responded much more often to touch 
and the normal children to sound. 

But much empirical research has failed to
support the developmental model of sensory
deficit. In Schopler's (1966) study he did not
find a difference between the normal, retarded,
and schizophrenic subjects in their preference
for tactile cues, nor did he find that the
normals decreased in their preference for
tactile stimulation with age. Also, the retarded
subjects in that study showed an increase in
tactile preference, along with a decrease in visual
preference, with increase in MA. Both these
findings are difficult to fit into a developmental
interpretation. Also, Goldfarb (1961) found
no significant difference in the abilities of
normal and schizophrenic children in relation
to visual, auditory, and tactile cues. In the
Hermelin and O'Connor study (1963) cited 
earlier, autistic and retarded children did not 
differ in their responsiveness to visual, audi- 
tory, and tactile stimuli. The same investi- 
gators (O'Connor & Hermelin, 1965) failed 
to confirm their earlier (Hermelin & O'Connor, 
1964) findings of near-receptor preference in 
autistics. Finally, in our own studies (e.g., 
overselective response to three cues and to 
two cues) we failed to observe the autistic's 
preference for a particular stimulus hierarchy. 
The autistic subjects responded on the basis 
of either visual or auditory cues, and none 
of them responded to the tactile (near-receptor)
cue.

In general, the literature on developmental 
receptor preferences does not provide a tenable 
explanation of sensory deficit in autism. But 
the data do support the notion that autistic 
children may act like a developmentally im- 
mature organism in their restricted response 
to multiple cues. Perhaps developmental vari- 
ables may sometimes be understood as related 
to "breadth of cue use" independent of 
sensory preferences. 


Arousal Theories 


Several theories of sensory dysfunction in 
autism have emphasized the physiological 
mechanisms related to arousal. Since the level 
of physiological arousal is an important de- 
terminant of how much the organism will be 
affected by environmental stimulation, it is 
reasonable to postulate that pathological 
mechanisms influencing arousal would interfere 
with normal sensory processes. There are 
three basic arousal theories represented in the 
literature. One position suggests that autistic 
children suffer from a chronically low level 
of arousal, another suggests a chronically
high level of arousal, and the third postulates 
alternating periods of high and low arousal. 

The main proponent of the underarousal 
theory is Rimland (1964). He hypothesized 
that autistic children are chronically under- 
aroused because of a dysfunction of the 
reticular activating system. A state of underarousal
would account for the child's restricted
attention to external stimulation. Rimland's
use (pp. 201-204) of "narrow bands" (from
information theory) seems particularly close
to our use of stimulus overselectivity in ac- 
counting for the behavioral peculiarities of 
autism. Metz (1967) provided some data 
consistent with the underarousal hypothesis. 
He found that when autistic and normal sub- 
jects were allowed to control the volume of 
an auditory stimulus, the autistics preferred 
higher levels of stimulation than did the 
normals. But in general the underarousal 
theory has little direct supporting evidence 
and has not been subjected to rigorous em- 
pirical investigation. 

The overarousal hypothesis is based on 
more empirical findings. Hutt, Hutt, Lee, and 
Ounsted (1965) hypothesized that in autistic 
children the nonspecific activity of the reticular 
activating system is sustained at a high and 
relatively inflexible level. They point to the 
possibility that the typical unresponsiveness 
displayed by autistics is a defensive function 
preventing excessive arousal. Hutt, Hutt, Lee, 
and Ounsted (1964) suggested that the over- 
arousal hypothesis is also consistent with the 
desire for sameness in the environment. 
Novelty leads to increased arousal and is thus 
avoided. Some support for the overarousal 
position is provided by Connell (1966), who
observed that autistics often require higher 
than normal doses of sedative drugs. Rutter 
(1968) described two important problems with 
an overarousal theory. First, level of arousal 
may be related to level of maturation or 
development. The Hutt et al. (1965) subjects 
were matched on CA but not MA. Thus it 
is possible that the lower MA of the autistics 
accounts for the differential electroencephalo- 
gram (EEG) patterns of the two groups. 
Second, high arousal may have developed as 
a secondary rather than a primary defect. 
Hermelin and O’Connor (1968) found no 
major differences in alpha rhythm in the 
EEGs of autistic, Down's Syndrome, and 
normal children in several conditions differing 
in amount of sound and light stimulation.
Only when continuous noise was present in
one of the conditions did the autistics display
relatively more arousal than the other groups.
This suggests that overarousal may be a sec- 
ondary response to environmental stimulation. 

A third arousal theory is the "perceptual
inconstancy" hypothesis offered by Ornitz
and Ritvo (1968). According to this theory,
there is a specific pathological process that
begins in the first year of life when there is
a failure of the central nervous system (CNS)
to develop adequate homeostatic regulation
of sensory input. Without control over incoming
sensory stimulation, the child experiences
random overloading and underloading
of the CNS. As it is now formulated,
the perceptual inconstancy hypothesis does
not relate clearly to our work on stimulus
overselectivity. Although both viewpoints address
the problem of selective attention, our
data suggest that the autistic children responded
to only one stimulus component
(auditory or visual) in a complex stimulus
and that they responded to that same component
consistently day after day. If, as
Ornitz and Ritvo say, the underloading and
overloading of the CNS is random, we should
not have observed such consistent patterns
of responding.


Other Relationships


Much may be learned by relating the studies 
on stimulus overselectivity more closely to 
similar work with animals. A thorough review 
of that work is beyond the scope of this 
article, but it should be mentioned that 
Pavlov noted in 1927 that the conditioned
response to one element of a complex stimulus
was as large as the response to the complex,
leaving the response to the other elements
negligible. Similar early references
to the distinction between “nominal” and
“functional” or effective stimuli can be found
in the work of Harlow (1945) and Warren
(1953). Reynolds (1961), for example, trained
two pigeons to discriminate between two
white forms on differently colored backgrounds
(red or green). It was found that one pigeon
responded only to the white form and the
other pigeon responded only to the colored
background. Some persons have considered
the underlying mechanism behind selective
responding to be genetic (some cues are more
dominant for some species than others).
Other persons have attributed selective
responding to prior learning of one cue, which
then “blocks” (as in “stimulus blocking”) or
inhibits responding to other cues that are also
available. Perhaps animal studies may provide
information about the basic cause of
(over)selective responding.


Conclusion 


Looking across all of these studies, one can
make several general observations. It is apparent
that under a wide range of different
testing situations, low-level autistic children
come under the control of an extremely restricted
range of stimuli. At this point, however,
it may be premature to label this phenomenon
in other than a descriptive manner.
Thus we have chosen to employ the term
stimulus overselectivity, rather than other more
inferential terms such as association deficiency
or deficiency in selective attention. There are,
however, many parallels between our results
and those discussed in the discrimination
learning literature in the area of selective
attention.

We are limited in our inferences regarding
the mechanism behind the phenomenon, but
we are still able to relate the children's overselective
responding to a number of their
abnormalities. Specifically, we have reviewed
findings relating overselectivity to deficiencies
in (a) generalization, (b) the use of “extra-stimulus”
prompts, (c) language learning,
(d) social behavior, and (e) observational
learning. We have also been able to relate
overselectivity to specific remedial procedures.
For example, the work on prompting has
resulted in an extremely efficient technique
(within-stimulus prompting) for teaching at
least some behaviors. Our major observation,
and our main point in this article, is that
overselectivity frequently occurs in children
diagnosed as autistic, that this characteristic
can be reliably measured, and that a knowledge
of its existence may be useful in planning for
these children’s education and treatment.


Reference Note


Two-Sample T² Procedure and the
Assumption of Homogeneous Covariance Matrices


Results of an empirical investigation of the robustness of Hotelling’s two-sample
T² test with respect to violation of the assumption of homogeneity of
covariance matrices are presented. Empirical sampling distributions of the T²
statistic were obtained from a large number of sets, each consisting of 2,000
samples drawn from multivariate normal parent populations. Average sample
size (n), extent of inequality of sample sizes, number of variables (p), and
degree of inequality of covariance matrices were combined into 108 different
conditions. Actual proportions of values that exceeded nominal α levels are
presented. For equal ns, the procedure is shown to be generally robust. With
unequal ns, the procedure is shown to become increasingly less robust as covariance
matrix heterogeneity and p increase. The results are related to earlier
findings, and implications for the proper use of the T² procedure are noted.


Hotelling's (1931) two-sample T² procedure
for multiple dependent variables is widely 
used in the behavioral sciences today, being 
the uniformly most powerful test in the case 
of two-group p-variate simultaneous compari- 
son (Anderson, 1958, pp. 115-118). Textbooks 
dealing with multivariate statistical methods 
in the behavioral sciences (e.g., Harris, 1975; 
Morrison, 1976; Tatsuoka, 1971), however, 
typically provide the reader with at most a 
brief treatment of the robustness of the method. 
The one investigation that is often referred to 
is that of Ito and Schull (1964), who investigated
the large-sample properties of the distribution
of T², showing analytically that for
equal and large sample sizes, heterogeneity of 
covariance matrices has no substantial effect 
on the probability of Type I error. 

Although infrequently referred to, several
studies on this topic have been conducted.
Two articles other than Ito and Schull's
(1964) contain analytical results: Mardia
(1971), who examined the effects of multivariate
nonnormality (in the face of which
the multivariate analysis of variance [MANOVA]
class of tests is relatively robust), and Pillai
and Sudjana (1975), who examined the effects
of covariance matrix heterogeneity on four
test criteria. Although limited to the bivariate
case, small ns, and unclear degrees of heterogeneity,
their results showed modest departures
from nominal α values for minor
degrees of heterogeneity and more pronounced
departures with greater heterogeneity.

Several empirical Monte Carlo studies have
dealt with the robustness of the T² test. Chase
and Bulgren (1971) examined nonnormality,
an issue not dealt with in the present article.
Hopkins and Clay (1963) examined heterogeneity
of covariance matrices, although only
for the bivariate case. Olson (1974), who compared
six MANOVA criteria, dealt with only the
equal-n case. These studies provided some insights
into the robustness of the T² test, although
none illuminated the issue fully.

One reasonably comprehensive empirical
study was that by Holloway and Dunn (1967),
who, in general, found that with equal ns
and a fairly large ratio of subjects to dependent
variables, the T² test is robust. As expected,
with unequal ns, the test was found to be
less robust, being conservative when the larger
sample was drawn from the population with
generally greater dispersions and liberal in the
opposite situation. The main defect in this
study, however, at least for the present purposes,
was the unrealistically different heterogeneity
conditions (e.g., variables in Population 2
that had variances 10 or 100 times those
in Population 1) and unequal-n conditions
that were not disparate enough (n₁:n₂ ranging
from 15:35 to 35:15). In addition, for n₁ ≠ n₂,
n₁ + n₂ was fixed at 50, thus preventing the
systematic examination of overall sample size
as a factor in the issue of robustness.

Although something is known, therefore,
about the robustness of the T² test, it is not
so clear at what degree of departure from
optimal conditions (equality of sample ns,
homogeneity of covariance matrices, etc.) the
procedure manifests an unacceptable actual α
level. The purposes of the present study were
(a) to examine simultaneously all independent
variables relevant to robustness so that a
comprehensive picture of each of these variables
could be obtained, in terms of both their
main effects and their possible interactive
effects; (b) to represent the independent
variables so that they represented real-world
behavioral data in terms of levels that are
identifiable with characteristics of observable
data variables; and (c) to distill the results
necessary to provide guidelines for the proper
use of the T² procedure in psychological
research.


The Monte Carlo Investigation 
Independent Variables 


Sample size. To assure comparability
among data sets with different numbers of
variates, the ratio of average sample size,
(n₁ + n₂)/2, to number of variates was
manipulated. Two ratios were used: 3 subjects
per variable and 10 subjects per variable.

Inequality of sample sizes. The ratios of
the sizes of the two samples were also varied.
Three such ratios were used: 1:1, 2:1, and 5:1.

Number of variables. Three levels of this 
factor were used: 2, 6, and 10. 
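
The three factors just described determine the sample sizes in each cell of the design. The following is a minimal sketch of that arithmetic, assuming that the larger sample size is rounded to the nearest whole subject (the rounding rule and the function name `sample_sizes` are assumptions for illustration, not taken from the article).

```python
def sample_sizes(p, subjects_per_variable, ratio):
    """Return (n1, n2) for p variates, a subjects-per-variable level,
    and an n1:n2 ratio given as a pair such as (2, 1)."""
    average_n = p * subjects_per_variable   # (n1 + n2) / 2
    total_n = 2 * average_n
    larger, smaller = ratio
    # Assumed rule: round the larger sample to the nearest integer.
    n1 = round(total_n * larger / (larger + smaller))
    return n1, total_n - n1


for p in (2, 6, 10):
    for per_variable in (3, 10):
        cells = [sample_sizes(p, per_variable, r) for r in ((1, 1), (2, 1), (5, 1))]
        print(p, per_variable, cells)
```

Under this assumed rounding, the two-variable case gives 6:6, 8:4, and 10:2 at 3 subjects per variable and 20:20, 27:13, and 33:7 at 10 subjects per variable, which agrees with the row labels of Table 2.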

Heterogeneity of covariance matrices. This
factor was central in the present study.
Heterogeneity of population covariance
matrices can, of course, arise from many
causes and in a vast number of forms and
degrees. It seems true, however, that the
most obvious occurrence of such heterogeneity
is that in which a sample that is drawn from
a population with normal dispersions of the
variables is compared with one that is either
implicitly or explicitly selected or restricted in
some way.

In the present study, the authors attempted
to generate data that would possess the same
heterogeneity often found in real-world data
by introducing scale factors in four ways:


1. The variables in Population 2 were rescaled
by 1.2, that is, whereas the variables
in Population 1 were all set with σ = 1, those
in Population 2 had σs of 1.2 and thus σ² of
1.44. This can be seen as a mild departure from
homogeneity and is referred to as Heterogeneity
Condition 1.

2. The variables in Population 2 were rescaled
by 1.5. This scaling (Heterogeneity
Condition 2) represented a moderate to substantial
degree of heterogeneity. The Population 2
variances and covariances were thus
2.25 times those for Population 1.

3. The first p/2 variables in Population 2
were of the same scale as those in Population
1, whereas the last p/2 variables were rescaled
by 1.5. Heterogeneity Condition 3
represented the situation of selection on some
but not all of the variables.

4. The variables in the two populations were
in precisely the same scale, that is, no rescaling
took place. This is, of course, the
homogeneity condition.


Population 1 covariance matrices (Σ₁) were
constructed for the three p levels, 2, 6, and 10.
These Σ₁ matrices appear in Table 1. Population 2
covariance matrices for each value of
p (Σ₂) were then obtained by Σ₂ = DᵢΣ₁Dᵢ,
where i = 1, ..., 4, and where the diagonal
Dᵢ matrices contained the scale factors referred
to above. Thus, for Heterogeneity
Conditions 1 and 2, D₁ and D₂, respectively,
contained 1.2 and 1.5 in each diagonal position.
For Heterogeneity Condition 3, D₃ contained
1 in the first p/2 diagonal positions
and 1.5 in the last p/2, and for the homogeneity
condition, D₄ was simply Iₚ.
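
The rescaling Σ₂ = DᵢΣ₁Dᵢ can be sketched in a few lines. This is only an illustration under stated assumptions: the 2 × 2 matrix `sigma1` below is hypothetical (unit variances, correlation .30) and is not one of the matrices in Table 1, and the function names are invented for the example.

```python
def scale_factors(condition, p):
    """Diagonal of D_i for Heterogeneity Conditions 1-3 and for the
    homogeneity condition (numbered 4 here, as in the list above)."""
    if condition == 1:
        return [1.2] * p
    if condition == 2:
        return [1.5] * p
    if condition == 3:
        return [1.0] * (p // 2) + [1.5] * (p - p // 2)
    if condition == 4:
        return [1.0] * p
    raise ValueError("condition must be 1, 2, 3, or 4")


def rescale(sigma1, d):
    """Return Sigma_2 = D Sigma_1 D for the diagonal matrix D = diag(d)."""
    p = len(d)
    return [[d[i] * sigma1[i][j] * d[j] for j in range(p)] for i in range(p)]


# Hypothetical Sigma_1, used only to show the effect of each condition.
sigma1 = [[1.0, 0.3], [0.3, 1.0]]
for condition in (1, 2, 3, 4):
    print(condition, rescale(sigma1, scale_factors(condition, 2)))
```

With this example matrix, Condition 2 multiplies every variance and covariance by 2.25, whereas Condition 3 leaves the first variable untouched and inflates only the entries that involve the second.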


Table 1
Population Covariance Matrices Used in the Study as Σ₁

[Lower-triangular matrices for the 10-, 6-, and 2-variable cases; the entries are too scrambled in this copy to be reassigned to rows and columns.]

Note. Entries are actual values multiplied by 100. The Σ₂ matrices that were used in the study were obtained from those in Table 1 by means of rescalings
described in the text.


Previous treatments, such as those of Ito
and Schull (1964) and Pillai and Sudjana
(1975), have operationalized covariance matrix
heterogeneity in terms of the diagonal matrix
of latent roots yielded by the product Σ₂Σ₁⁻¹.
Using the scheme for generating covariance
matrix heterogeneity noted previously, it can
easily be shown that since Σ₂ = DᵢΣ₁Dᵢ, the
eigenstructure of Σ₂Σ₁⁻¹ or, equivalently, of
DᵢΣ₁DᵢΣ₁⁻¹ contains a diagonal matrix of
latent roots equal simply to Dᵢ². In Heterogeneity
Conditions 1 and 2, these roots will,
of course, all be equal.
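
For the two uniform rescalings the equality of the roots can be verified directly. The following worked step is a sketch that assumes Dᵢ = cI, with c = 1.2 for Heterogeneity Condition 1 and c = 1.5 for Heterogeneity Condition 2 (the symbol c is introduced here for illustration only).

```latex
\Sigma_2 \Sigma_1^{-1}
  = (cI)\,\Sigma_1\,(cI)\,\Sigma_1^{-1}
  = c^{2}\,\Sigma_1 \Sigma_1^{-1}
  = c^{2} I
```

Every latent root is therefore c², that is, 1.44 under Condition 1 and 2.25 under Condition 2, matching the variance ratios given in the description of those conditions.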

Relationship between heterogeneity and inequality
of sample size. This concerns whether,
when n₁ ≠ n₂, the larger sample was drawn
from the population with larger dispersions
(the positive condition) or smaller dispersions
(the negative condition). Thus this two-level
factor was not fully crossed with all other
factors: It did not apply when n₁ = n₂ or
when Σ₁ = Σ₂.

Summary of the independent variables studied.
From the preceding, it can be seen that
in all, 108 conditions were examined: 90
conditions in which Σ₁ ≠ Σ₂ [2 × 3 × 5 (2
positive n₁:n₂, 2 negative n₁:n₂, and 1 equal)
× 3] and 18 in which Σ₁ = Σ₂ (2 × 3 × 3).
An understanding of the overall design of the
study will be facilitated by examination of
Tables 2, 3, and 4, which appear later in the
article.
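
The count of 108 conditions can be checked by enumerating the design. The sketch below assumes the factor levels exactly as listed above; the variable names and the string labels for the ratios are illustrative choices.

```python
from itertools import product

subjects_per_variable = (3, 10)
p_levels = (2, 6, 10)
heterogeneity_conditions = (1, 2, 3)      # cases with unequal covariance matrices

# With unequal ns, each ratio occurs in a positive and a negative arrangement;
# with equal ns, the positive/negative distinction does not apply.
ratio_arrangements = [
    ("1:1", None),
    ("2:1", "positive"), ("2:1", "negative"),
    ("5:1", "positive"), ("5:1", "negative"),
]
ratios = ("1:1", "2:1", "5:1")

heterogeneous = list(product(subjects_per_variable, p_levels,
                             ratio_arrangements, heterogeneity_conditions))
homogeneous = list(product(subjects_per_variable, p_levels, ratios))

print(len(heterogeneous), len(homogeneous), len(heterogeneous) + len(homogeneous))
```

The two lists correspond to the 2 × 3 × 5 × 3 and 2 × 3 × 3 terms in the summary.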


Data Generation 


The generation of sample data was accomplished
by random number generation on
the University of British Columbia IBM
370/168 computer. Independent uniformly
distributed random numbers on the interval
(0, 1) were generated and then transformed to
normally distributed random numbers with
mean 0 and variance 1 by Marsaglia's rectangular-wedge-tail
method (Knuth, 1968).
Strings of length Np (where N = n₁ + n₂)
of such independent random normally distributed
data points were generated and then
partitioned into two data sets X (one, n₁
subjects by p variates, the other n₂ subjects
by p variates). With the string partitioned in
this way, each nⱼ × 1 (j = 1, 2) variate vector
was normally distributed, with mean 0 and
variance 1 and was independent of every other
vector. It can easily be demonstrated that the
joint distribution thus arising is MVN(0, I)
(see, for example, Anderson, 1958, pp. 19-27).


Table 2
Proportion of Values (Actual α Values) of the Two-Sample T² Statistic Exceeding the Nominal
α Values .01, .05, and .10 Under Various Conditions for the Case of p = 2 Dependent Variables
and a True Null Hypothesis

                                    Heterogeneity 1ᵃ      Heterogeneity 2ᵃ      Heterogeneity 3ᵃ
n₁:n₂ᵇ    Nominal α   Homogeneity   Positiveᶜ  Negativeᶜ  Positiveᶜ  Negativeᶜ  Positiveᶜ  Negativeᶜ

Average n per sample = 6
6:6       .01         .010               .011                  .012                  .012
          .05         .049               .052                  .050                  .054
          .10         .104               .098                  .102                  .101
8:4       .01         .008          .007       .021       .006       .023       .008       .017
          .05         .047          .030       .073       .031       .104       .035       .077
          .10         .098          .077       .133       .062       .175       .081       .142
10:2      .01         .009          .007       .020       .003       .045       .007       .035
          .05         .047          .027       .084       .014       .148       .029       .11
          .10         .101          .057       .147       .033       .256       .070       .181

Average n per sample = 20
20:20     .01         .011               .007                  .013                  .01
          .05         .055               .050                  .065                  .046
          .10         .105               .102                  .113                  .094
27:13     .01         .010          .008       .019       .004       .029       .006       .016
          .05         .049          .045       .073       .026       .087       .034       .073
          .10         .097          .080       .134       .054       .155       .075       .127
33:7      .01         .016          .003       .025       .001       .058       .005       .035
          .05         .060          .025       .097       .007       .163       .031       .107
          .10         [illegible]   .050       .170       .028       .238       .061       .185

Note. Each proportion is based on 2,000 pairs of samples. The various conditions involve Σ₁ and Σ₂ and
magnitude and departure from equality of n₁ and n₂.
ᵃ The degree of heterogeneity present is discussed in the text.
ᵇ The n₁:n₂ ratios are such that they represent, as closely as possible, three levels: 1:1, 2:1, and 5:1.
ᶜ The positive condition refers to that in which the population whose covariance matrix contains the larger
entries is that from which the larger n is drawn, where n₁ ≠ n₂. The negative condition is that in which the
population whose covariance matrix contains the larger entries is that from which the smaller n is drawn.
For situations where n₁ = n₂, this dichotomy does not, of course, exist, and entries in Table 2 are given
between the Positive and Negative columns.

So that each data matrix Y would represent
a sample from a population with a known covariance
matrix Σ, the following transformation
was applied. The desired population covariance
matrix was first canonically decomposed
as Σ = VΛV′, and a “factor” matrix
F was obtained by F = VΛ^(1/2). Next, the nⱼ × p
MVN(0, I) data matrices X, described previously,
were postmultiplied by F′, yielding
Y = XF′. This process can be considered to
produce a sample data matrix that could have
arisen from a population having covariance
matrix Σ, since

$$
E(Y'Y) = E[(XF')'(XF')] = E(FX'XF') = F\,E(X'X)\,F' = FF' = V \Lambda V' = \Sigma .
$$
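
The transformation can be illustrated with a short simulation. The sketch below departs from the procedure described in two labeled ways: it uses Python's `random.gauss` in place of Marsaglia's method, and it uses a Cholesky factor as F instead of the factor built from the canonical decomposition. Any F satisfying FF′ = Σ serves the same purpose, so the substitution affects only convenience. The function names and the example matrix are assumptions for illustration.

```python
import random


def cholesky(sigma):
    """Lower-triangular F with F F' = sigma (sigma must be positive definite)."""
    p = len(sigma)
    f = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            partial = sum(f[i][k] * f[j][k] for k in range(j))
            if i == j:
                f[i][j] = (sigma[i][i] - partial) ** 0.5
            else:
                f[i][j] = (sigma[i][j] - partial) / f[j][j]
    return f


def draw_sample(n, sigma, rng):
    """Return n rows of Y = X F', where the rows of X are independent N(0, I)."""
    f = cholesky(sigma)
    p = len(sigma)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Row of X F': y_i = sum_k x_k * F[i][k]; F is lower triangular.
        rows.append([sum(f[i][k] * x[k] for k in range(i + 1)) for i in range(p)])
    return rows


# Hypothetical population covariance matrix, not taken from Table 1.
sigma = [[1.0, 0.3], [0.3, 1.0]]
sample = draw_sample(8, sigma, random.Random(1))
print(sample[0])
```

With a large n, the sample variances and covariances of the rows returned by `draw_sample` should approach the entries of `sigma`.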


In this way, 2,000 pairs of samples (from
populations in which the null hypothesis was
true) were generated for each of the 108 conditions
studied. For each pair of samples, the
T² statistic was computed and transformed to
a value (F), which under a true null hypothesis
is distributed as a central F variate with
degrees of freedom p and (n₁ + n₂ − p − 1).
The obtained F values were then compared to
the 90th, 95th, and 99th percentile points of
the appropriate F distribution, and the percentage
lying above those points was tabulated
for each condition.
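
The computation for a single pair of samples can be sketched as follows. The article does not print its formulas, so the sketch assumes the standard pooled-covariance form of T² and the usual conversion F = T²(n₁ + n₂ − p − 1) / [p(n₁ + n₂ − 2)]. The helper names are illustrative, and the percentile points of F would have to come from tables or a statistics library.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sscp(rows, mean):
    """Sums of squares and cross-products about the mean."""
    p = len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hotelling_t2(sample1, sample2):
    """Return (T2, F, df1, df2) for two samples given as lists of rows."""
    n1, n2, p = len(sample1), len(sample2), len(sample1[0])
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    w1, w2 = sscp(sample1, m1), sscp(sample2, m2)
    pooled = [[(w1[i][j] + w2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    d = [a - b for a, b in zip(m1, m2)]
    t2 = (n1 * n2 / (n1 + n2)) * sum(di * zi for di, zi in zip(d, solve(pooled, d)))
    df2 = n1 + n2 - p - 1
    return t2, t2 * df2 / (p * (n1 + n2 - 2)), p, df2
```

Repeating this for 2,000 generated pairs and counting how often F exceeds a given percentile point yields one entry of the kind tabulated in Tables 2, 3, and 4.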


Table 3
Proportion of Values (Actual α Values) of the Two-Sample T² Statistic Exceeding the Nominal
α Values .01, .05, and .10 Under Various Conditions for the Case of p = 6 Dependent Variables
and a True Null Hypothesis

                                    Heterogeneity 1ᵃ      Heterogeneity 2ᵃ      Heterogeneity 3ᵃ
n₁:n₂ᵇ    Nominal α   Homogeneity   Positiveᶜ  Negativeᶜ  Positiveᶜ  Negativeᶜ  Positiveᶜ  Negativeᶜ

Average n per sample = 18
18:18     .01         .013               .006                  .011                  .012
          .05         .048               .048                  .057                  .064
          .10         .098               .099                  .109                  .114
24:12     .01         .011          .007       .020       .005       .043       .006       .018
          .05         .059          .035       .088       .021       .127       .028       .076
          .10         .106          .068       .155       .051       .214       .072       .158
30:6      .01         .011          .004       .036       .000       .103       .003       .046
          .05         .057          .018       .117       .004       .249       .022       .145
          .10         .097          .045       .202       .012       .358       .046       .231

Average n per sample = 60
60:60     .01         .009               .012                  .016                  .008
          .05         .047               .056                  .059                  .040
          .10         .096               .097                  .111                  .078
80:40     .01         .011          .003       .015       .002       .040       .007       .021
          .05         .050          .029       .074       .011       .137       .026       .079
          .10         .098          .065       .141       .030       .225       .059       .141
100:20    .01         .010          .003       .038       .000       .132       .003       .067
          .05         .044          .015       .123       .003       .285       .021       .179
          .10         .096          .035       .202       .008       .393       .046       .258

Note. Each proportion is based on 2,000 pairs of samples. The various conditions involve Σ₁ and Σ₂ and
magnitude and departure from equality of n₁ and n₂.
ᵃ The degree of heterogeneity present is discussed in the text.
ᵇ The n₁:n₂ ratios are such that they represent three levels: 1:1, 2:1, and 5:1.
ᶜ The positive condition refers to that in which the population whose covariance matrix contains the larger
entries is that from which the larger n is drawn, where n₁ ≠ n₂. The negative condition is that in which the
population whose covariance matrix contains the larger entries is that from which the smaller n is drawn.
For situations where n₁ = n₂, this dichotomy does not, of course, exist, and entries in Table 3 are given
between the Positive and Negative columns.


Table 4
Proportion of Values (Actual α Values) of the Two-Sample T² Statistic Exceeding the Nominal
α Values .01, .05, and .10 Under Various Conditions for the Case of p = 10 Dependent Variables
and a True Null Hypothesis

                                    Heterogeneity 1ᵃ      Heterogeneity 2ᵃ      Heterogeneity 3ᵃ
n₁:n₂ᵇ    Nominal α   Homogeneity   Positiveᶜ  Negativeᶜ  Positiveᶜ  Negativeᶜ  Positiveᶜ  Negativeᶜ

Average n per sample = 30
30:30     .01         .009               .009                  .010                  .013
          .05         .059               .049                  .056                  .058

[The remaining rows, including those for an average n per sample of 100, are not recoverable from this copy.]

Note. Each proportion is based on 2,000 pairs of samples. The various conditions involve Σ₁ and Σ₂ and
magnitude and departure from equality of n₁ and n₂.
ᵃ The degree of heterogeneity present is discussed in the text.
ᵇ The n₁:n₂ ratios are such that they represent, as closely as possible, three levels: 1:1, 2:1, and 5:1.
ᶜ The positive condition refers to that in which the population whose covariance matrix contains the larger
entries is that from which the larger n is drawn, where n₁ ≠ n₂. The negative condition is that in which the
population whose covariance matrix contains the larger entries is that from which the smaller n is drawn.
For situations where n₁ = n₂, this dichotomy does not, of course, exist, and entries in Table 4 are given
between the Positive and Negative columns.


Results


The results of the described Monte Carlo
analyses appear in Tables 2, 3, and 4; each
table deals with a separate p value (2, 6, or
10). The tabled proportions, based as they all
are on 2,000 sample pairs, are subject to mild
sampling error and should be interpreted
accordingly. Using the standard error of a
proportion and normal curve probabilities, we
can set approximate .95 confidence intervals
around each nominal α value tabled as (a)
.10 ± .013, (b) .05 ± .010, and (c) .01 ± .004.
Thus, any value between, for example, .040
and .060 can be considered to be within
sampling error of the nominal value of .050.

The results for the two-variable case appear
in Table 2. We see, first, that the robustness
of the T² test with equal ns extends to relatively
small samples. As expected, the test is unaffected
by inequality of ns with equal Σs.
When sample sizes are unequal, however, and
Σ₁ ≠ Σ₂, the actual α level can differ considerably
from nominal, falling below nominal in
the positive case (n₂ > n₁) and exceeding it
in the negative (n₁ > n₂). With the n₁:n₂
ratio fixed, the degree of departure from the
nominal α level increases as the degree of
inequality of covariance matrices increases.
Also, with the degree of covariance matrix
heterogeneity fixed, the degree of discrepancy
between actual and nominal α values increases
as the n₁:n₂ ratio departs from one. And increasing
the sample size while maintaining
the ratio between the two sample sizes does
not help; if anything, the reverse is true.
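
The sampling-error bands of ± .013, ± .010, and ± .004 quoted in the Results for proportions based on 2,000 sample pairs follow from the standard error of a proportion. A minimal sketch, assuming the usual normal-curve multiplier of 1.96 (the function name is illustrative):

```python
from math import sqrt


def half_width(alpha, replications=2000, z=1.96):
    """Approximate .95 half-width for a proportion whose true value is alpha,
    estimated from the given number of independent replications."""
    return z * sqrt(alpha * (1 - alpha) / replications)


for alpha in (.10, .05, .01):
    print(alpha, round(half_width(alpha), 3))
```

Tabled proportions that fall outside these bands are the ones that indicate a real departure from the nominal level rather than sampling fluctuation.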

Tables 3 and 4 contain the results for, 
respectively, the 6- and 10-variable conditions. 
All of the earlier results generalize to condi- 
tions involving more variables. Again for a 
fixed n₁:n₂ ratio and degree of heterogeneity,
actual a values tended to be more discrepant 
for the larger average sample sizes than for the 
smaller. 

Examination of Tables 2, 3, and 4 reveals 
that with an increase in the number of depen- 
dent variables, p, the effects of these factors on 
the statistical test become more pronounced, 
in spite of the fact that the ratio of average 
sample size to number of variates was held 
constant (3:1 and 10:1). In the most extreme 
cases reported here, the departure from 
nominal α levels became large indeed, as is seen,
for example, in Table 4. 


Conclusions 


From these results, it is clear that the T²
procedure is generally robust with respect to
violation of the homogeneity of covariance
matrix assumption for equal sample sizes, even
when the ratio of sample size to number of
dependent variables is small, for example,
three subjects per variable. Under these conditions,
the T² procedure has been shown to be
able to withstand population scale differences 
of 1.5 on all variables (Heterogeneity Condi- 
tion 2)—with the number of such variables as 
large as 10—a scale factor that implies a be- 
tween-populations variance difference of 2.25, 
a difference that seems like a realistic extreme 
for behavioral data. Holloway and Dunn 
(1967) demonstrated that with massive popula- 
tion variance differences (e.g., 10:1 on all 
variables), sample size equality does not uni- 
formly produce a robust test with samples 
smaller than about 50, although the number of
variates is relevant, since situations that involve
two or three variates show robustness
at ns of 25, but those that involve 10 do not
become robust until about n = 100. For
n₁ ≠ n₂, however, the test moves rapidly
towards unacceptable Type I error rates as
the degree of population covariance matrix
heterogeneity is increased. Even for relatively
mild between-populations dispersion differences
and particularly as p becomes larger, a
sample size ratio of even 2:1 produces unacceptable
actual α values. With greater
heterogeneity, the 2:1 n₁:n₂ ratio yields values
that are completely out of hand, which is also
true for the case of mild heterogeneity but more
severely unequal ns (5:1). Also important is
the fact that these results appear to be independent
of sample size. In summary, it is
clear that the T² procedure is not robust in the
face of covariance matrix heterogeneity coupled
with unequal ns, even for relatively mild
departures from equality of the covariance
matrices, sample sizes, or both.


Implications for Proper Use of the T² Procedure 


It seems clear that prior to the T² analysis, 
the sample covariance matrices S₁ and S₂ 
should be inspected and inferentially tested for 
equality. The homogeneity issue has implications 
not only for the T² test but also for 
discriminant analysis (often employed to supplement 
T² results), in which heterogeneous 
covariance matrices, in the two-group case, 
suggest a quadratic rather than a linear function 
(Rao, 1965, pp. 488-489). 

Statistical tests of the hypothesis of equal 
covariance matrices have been available for 
many years, with Bartlett's (1947) modifica- 
tion of the likelihood ratio criterion and Box's 
(1949) improved chi-square and F approxi- 
mations (see Harris, 1975, pp. 85-86). These 
procedures, however, have been shown to be 
extremely sensitive to multivariate non- 
normality (Hopkins & Clay, 1963; Mardia, 
1971). Although little is known about it, a 
newer procedure due to Layard (see Timm, 
1975, pp. 252-253) may be more robust in this 
regard. 
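
As an illustration of the kind of homogeneity test referred to above, the following is a minimal sketch of the chi-square approximation to Box's M statistic for two groups. The function names (`det`, `box_m_chi_square`) and the example matrices are hypothetical and are not taken from the article; the formula is the standard textbook form of the approximation, and the resulting statistic should be referred to a chi-square distribution with the returned degrees of freedom. As the text notes, this test is sensitive to multivariate nonnormality, so the sketch is a screening aid rather than a definitive procedure.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def box_m_chi_square(s1, n1, s2, n2):
    """Return (chi-square statistic, df) for H0: equal covariance matrices.

    s1, s2 are sample covariance matrices (lists of lists); n1, n2 are
    the sample sizes. Two-group case only (g = 2).
    """
    p = len(s1)
    g = 2
    total = n1 + n2
    pooled = [
        [((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (total - g)
         for j in range(p)]
        for i in range(p)
    ]
    m = ((total - g) * math.log(det(pooled))
         - (n1 - 1) * math.log(det(s1))
         - (n2 - 1) * math.log(det(s2)))
    c = ((2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))) * (
        1.0 / (n1 - 1) + 1.0 / (n2 - 1) - 1.0 / (total - g)
    )
    df = p * (p + 1) * (g - 1) // 2
    return m * (1.0 - c), df


# Hypothetical example: second matrix is the first rescaled by 1.5 on
# every variable, that is, every variance and covariance multiplied by 2.25.
s_a = [[4.0, 1.0], [1.0, 3.0]]
s_b = [[2.25 * v for v in row] for row in s_a]
chi_sq, df = box_m_chi_square(s_a, 30, s_b, 30)
print(chi_sq, df)
```

With identical matrices the statistic is zero; it grows as the two sample covariance matrices diverge.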

Clearly, the user of the T² procedure is 
almost certainly on safe ground if the two 
samples have equal ns. There is little chance 
that a Type II error in the homogeneity test 
will cause a seriously biased test of the central 
hypothesis. Greenstreet and Connor (1974) 
showed that even for small (and equal) ns, 
scale differences in the variables of from 2 to 3 
were detectable with high probability. And 
as noted, only for huge variance differences 
(e.g., 10:1) is the T² test biased. Thus, rejection 
of the homogeneity hypothesis will 
probably not be serious with equal ns. 

With equal ns and rejection of the equality-of-covariance 
matrix hypothesis, the question 
remains of whether in a given instance the T² 
test is likely to be biased. As demonstrated, 
an operationalization of heterogeneity is the 
magnitude of the latent roots of the product 
Σ₂Σ₁⁻¹. Given a strict rescaling of the variables, 
these elements are the variance ratios of the 
variables. For sample data, therefore, if D 
with diagonal elements d_ii is the diagonal 
matrix of latent roots of S₂S₁⁻¹ (such that 
∏ d_ii > 1), then (∏ d_ii)^(1/p) is a sample estimate 
approximately equal to a squared scaling 
factor, or an approximate overall estimated 
variance ratio. If this value is less than 5, for 
example (as will generally be true of real-world 
psychological data), then the user can 
safely ignore the inequality of the covariance 
matrices and proceed with the T² analysis. 
For larger ns (50 or beyond, for example) 
the T² will be sufficiently robust with heterogeneity 
values (as noted) of up to 10. Thus, 
the user can ascertain whether rejection of the 
homogeneity hypothesis is relevant for the T² 
analysis; as demonstrated, such a rejection will 
usually be irrelevant. 
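
The overall variance ratio described above can be sketched in a few lines. Because the product of the latent roots of S₂S₁⁻¹ equals |S₂|/|S₁|, the index reduces to the pth root of a ratio of determinants, so no eigenvalue routine is needed. The function names and the example matrices below are hypothetical; the thresholds of 5 and 10 are those quoted in the text, and the orientation step (inverting the ratio when it falls below 1) is an assumption standing in for the article's requirement that the product of the roots exceed 1.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def overall_variance_ratio(s1, s2):
    """Geometric mean of the latent roots of S2 S1^-1, oriented to be >= 1."""
    p = len(s1)
    ratio = det(s2) / det(s1)  # equals the product of the latent roots
    if ratio < 1.0:
        ratio = 1.0 / ratio    # label the groups so the product exceeds 1
    return ratio ** (1.0 / p)


# Hypothetical example: every variable in group 2 rescaled by 1.5,
# so each variance ratio is 1.5 squared, or 2.25.
s1 = [[4.0, 1.0, 0.5],
      [1.0, 3.0, 0.2],
      [0.5, 0.2, 2.0]]
s2 = [[2.25 * v for v in row] for row in s1]

index = overall_variance_ratio(s1, s2)
print(round(index, 4))  # 2.25
```

In this example the index recovers the squared scale factor of 2.25, well under the value of 5 that the text treats as safe to ignore when the ns are equal.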

It is when n₁ ≠ n₂ that problems arise. 
If the test of covariance matrix homogeneity 
is nonsignificant, no problem exists; but if this 
test is significant, the user is faced with the 
multivariate extension of the Behrens-Fisher 
problem. The following strategy, in the order 
of steps listed, seems to be a reasonable 
approach. 

1. Ascertain whether one is in the positive 
or negative condition. A direct assessment of 
this comes from comparison of the determinants 
(understood as generalized variances) of 
S₁ and S₂. If either n₁ > n₂ and |S₁| > |S₂| 
or n₁ < n₂ and |S₁| < |S₂|, we have the positive 
condition, whereas if the opposite obtains, 
we are in the negative condition. It is, of 
course, possible for |S₁| to be equal to |S₂|, 
although S₁ ≠ S₂. In such a case, however, 
the above taxonomy does not apply, and the 
effects of such heterogeneity may not be 
serious. 

2. Given the positive condition, the T² 
test will be conservative. Thus the user should 
run the test and, if it is significant, reject the 
null hypothesis a fortiori. 

3. Given the negative condition, the T² test 
will be liberal. Thus the user should run the 
test and, if it is nonsignificant, retain the null 
hypothesis. 

4. If in the positive condition, T² is 
nonsignificant or if in the negative condition, it 
is significant, two possibilities exist: (a) If the 
ns are not extremely different, they can be 
equalized by random deletion of subjects from 
the larger group. If in fact the null hypothesis 
is false, the loss of power may not be too great, 
and the previously significant T² (negative 
condition) may still be significant. The 
previously nonsignificant T² (positive condition) 
may now be significant. (b) If the ns are 
substantially different so that equalization would 
result in a massive loss of power or if the 
equalization performed when ns are not 
extremely different results in nonsignificant 
results, the user can employ one of several 
solutions to the multivariate Behrens-Fisher 
problem, reasonably precise approximations 
that do not require deletion of subjects. Such 
solutions can be found in an article by Ito 
(1969). 
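
The four steps above can be summarized as a small decision function. This is only a sketch of the logic as stated in the text, for the case in which the ns are unequal and the homogeneity test has already been rejected; the function name, argument names, and return strings are hypothetical.

```python
def t2_strategy(n1, n2, det_s1, det_s2, t2_significant):
    """Suggest how to read a T-squared result when n1 != n2 and the
    covariance-homogeneity hypothesis has been rejected.

    det_s1 and det_s2 are the determinants (generalized variances) of
    the two sample covariance matrices.
    """
    if n1 == n2:
        raise ValueError("strategy applies only to unequal sample sizes")
    if det_s1 == det_s2:
        return "taxonomy does not apply"

    # Step 1: positive condition means the larger sample also has the
    # larger generalized variance.
    positive = (n1 > n2) == (det_s1 > det_s2)

    # Step 2: conservative test that is nevertheless significant.
    if positive and t2_significant:
        return "reject H0"
    # Step 3: liberal test that is nevertheless nonsignificant.
    if not positive and not t2_significant:
        return "retain H0"
    # Step 4: the result cannot be trusted as it stands.
    return "equalize ns or use a Behrens-Fisher approximation"


# Larger group has the larger generalized variance: positive condition.
print(t2_strategy(40, 20, 5.0, 2.0, t2_significant=True))
# Larger group has the smaller generalized variance: negative condition.
print(t2_strategy(40, 20, 2.0, 5.0, t2_significant=True))
```

The choice between the two remedies in step 4 depends on how unequal the ns are, a judgment the function deliberately leaves to the user.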
Cerebral Electrotherapy: Methodological Problems 
in Assessing Its Therapeutic Effectiveness 
Cerebral electrotherapy (CET) appears to offer a safe and comfortable method 
of treating a variety of conditions, principally anxiety, depression, and insomnia. 
Attempts to assess its therapeutic efficacy have yielded widely differing 
results. This may be attributable to considerable variation in the electrical 
characteristics of the apparatus used, the duration of the treatment, and the 
placement of the electrodes. In some double-blind studies, the placebo condition 
has differed significantly from the treatment, and in most the ideal situation, 
with a blind machine operator, subject, and assessor, has not been achieved. 
Until the methodology for assessment improves and the treatment procedure is 
standardized, it will be impossible to determine if CET is an effective treatment, 
and if it is, whether its mode of action is attributable to a direct effect 
on the brain or to relaxation, suggestion, or tactile stimulation. 


Reports from Eastern Europe and particularly 
the Soviet Union suggest that cerebral 
electrotherapy (CET), or electrosleep treatment, 
holds considerable promise in the 
treatment of a variety of conditions. This review 
examines the methodological problems 
that have attended controlled studies that attempt 
to validate these claims. First, an overview 
of what the term CET implies is provided. 


Overview 


Electrosleep, or cerebral electrotherapy 
(CET), is a somatic therapy characterized by 
the passage of a low-amplitude, pulsating 
direct electrical current around and through 
the cranium. Originally, electrosleep was 
intended to induce a state of natural sleep, 
“a state of consciousness grossly indistinguish- 
able from ordinary sleep, produced by the 
direct action of a weak rhythmic current on 
the brain of a cooperative subject in a non- 
distracting environment” (Boblitt, 1969, p. 9). 


Requests for reprints should be sent to Clive S. 
Mellor, who is also at the St. Clare's Mercy Hospital, 
St. John's, Newfoundland, Canada A1C 5B8. 


Pavlov is said to have provided the concept 
and rationale for electrically produced sleep 
therapy (Boblitt, 1969; Obrosow, 1959). The 
concept of cerebral protective inhibition was 
based on the idea that a prolonged, monotonous, 
weak stimulus, such as a mild pulsating 
electric current, which is applied to the central 
nervous system under conditions of comfort, 
allows the brain cells to rest and permits 
restoration of function. 

Investigations into the effects of direct 
electrical currents have been going on since 
the 19th century. Included in the techniques 
was that of electroanesthesia, or the depression 
of consciousness, a state determined by the 
basic criterion of nonresponsiveness to pain 
and achieved by means of an electric current, 
usually in the range of 20 Hz to 1 kHz 
(Brown, 1975). A second technique, polarization, 
resembles and is often confused with 
CET. The major differences involve the use of 
constant rather than pulsating current and 
the positioning of electrodes on the arm or 
leg rather than solely on the cranium. Polarization 
is thought to be useful in producing 
mood changes (Lippold & Redfearn, 1964). 

The modern view of electrosleep therapy as 
a technique distinct from electroanesthesia 
began with the Soviet researcher Giljarowskii 
in the early 1950s (Lewis, 1966). 

The view that the sole purpose of electrosleep 
therapy was to induce a sleeplike state 
to promote functional recovery of cerebral 
cells influenced the type of research carried 
out in the Soviet Union and Europe. Most of 
this research has been presented at the 
International Symposia for Electrosleep and 
Electroanesthesia, first held in 1966, with 
subsequent meetings in 1969 and 1972. It was 
after the first symposium that the notion of a 
therapeutic, protective, artificial sleep was 
gradually replaced by the idea that the direct 
action of the current itself was the healing 
force: 


Our experience has shown that sleep in the course of the 
individual session is not an absolute condition of success 
of therapy. . . . The passage of the pulse current 
through the brain is of greater importance for a cura- 
tive effect than the achievement of the condition of 
sleep in one course of treatment. (Van Poznak, 1969, 
p. 507) 


Accordingly, Wageneder proposed adopting 
the term cerebral electrotherapy (CET) to 
replace electrosleep in that it more accurately 
reflected the type of treatment involved 
(Wageneder & St. Schuy, 1970). 

It was after the first International Sym- 
posium for Electrosleep and Electroanesthesia 
in 1966 that North American researchers 
became interested in CET. Before this sympo- 
sium, most of the work on CET had been 
done in the Soviet Union and Europe. Transla- 
tions of these works revealed sweeping claims 
for the beneficial effects of CET in a wide 
variety of disorders in the fields of psychiatry, 
surgery, dermatology, obstetrics, and pediatrics, 
but experimental controls were inadequate 
or nonexistent (Van Poznak, 1969). The 
American interest in CET research marked 
the beginning of a somewhat more objective 
assessment of its effects. 

Whereas the Soviet and European researchers 
had presented a united front in 
their favorable opinion of the beneficial 
effects of CET, the American investigators 
were markedly divided in their opinions, in 
spite of the fact that the focus of CET research 
in America had been narrowed down to 
the main target disorders of anxiety, depression, 
and insomnia and to certain physiological 
effects. In general, this division of opinion 
holds among authors of uncontrolled studies 
as well as among those who used double-blind 
procedures in their investigations. 

The difficulty in drawing any conclusions is 
compounded by the considerable differences 
in the research methods so that comparisons 
between studies are impossible. This is illus- 
trated by examining in detail eight recent 
studies that were more rigorous than others 
because they at least used some kind of 
double-blind experimental procedure. 


Techniques of CET 


The technique of administering CET encom- 
passes a bewildering array of different pro- 
cedures, particularly when the type of electrical 
stimulation, the duration of the treatment, 
and the placement of the electrodes are varied. 


Electrical Parameters 


Either a DC or an AC, in which the current 
flow regularly changes direction, has been 
used. Some researchers have added to these a 
DC bias (Itil, Gannon, Akpinar, & Hsu, 1971; 
Marshall & Izard, 1974). DC is most frequently 
used and adheres to the original electrosleep 
technique and theory in that a constant 
direction of current is thought to be essential 
for ensuring unidirectional cerebral cell reac- 
tions, an important factor in producing desired 
physiological modifications (Obrosow, 1959). 
The pulse frequency, measured in impulses 
per second or Hz, may vary from 30 Hz to 
100 Hz (Rosenthal, 1972a), with a pulse width 
ranging from 1 to 2 msec (Straus, Elkind, & 
Bodian, 1964). The supply voltage may vary 
from 10 to 20 V. The current amplitude at 
which tingling is usually felt ranges from 
.1 to .5 mA. These current parameters depend 
on the type of CET device used, and no two 
American-made models seem to possess the 
same electrical characteristics (Brown, 1975). 
However, the combination of 100 Hz with a 
pulse duration of 2 msec and an amplitude of 
up to 1.5 mA are the most frequently used 
parameters in double-blind studies. 
Table 1 
Eight Double-Blind Studies and the Technique of Cerebral Electrotherapy 

                                                                                                        Treatment session 
Authors                                     Current  Frequency  Pulse length  Electrode placement      No.    Duration 
                                                     (in Hz)    (in msec)                                     (in minutes) 

Straus, Elkind, & Bodian (1964)             DC       30-40      1.8-2.0       Eyelid +, mastoid −      6-12   30 
Rosenthal (1972a)                           DC       100        1.0           Mastoid +, orbit −       5      30 
Weiss (1973)                                DC       45-240     A-1,2         Brow, nape of neckᵃ      24     1-5, 2-10, 21-15ᵇ 
Feighner, Brown, & Olivier (1973)           DC       100        1.0           Eyelids, mastoidsᵃ       20     30 
Tomsovic & Edwards (1973)                   AC       100        2.0           Orbits, mastoidsᵃ        5      30 
Hearst, Cloninger, Crews, & Cadoret (1974)  AC, DC   100        2.0           Mastoids +, brow −       5      30 
Marshall & Izard (1974)                     DC       100        1.0           Eyelid +, mastoid −      5      30 
Moore, Mellor, Standage, & Strong (1975)    DC       100        2.0           Not given                10     30 

ᵃ The polarities of the electrode placements are not given. 
ᵇ The duration of treatment was increased gradually. 


Duration of Treatment 


Treatment sessions have been known to 
vary from 30 minutes for 5 consecutive days 
(Rosenthal, 1972a) to 2 hours daily for a 
period of several months (Wageneder, Iwanovsky, 
& Dodge, 1969). It has been reported 
that exposure to CET for more than 2 hours 
per session resulted in morning dizziness and 
a degree of unsteadiness in walking that lasted 
for a few hours (Iwanovsky & Dodge, 1968). 
The average duration of exposure time in 
most of the important studies was 30 minutes 
per session over an average of 10 sessions. 


Electrode Placement 


The felt pads attached to the electrodes are 
either soaked in water (Weiss, 1973) or a 
saline solution (Feighner, Brown, & Olivier, 
1973) or prepared with saline paste (Hearst, 
Cloninger, Crews, & Cadoret, 1974). A pair 
of electrodes is placed either directly on the 
eyelids (Lewis, 1966) or the brow (Brown, 
1975). A second pair is usually placed over 
the mastoids. The forehead electrodes are 

negatively charged cathodes and those at the 
mastoids are positively charged anodes (Rosenthal, 
1972a), but in some cases their placement 
is reversed (Marshall & Izard, 1974). There 
is no firmly established rule about polarity, 
but the placement of negatively charged 
electrodes on the forehead and positively 
charged electrodes over the mastoids is more 
frequent in the literature (Iwanovsky & 
Dodge, 1968). However, the direction of 
current flow appears to be an important factor 
according to one of the original users of CET, 
Giljarowskii, who felt that current entering 
through the orbital fissure and leaving from 
the mastoid processes was the best method of 
ensuring electrical penetration of the brain 
(Boblitt, 1969, p. 12). Since current conventionally 
flows from the positive to the negative 
electrodes, the anodes should be anterior 
relative to the cathodes. A significant reason 
for such a placement of electrodes is that the 
only investigation that has demonstrated 
intracerebral current flow (Dymond, Coger, 
& Serafetinides, 1975) followed this procedure. 
The differences in technique between the eight 
double-blind studies already referred to are 
set out in Table 1. 
Setting for Treatment and Method of 
Administration 


It is generally agreed that the best treatment 
setting is a quiet darkened room in which the 
patient lies comfortably on a bed. After the 
electrodes are applied, the current is turned on, 
and the amplitude is increased slowly to avoid 
any unpleasant sensation, until the level is 
reached at which a slight tingling sensation 
occurs. Some clinicians then reduce the current 
to a level at which the patient experiences no 
cutaneous sensation (see Table 2). 


Complications and Contraindications 


CET is an attractive treatment because it 
is said to have no cumulative side effects 
(Weiss, 1973), complications, or contraindica- 
tions, to be nontoxic and usable with drugs 
and other therapies, and to be a simple pro- 
cedure to carry out (Weinberg, 1969). How- 
ever, side effects such as blurring of vision, 
thought to be the result of electrode pressure 
on the eye, and dizziness as well as slight burns 
on the skin at the electrode sites have been 
reported (Frankel, 1974; Koegler, Hicks, & 
Barger, 1971; Rosenthal & Wulfsohn, 1970). 
Moreover, contraindications such as epilepsy, 
blood diseases, malignant tumours, cerebro- 
vascular disorders, and heart disease (Chuma- 
kova & Kirillova, 1976) as well as various 
forms of psychosis (Rosenthal, 1972b) have 
been reported. All of the double-blind studies 
excluded such patients. 


Mode of Action of CET 


Before double-blind designs can be discussed, 
the theories of the mode of action of CET 
should be considered. These theories must be 
taken into account when the qualifications of 
the different placebo conditions are reviewed. 

There are two main schools of thought about 
the action of CET: those researchers who 
postulate that it has a direct effect on the brain 
and those researchers who attribute its effect 
to other causes. 


Direct Effect 


The effects of CET treatment are attributed 
to the direct action of the current on the 
cerebral cells. Rush and Driscoll (1968), in 
their work on a theoretical model of current 
flow in the human head, assumed that the 
current entered the cranium via the frontal 
surface electrodes. Their calculations suggested 
that 45% of the electrical output actually 
entered the brain. Those who favor the direct 
effect theory hold that the current traversing 
the brain induces protective inhibition that 
creates favorable recovery of cerebral cells 
along with sedation and normalization of the 
central nervous system (CNS) processes 
(Banshchikov, 1967; Brand, 1970). 

One criticism of the direct effect theory is 
that the current is too weak to pass from the 
skin through the skull and other tissues to 
effect changes in the brain. However, Dymond 
et al. (1975) provided direct evidence that CET 
is capable of producing electrical changes in 
the brain. The proponents of the direct effect 
theory have mainly been the Soviet and 
European researchers who were influenced by 
the original Pavlovian concept of protective 
inhibition. Kalinowsky (1969) suggested that 
the therapeutic usefulness of CET lay in the 
“rhythmic nature of a peripheral stimulation” 
(p. 172). Iwanovsky and Dodge (1968) 
described a Soviet work in which the author 
conclusively stated that CET was a rhythm 
therapy in which the electrical current provided 
a kind of “electromassage” that polarized 
cells and normalized tissue metabolism. 


Indirect Effect 


The effects of CET treatment are due to 
the indirect action of the current. Only the 
normalization of the peripheral autonomic 
elements of the nervous system is involved, 
and the effect on the CNS is a secondary one 
involving a variety of mechanisms (Iwanovsky 
& Dodge, 1968). 

The first mechanism is thought to be relaxation, 
attributable to lying down in a quiet, 
comfortable, semidark setting. Some reviewers 
of CET literature have repeatedly stressed 
the importance of the elements of suggestion 
inherent in a procedure that calls for patients 
to lie comfortably in a quiet darkened room 
and submit to a treatment that they are told 
will relax them (Boblitt, 1969; Frankel, 1974; 
Lewis, 1966). 
A second mechanism is sensory stimulation, 
whereby the rhythmic cutaneous sensations 
experienced in treatment may alone account 
for the clinical effects. Such sensory stimuli 
have been found to induce sleep (Lovell & 
Morgan, 1942; Oswald, 1960). 

A third mechanism is suggestion or a placebo 
effect, whereby patients who are referred by 
physicians to undergo a special type of 
therapy involving neither drugs nor psycho- 
therapy are under the impression that this 
represents a new type of cure for their specific 
ailment. The powerful suggestion inherent in 
electrical apparatus applied to healing has been 
recognized since the time of Mesmer. The 
notion that a possible placebo effect played an 
unimportant role in CET research was promulgated 
by Giljarowskii, a notion which 
according to Frankel (1974) probably encouraged 
many subsequent investigators to 
adopt a less rigorous approach to their 
experiments. 


Double-Blind Studies 


The ideal double-blind procedure is one in 
which the subject, the operator of the machine, 
and the individual assessing the effects are all 
blind. The methodological problems that occur 
when this is attempted are considerable. Only 
two of the eight studies listed in Table 2 were 
able to meet this ideal double-blind or, more 
accurately, triple-blind condition (Hearst et 
al., 1974; Weiss, 1973). 


Treatment Conditions 


In three studies the active and placebo 
CET conditions were identical (Hearst et al., 
1974; Marshall & Izard, 1974; Moore, Mellor, 
Standage, & Strong, 1975). In all of these no 
statistically significant differences were found 
between active and placebo treatment (see 
Table 2). Two types of identical active and 
placebo CET conditions were devised. One 
method was to lower the current to just below 
the point at which tingling was perceived in 
active treatment. This would be done after 
subjects had been allowed to experience an 
initial tingling sensation. Thus an active 
treatment condition experientially identical to 
placebo CET, in which the current was turned 
off completely, was produced. 

An alternative method was that of Marshall 
and Izard (1974), which used positive and 
negative frontal electrodes and thus enabled 
the subjects to experience the cutaneous 
sensation. There was no active treatment 
because the current was flowing just through 
the inch of skin separating the electrodes. 
The electrodes were applied to the mastoid 
as usual but were not connected to the current. 
Although all of the studies purported to have 
kept subjects blind, the first five studies listed 
in Table 2 did not create active and placebo 
treatment conditions that were experientially 
identical to subjects. In some of these studies, 
subjects felt a tingling sensation throughout 
active treatment but experienced it only for 
a brief period at the beginning of placebo 
treatment. In the placebo treatment the 
machine would be turned off, presumably 
without the subjects’ knowledge. In other 
studies, a noise or light used in both treatment 
conditions was supposed to represent the 
giving of CET, although the tingling sensation 
felt in active treatment was absent during 
placebo treatment. All but one of these five 
investigations found a significant difference 
between the active and placebo treatment. 
The results of these eight studies suggest an 
association between identical peripheral stim- 
ulation and negative outcome and different 
peripheral stimulation and positive outcome, 
when active treatment is compared with 
placebo, although it does not reach statistical 
significance (p = .07, Fisher's exact test). 
Ensuring experientially identical active and 
placebo treatment conditions may be one of 
the most important factors when the outcome 
of CET is being assessed. 
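
The reported probability can be checked from the counts given in the text: three studies with identical conditions, none finding a significant difference, and five studies with differing conditions, four of which did. The sketch below computes Fisher's exact test directly from the hypergeometric distribution. The variable and function names are hypothetical, and reading the reported value as a one-sided probability is an inference from the arithmetic, not something the article states.

```python
from math import comb


def table_probability(k, group_size, positives, total):
    """Hypergeometric probability that k of the studies in a group of
    `group_size` are positive, given `positives` positive studies
    among `total` studies overall."""
    return (comb(positives, k) * comb(total - positives, group_size - k)
            / comb(total, group_size))


TOTAL_STUDIES = 8       # all double-blind studies
POSITIVE_STUDIES = 4    # studies reporting a significant difference
IDENTICAL_GROUP = 3     # studies with experientially identical conditions
OBSERVED_POSITIVE = 0   # positive studies within the identical group

probs = [
    table_probability(k, IDENTICAL_GROUP, POSITIVE_STUDIES, TOTAL_STUDIES)
    for k in range(IDENTICAL_GROUP + 1)
]

# One-sided: tables at least as extreme in the observed direction.
p_one_sided = sum(probs[: OBSERVED_POSITIVE + 1])
# Two-sided: all tables no more probable than the observed one.
p_two_sided = sum(pr for pr in probs if pr <= probs[OBSERVED_POSITIVE] + 1e-12)

print(round(p_one_sided, 3))  # 0.071
print(round(p_two_sided, 3))  # 0.143
```

The one-sided value of about .07 matches the figure quoted above; the two-sided value is twice as large, which underlines how little weight eight studies can bear.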

The implications of these results can be 
related to the two theories of CET mode of 
action. Two studies in which subjects felt 
tingling in neither active nor placebo treatment 
possibly represented a test of the direct effect 
theory: The intervening variable of peripheral 
or rhythmic sensation was not present; therefore, 
the direct effect alone was being tested. 
No statistically significant difference between 
active and placebo treatment was found. In 
a third study, in which subjects experienced 
tingling in both active and placebo treatment, 
peripheral sensation was held constant. Although 
both active and placebo treatment 
groups improved, there was no significant 
difference between group improvement levels. 
Thus the indirect effect theory would appear 
to have been supported once more. 


Machine Operator 


An additional aspect to ensuring that 
subjects are blind, given experientially identical 
active and placebo conditions, is that the 
operator of the machine also be blind. Frankel 
(1974) has pointed out the possibility of 
either indirect verbal or nonverbal communications 
between the operator and the subject. 
The application of active or placebo treatment 
may be identified by the subject from such 
subtle communications. Such procedures re- 
quire an intermediary who would only operate 
the machine while another individual would 
only interact with subjects. An ideal double- 
blind procedure is probably best achieved 
with the use of specially built machines with 
hidden switches (Frankel, 1974; Weiss, 1973). 
Perhaps because of the need for extra staff 
and equipment, only two of the eight double- 
blind studies were carried out under these 
conditions (Hearst et al., 1974; Weiss, 1973). 
In the Weiss study, significant differences 
between active and placebo CET were found; 
in the Hearst et al. study no significant differences 
were noted. 


Assessment of Effects 


To prevent the influence of knowledge of 
type of treatment on assessment, objective 
psychological measures such as inventories 
completed by subjects or rating scales com- 
pleted by a blind clinical assessor should be 
used. All of the eight double-blind studies 
under discussion complied with one or both 
of these conditions. 

All these studies used relatively small 
numbers of subjects. This raises the problem 
of substantial differences between treatment 
and control groups before the treatment is 
administered. To some extent this problem 
can be obviated by using a crossover design 
like that employed by Feighner et al. (1973) 
and Moore et al. (1975). An important aspect 
to assessment of CET effects, and one that 
has thus far not been explored in conjunction 
with psychological and clinical effects, is that 
of physiological effects. There is a body of 
research on CET that deals exclusively with 
physiological effects of CET and does not 
concern itself with psychological or clinical 
effects. Probably the most thorough method of 
properly evaluating the therapeutic effective- 
ness of CET would be to design a study in 
which CET’s effect on a target disorder is 
assessed with objective psychological, physio- 
logical, and clinical measures and in which 
subjects serve as their own controls in a 
crossover design. Such a study has been 
completed by the authors and will be published 
in the near future. 


Conclusion 


It is clear that the confusion about the 
actual therapeutic value of CET stems in 
part from the diversity of methodological 
approaches and from the varying degrees of 
scientific rigor that have been used in CET 
research. 

Any further research on CET would be of 
little value unless the scientifically sound 
components of all these studies are combined 
to produce a design that allows no room for 
bias. 


Suggestions for Future Investigations of CET 


Based on this discussion of the 
techniques and methodologies of CET, 
the designers of future studies should consider 
the following points: 


1. Electrical parameters have been limited 
to a pulse frequency of 100 Hz and a pulse duration 
of 2 msec. Little consideration has so far been 
given to the effects of varying pulse 
frequency and duration. 


2. [...] not been systematically studied. 

3. Electrode placement and polarity [...] 

4. Treatment conditions: [...] subjects, the operator 
of the machine, [...] clinical [...] 
should be identical under all experimental 
conditions. 
5. Assessment: Objective psychological 
and physiological measures should be used, 
along with blind assessment of the changes in 
the subjects' clinical condition. 
Temporal Versus Spatial Information Processing 
Theories of Hippocampal Function 


Paul R. Solomon 
Williams College 


A series of articles by Black, Nadel, O'Keefe, and their co-workers propose 
that the primary function of the hippocampus is to process spatial information. 
Although the spatial information processing view of hippocampal function 
accounts for much of the available data, it cannot account for the data from the 
classically conditioned rabbit nictitating membrane response preparation. The 


present article reviews these data and suggests that the hippocampus is in- 
volved in the processing of temporal as well as spatial information. 


A number of recent studies by Black, Nadel, 
O'Keefe, and their co-workers propose that the 
primary role of the hippocampus is to process 
spatial information (see Nadel & O'Keefe, 
1974; Nadel, O'Keefe, & Black, 1975, for 
reviews). According to this theory, the hip- 
pocampus acts as part of a neural system that 
forms a cognitive map of the environment. 
Central representations of separate places in 
the environment as well as the relationship of 
one place to any other place are represented in 
this system. Thus once an animal has located 
itself in the environment by using the available 
cues, it can use its spatial mapping system to 
locate other places. 

Central to the spatial hypothesis of hip- 
pocampal function is that the role of the hip- 
pocampus is to process spatial but not tem- 
poral information. As O'Keefe and Black 
(1979) point out, cues are only critical to the 
extent that they allow the animal to identify a 
starting point (e.g., the start box in a maze). 
Once this point has been identified, the animal 
can find any place in the environment by using 
the mapping system. Furthermore, once the 
starting point is perceived, the animal's spatial 
map is not sensitive with respect to its own 
body position or orientation, nor is the map 
sensitive to cues in the environment (O’Keefe, 
1976). 

To support the spatial information proces- 
sing view of hippocampal function, its pro- 
ponents have presented data from both elec- 
trophysiological (O'Keefe, 1976; O'Keefe #57 
Black, 1979; O'Keefe & Dostrovsky, 1971; 
Olton, Branch, & Best, 1978) and lesion (Black, 
Nadel, & O'Keefe, 1977; O'Keefe & Black, 
1979; O'Keefe, Nadel, Kieghtly, & Kill, 1975; 
Olton, Walker, & Gage, 1978) studies. The 
data and logic from both these lines of research 
strongly implicate the hippocampus in spatial 
mapping. The authors, however, overlook an 
accumulating body of literature on the role of
the hippocampus in aversive classical condi- 
tioning of the rabbit's nictitating membrane 
response (NMR). In this article I argue that 
place learning plays little if any role in classical 
conditioning of the rabbit's NMR and that, if this
is the case, the data pertaining to hippocampal 
function in this preparation cannot be ex- 
plained by the spatial information processing 
hypothesis. 


The Spatial Information-Processing 
Hypothesis and Classical Conditioning 


The proponents of the spatial information-
processing view of hippocampal function do not
directly address the question of what changes in
aversively motivated classically conditioned
behavior might be expected in animals with
hippocampal lesions. They do, however, point
out that place learning does not involve
learning about the temporal relationships be-
tween stimuli:


When an animal is learning about the spatial relation- 
ships among stimuli, we assume that it employs place 
strategies. When it is learning about the temporal re- 
lationship between stimuli we assume that it employs 
cue strategies. (Black et al., 1977, p. 1108.) 


The authors assume that place learning in-
volves the acquisition of knowledge of the
spatial relationships between two stimuli and
that this knowledge is acquired independently
of (and via different neural substrates than)
knowledge of temporal relationships (cue
learning). The hippocampus, in their view, is
critical for learning the spatial aspects of a
relationship between stimuli but unimportant
for learning the cuing of temporal relation-
ships. Black et al. (1977) make this explicit in
reviewing the data that indicate no deficit in
taste aversion learning in animals with hip-
pocampal lesions: "This result is to be ex-
pected, since this procedure (taste aversion)
does not seem to involve spatial strategies, at
least as it has been employed so far" (p. 1123).

The strong implication here is that hip- 
pocampal lesions should not affect the tem- 
poral associations made in a classical condi- 
tioning preparation such as the rabbit NMR. 
As I point out in subsequent sections, although
there are some tasks in which hippocampal
lesions do not seem to affect the classically con- 
ditioned NMR, there are others in which 
animals with hippocampal lesions differ from 
controls. Furthermore, the electrophysiological 
data and the data from brain stimulation 
studies also implicate the hippocampus in 
classical conditioning of the NMR. 


Absence of Spatial Cues in the
Rabbit NMR Preparation 


Black et al. (1977, p. 1108) state that when 
an animal is learning spatial relationships be- 
tween stimuli, it is using place strategies, and 
when it is learning temporal relationships be-
tween stimuli, it is using cue strategies. The
authors also point out that, although an animal
performs successfully in a particular situation,
it may be difficult to determine to what degree 
each strategy is being used. Consequently, it 
may be necessary to make a priori assumptions 
about each task. 

Although it is possible that spatial learning is 
involved in classical conditioning paradigms 
such as the conditioned emotional response 
(CER) (see Black et al., 1977, p. 1114), it is 
unlikely that spatial cues play any role in the 
rabbit NMR preparation. Here, the animal 
remains virtually motionless throughout the 
conditioning session, and the conditioned 
stimuli (CS) and unconditioned stimuli (UCS) 
are delivered in the same spatial locations at 
all times. Thus any associations, for example, 
between the CS and the UCS, would neces- 
sarily be temporal in nature. 

The rabbit NMR preparation used in most 
laboratories is a variation of that described by 
Gormezano (1966). The rabbit is restrained in 
a Plexiglas box with an adjustable plate and 
ear clamp securing the head and a second plate 
placed over the animal's back to restrict body 
movement. Some experimenters further re- 
strain the animal's head by implanting long 
bolts in the animal's skull and fastening them 
to the Plexiglas box (e.g., Frey, Maisiak, &
Dugue, 1976; Mis, 1977). Animals are typically 
run in individual sound-attenuated and dark- 
ened chambers. A panel in front of each 
chamber contains two lights that serve as visual 
CSs and speakers for delivering auditory CSs. 
Several laboratories (see Wagner, Rudy, & 
Whitlow, 1973) also use a vibratory CS de- 
livered to the back of the animal. 

The UCS is typically a 1-3 mA infraorbital 
shock, although some experimenters prefer an 
air puff. The unconditioned response (UCR) 
and conditioned response (CR) are lateral 
movements of the nictitating membrane. This 
response is recorded by attaching the shaft of 
a potentiometer to the NM, thus transducing 
the response into a DC signal. A typical session 
in our laboratory consists of 100 CS-UCS 
pairings in a 50-minute session. 


Lesion Studies


Since acquisition of the rabbit's NMR is not
dependent on spatial strategies, the spatial
information processing theory would predict
that hippocampal lesions should have no effect
on this behavior. This, in fact, seems to be the 
case. Several studies (Schmaltz & Theios, 
1972; Solomon, 1977; Solomon & Moore, 1975) 
reported no differences in acquisition of the
NMR in animals with bilateral aspiration 
lesions of the dorsal hippocampus, animals with 
lesions of the overlying cortex, or unoperated 
controls. The dependent measure in these 
studies was either total CRs or trials to cri-
terion. This, however, does not rule out the
possibility that other more subtle aspects of
the nascent CR, such as amplitude, latency, or
interstimulus interval (ISI) shifts, could have
been affected.¹ In addition, there are electro-
physiological data (see the Electrophysiological
Studies section) which suggest that the intact
hippocampus is involved in acquisition of the
NMR.

Although the acquisition data may be con-
sistent with the spatial processing view of hip-
pocampal function, there are data from other
tasks conducted in the rabbit NMR prepara- 
tion that are not. Like simple acquisition, these 
tasks require that the animal learn a temporal 
relationship between a CS (or a series of CSs) 
and a UCS (or the absence of a UCS). Unlike 
acquisition, these tasks require that the animal 
learn not to respond in certain circumstances. 
In these situations the animal must learn 
whether a particular CS is relevant, that is, if 
it uniquely predicts the occurrence or nonoc- 
currence of the UCS. These tasks require the 
animal to learn temporal relationships between 
CSs and UCSs and thus, according to spatial 
information processing theory, should be per- 
formed equally well by animals with and with- 
out the hippocampus. 

Latent inhibition (LI) is one behavior that 
is disrupted in animals with hippocampal 
lesions. In this paradigm the animal is pre-
exposed to the to-be-conditioned CS, and this
preexposure results in retarded acquisition of
the CR when the CS is subsequently paired 
with the UCS in a conditioning paradigm. In 
one study conducted on the LI effect (Solomon 
& Moore, 1975), we reported that whereas 
450 tone preexposures resulted in a decrement 
in conditioning for normal rabbits and for 
rabbits with cortical ablations, animals with 
dorsal hippocampal ablations showed no such 
decrement; that is, they conditioned as fast as
nonpreexposed controls. Our interpretation of
these data was that the hippocampus is part 
of a system involved in learning to ignore ir- 
relevant stimuli. To further test this view of 
the role of the hippocampus, we investigated 
the effects of dorsal hippocampal ablations in 
Kamin’s (1968, 1969) two-stage blocking 
paradigm. 

The typical blocking paradigm in the rabbit 
NMR preparation consists of a two-group 
design (cf. Marchant & Moore, 1973). In
Stage 1 the blocking group is presented with
a tone that is paired with an eyeshock UCS
until the CR is well established. Animals in 
the control condition are yoked to the blocking 
animals and simply sit for a corresponding 
amount of time with no CS or UCS presenta- 
tions. In Stage 2 both groups are conditioned 
to a compound CS that consists of the tone 
from Stage 1 plus a light. After both groups 
display a high level of conditioning to the 
compound, the test phase is introduced. During 
testing, all animals are presented with non-
reinforced presentations of the tone inter-
spersed with nonreinforced light presentations.


In general, whereas animals in the control 
condition give CRs to both the tone and light, 
animals in the blocking groups respond only to 
the tone (Marchant & Moore, 1973). 
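
The logic of this two-group design can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a Rescorla-Wagner style error-correction rule, which is a standard associative account and not a model used in this article, and the learning rate, asymptote, and trial counts are hypothetical values chosen for clarity. It trains a blocking group (Stage 1 tone, then Stage 2 tone plus light) and a control group (Stage 2 only) and prints the associative strength that each group assigns to the tone and to the light.

```python
# Illustrative sketch only. The Rescorla-Wagner rule is an assumption of this
# example, not a model taken from the article; all parameters are hypothetical.

def rescorla_wagner(trials, strengths=None, alpha=0.2, lam=1.0):
    """Return associative strengths after a list of (cues, reinforced) trials."""
    v = dict(strengths or {})
    for cues, reinforced in trials:
        # Summed prediction from every cue present on this trial.
        prediction = sum(v.get(cue, 0.0) for cue in cues)
        # Prediction error: asymptote (if the UCS occurs) minus the prediction.
        error = (lam if reinforced else 0.0) - prediction
        for cue in cues:
            v[cue] = v.get(cue, 0.0) + alpha * error
    return v


STAGE_1 = [(("tone",), True)] * 100          # tone paired with the UCS
STAGE_2 = [(("tone", "light"), True)] * 100  # tone + light compound paired with the UCS

# Blocking group: Stage 1 followed by Stage 2. Control group: Stage 2 only.
blocking = rescorla_wagner(STAGE_2, strengths=rescorla_wagner(STAGE_1))
control = rescorla_wagner(STAGE_2)

for group, v in (("blocking", blocking), ("control", control)):
    print(f"{group:8s} tone={v['tone']:.2f} light={v['light']:.2f}")
```

Under these assumptions the light acquires almost no strength in the blocking group, because the tone already predicts the UCS, whereas the control group divides strength between the two elements of the compound. The sketch is silent about the neural substrate of the effect.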

Although Kamin (1968) initially suggested
that prior conditioning to the tone caused the
animal not to notice the light when it was pre-
sented in compound with the tone, more recent
evidence (Kamin, 1969; Mackintosh, 1973,
1975) suggests that the animal does initially
attend to the redundant CS but does not condi-
tion to it, since the light provides no new in-
formation regarding the reinforcing event.
Thus the paradigm is similar to LI in that the
animal must learn to ignore an irrelevant
stimulus. If a tuning out process similar to the
one in latent inhibition is operating here, hip-
pocampal lesions should disrupt the blocking
effect. Solomon (1977) reported just this
finding. Although normal rabbits and rabbits
with cortical ablations showed the typical block-
ing effect, hippocampectomized rabbits did not.
Animals with dorsal hippocampal ablations re-
sponded to both the tone and light during
testing.

¹ I am grateful to Michael M. Patterson (Note 1) of
Ohio University for suggesting this as yet untested
possibility.

Conditioned inhibition (CI) is yet another 
behavior in which an animal must learn not to 
respond in certain situations. Both we 
(Solomon, 1977; Moore, Mis, & Solomon, 
Note 2) and Black et al. (1977; Nadel et al., 
1975) agree that the hippocampus is not 
critical to inhibitory processes. But whereas 
Black et al. argue against the inhibitory hy- 
pothesis of hippocampal function in favor of 
a spatial hypothesis, we maintain that CI, un- 
like blocking and LI, does not involve learning 
to ignore an irrelevant stimulus and thus
should not be affected by hippocampal damage.

Although a number of theoretical accounts
of hippocampal function have stated that the
hippocampus plays a direct role in Pavlovian
conditioned (e.g., Kimble, 1968) or internal
(e.g., Douglas, 1967) inhibition, only recently
has this been directly tested. In the Pavlovian
CI paradigm, the animal is taught to discrimi-
nate between CS_A, which is always followed by
the UCS, and a compound consisting of
CS_A + CS_B, which is never followed by the
UCS. Since CS_B is negatively correlated with
the UCS, it should become inhibitory (Hearst,
1972; Rescorla, 1969). That is, when CS_B is
presented alone and reinforced, acquisition of
the CR should be retarded relative to controls.
Experimenters using the rabbit NMR prepara-
tion have indicated that this is the case in
normal rabbits (Marchant, Mis, & Moore,
1972; Marchant & Moore, 1974; Mis, 1977),
and Solomon (1977) showed that this was also
true for rabbits with dorsal hippocampal
ablations.
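
The retardation test described above can be sketched numerically. The example below is a hedged illustration rather than a description of any cited experiment: it assumes a Rescorla-Wagner style error-correction rule (not a model used in this article), and the learning rate, asymptote, trial counts, and response criterion are all hypothetical. Cue "A" is reinforced when presented alone, the "A" plus "B" compound is never reinforced, and "B" is then reinforced alone to see how many trials it needs to reach criterion compared with a novel stimulus.

```python
# Illustrative sketch only. The update rule and every number below are
# assumptions made for this example, not values taken from the article.

ALPHA = 0.2   # hypothetical learning rate
LAMBDA = 1.0  # hypothetical asymptote supported by the UCS


def train(v, cues, reinforced):
    """Apply one error-correction update to every cue present on the trial."""
    error = (LAMBDA if reinforced else 0.0) - sum(v[c] for c in cues)
    for c in cues:
        v[c] += ALPHA * error


def trials_to_criterion(start, criterion=0.8):
    """Count reinforced presentations of B alone needed to reach criterion."""
    v = {"B": start}
    n = 0
    while v["B"] < criterion:
        train(v, ["B"], True)
        n += 1
    return n


# Conditioned inhibition training: A reinforced, A + B compound nonreinforced.
v = {"A": 0.0, "B": 0.0}
for _ in range(200):
    train(v, ["A"], True)
    train(v, ["A", "B"], False)

inhibitor_trials = trials_to_criterion(v["B"])  # B after inhibition training
novel_trials = trials_to_criterion(0.0)         # a stimulus with no history

print(f"strength of B after training: {v['B']:.2f}")
print(f"trials to criterion: inhibitor={inhibitor_trials}, novel={novel_trials}")
```

Under these assumptions the strength of "B" becomes negative during training, so it takes more reinforced trials to reach criterion than a novel stimulus does, which is the retardation result the paragraph describes for both normal and hippocampectomized rabbits.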

Black et al. (1977) would likely argue that 
the failure to find a disruption of CI indicates 
that no spatial cues were involved. This ap- 
pears to be true; however, this is not the factor 
that separates this paradigm from blocking 
and LI. Rather, we propose that the difference
between CI, in which hippocampal lesions have
no effect, and blocking and LI, in which they
do, is that CI does not involve the tuning out
of an irrelevant stimulus. In the Pavlovian CI
paradigm, despite the fact that CS_B is not
followed by the UCS, CS_B cannot be ignored.
If the animal attempted to tune out CS_B, it
could not make the distinction between CS_A
and CS_AB. Thus the failure to find a disrup-
tion of CI, at least from our theoretical
perspective, was not surprising. 

In summary, the data on the role of the 
hippocampus in acquisition of the classically 
conditioned NMR may be consistent with the 
spatial hypothesis of hippocampal function 
(but see the next section). Hippocampal 
lesions do, however, disrupt blocking and LI 
of the rabbit’s NMR, but they do not seem to 
affect CI in this paradigm. Since each of these 
tasks involves temporal relationships between 
stimuli, it is not likely that a spatial hypothesis 
of hippocampal function can account for the 
differences. Rather, it appears that these data 
are best explained in terms of the role of the 
hippocampus in learning to ignore irrelevant 
stimuli. 


Electrophysiological Studies 


As I indicated earlier, much of the support 
for the spatial mapping hypothesis stems from 
electrophysiological data (O'Keefe & Black, 
1979; O'Keefe, 1976; O'Keefe & Dostrovsky, 
1971; Olton et al., 1978). These studies in-
dicated that single-unit activity in the hip-
pocampus was related to place but not to cue
learning. There are, however, substantial electro-
physiological data from the rabbit that im-
plicate the hippocampus in classical condi-
tioning of the rabbit's NMR. This work has
been carried out by Thompson and his co- 
workers (Berger, Alger, & Thompson, 1976; 
Berger & Thompson, 1978a, 1978b; Berry & 
Thompson, 1978; Thompson, 1976). 

Berger et al. (1976) reported increased 
neural activity in the hippocampus during 
pairings of a tone CS and corneal air-puff UCS. 
They found an increase in neural activity in 
both the pyramidal and granule cell layers of 
dorsal hippocampus that correlated with the 
behavioral CR. This increased neural activity, 
which the authors suggest may be one of the 
earliest neuronal indicators that learning has 
occurred, begins early in conditioning (in fact, 
during the first eight CS-UCS pairings) and
precedes the behavioral response by 35-40 
msec. Initially, the neuronal response precedes 
the UCR, but as CRs begin to occur, the 
neuronal activity both increases and moves 
forward in the CS-UCS interval and always 
precedes the behavioral response by 35-40
msec. Furthermore, there was no increase in
activity in unpaired controls, ruling out the 
possibility that the activity was due to 
pseudoconditioning, sensory registration, or 
motor output. A subsequent study (Berger & 
Thompson, 1978a) investigated single-unit ac-
tivity in hippocampal pyramidal cells. As in
the case of multiunit activity, the behavior of 
these cells was well correlated with the condi- 
tioned NMR. 

Additional evidence for the role of the hip- 
pocampus in classical conditioning of the 
rabbit’s NMR comes from a recent study by 
Berry and Thompson (1978). In this experi- 
ment, the authors were able to predict the rate 
of acquisition of the NMR by examining the 
hippocampal electroencephalogram (EEG)
prior to conditioning. They found that animals 
that displayed relatively high levels of activity 
in the high-frequency category (8-22 Hz) 
conditioned more slowly than animals that 
showed a higher proportion of activity in the 
low-frequency range (2-8 Hz). Berry and 
Thompson suggested that this may be an 
indication that "behavioral state," as indicated
by the hippocampal EEG, is an important 
factor in learning. Interestingly, Moore, 
Goodell, and Solomon (1976) reported that 
the cholinergic blocker scopolamine, which dis- 
rupts hippocampal activity in the low-fre- 
quency range (see Stumpf, 1965), greatly 
retards conditioning of the NMR to a tone or 
a light CS. 

At first glance, the electrophysiological data 
on acquisition of the NMR may seem incon- 
sistent with the results of lesion studies: If a 
structure is involved in a behavior, should not 
removal of the structure affect that behavior? 
Not necessarily. The finding that removal of 
a structure does not disrupt conditioning should 
not be interpreted to mean that the structure 
does not participate in the behavior in the 
intact animal. Furthermore, as Thompson
(1976) pointed out in regard to the electro- 
physiological data, there is the possibility that 
brain structures besides the hippocampus (e.g., 
brainstem) may show similar changes in ac- 
tivity early in conditioning. Thus it may be 
that the hippocampus is monitoring the ac-
tivity of other areas. If this is the case, hip-
pocampal lesions would not necessarily disrupt
acquisition of the CR. Finally, as mentioned
earlier, the effects of hippocampal lesions have
only been investigated in terms of crude de-
pendent measures such as the presence or
absence of a CR. A more fine-grained analysis 
of the effects of hippocampal ablation on con- 
ditioning of the NMR may yield differences. 


Electrical Stimulation of the Brain 


Like the data from lesion and electrophysio- 
logical studies, research using electrical stim- 
ulation of the brain also implicates the hip- 
pocampus in conditioning of the rabbit's NMR. 

A recent study by Salafia, Romano, Tynan, 
and Host (1977) found that posttrial stimula- 
tion of the hippocampus immediately following 
the UCS offset retarded conditioning of the 
NMR. Once CRs began to occur, however, 
posttrial stimulation was without effect. The 
level of electrical stimulation in these studies 
did not produce convulsions, but it was suffi-
cient to evoke poststimulation seizure activity
in hippocampus. Based on these data, Salafia 
et al. (1977) concluded that the disruptive 
effects of posttrial hippocampal stimulation
primarily affected association processes, as 
opposed to either the registration of the stimu- 
lus or the execution of the response. Subse- 
quent work by Salafia (Salafia, Chiaia, & 
Ramirez, in press) indicates that the develop- 
ment of the CR can be as severely retarded if 
subseizure stimulation is used in either the 
dorsal hippocampus or amygdala. 


Spatial and Temporal Information Processing 
in the Hippocampus? 


Black et al. (1977) propose that the hip- 
pocampus is involved in the formation of 
spatial maps of the environment. The data and 
arguments presented by these authors as well 
as the data presented by researchers working 
in similar paradigms (e.g., Olton et al., 1978) 
suggest that this is indeed one function of the
hippocampus. Nevertheless, the evidence from
the rabbit NMR preparation indicates that, at
least in the rabbit, the hippocampus is in-
volved in other processes.

Since most of the data that support the 
spatial information processing theory are based 
on the rat, it is tempting to speculate that the 
apparent discrepancies between these argu-
ments and those presented in this article are
entirely due to species differences. Although
differences in species and, perhaps more im-
portantly, differences in how dependent each
species is on spatial information, are likely to 
account for some of the variability, it is equally 
likely that these differences do not account for 
the entire discrepancy. For example, Black et 
al. (1977, p. 1113) cite many studies that show 
facilitated avoidance learning in hippocampec- 
tomized animals. To explain this facilitation, 
they argue that these animals are not subject 
to interference from spatial cues. This facili- 
tated avoidance, however, occurs not only in 
rats but also in hippocampectomized rabbits 
(Papsdorf & Woodruff, 1970). Similarly, we 
(Solomon & Moore, 1975) claimed, on the basis 
of rabbit NMR data, that the disruption of LI 
and blocking in animals with hippocampal 
lesions was due to the animal's inability to 
learn to ignore an irrelevant stimulus. These
findings are not unique to the rabbit, as similar 
results have been reported for the rat in LI, 
using both the two-way avoidance (Ackil, 
Mellgren, Halgren, & Frommer, 1969) and
taste aversion paradigms (McFarland, Kostas, 
& Drew, 1978) and in blocking in the CER 
paradigm (Rickert, Bennett, Lane, & French, 
1978). Furthermore, Patterson, Berger, and
Thompson (1979) found that hippocampal-unit
activity in the cat during NMR conditioning
is similar to that found in the rabbit. This sug-
gests that the electrophysiological evidence in
the rabbit NMR preparation is also not species
dependent. Thus, although it is possible that
the rat is more of a spatial animal than the
rabbit, especially the restrained rabbit, this
potential difference cannot explain the ap-
parent discrepancies in hippocampal function
between the two species. (But see Winson,
1972, for a discussion of species differences and
hippocampal function.) Rather, it seems that
although the hippocampus may be involved in
spatial information processing, it may be im-
plicated as well in the processing of temporal
information about stimuli.

Without attempting to add yet another all- 
encompassing theory of hippocampal function, 
I would like to suggest that in addition to the 
formation of spatial maps (and no doubt 
participation in a variety of other functions; 
see Isaacson & Pribram, 1975, pp. 429-442;
Nadel & O'Keefe, 1974, for a review of the
reviews), one function that the hippocampus
might participate in is a type of "temporal
mapping." By this I simply mean registration
of the temporal sequence of events. In the
rabbit NMR preparation, this would be a
coding of the relationship between the CS or 
the CSs and the UCS. Thus in LI the hip- 
pocampus receives information that a CS has 
occurred and that it is not followed by a
relevant (i.e., motivationally significant) event, 
namely, the UCS. Consequently, the hip- 
pocampus participates in a process whereby 
the irrelevant stimulus is tuned out. One in- 
teresting way to test this idea of hippocampal 
function in LI would be to record from the 
hippocampus not only during stimulus pre-
exposure but, perhaps more importantly, during
conditioning following the stimulus preex- 
posure. Although this has yet to be done in the 
rabbit, Best and Best (1976) presented data 
from an LI paradigm in the rat that showed 
increased firing in hippocampal cells during 
conditioning for nonpreexposed animals but 
decreased activity in animals that had been 
preexposed. 

In blocking, the animal learns a slightly more 
complex temporal sequence: CS_A always pre-
dicts the UCS, and any additional information
(e.g., CS_B) in regard to the UCS is redundant
and, like a preexposed stimulus, should be 
tuned out. 

The notion of temporal information proc- 
essing may also be consistent with the data on 
excitatory conditioning despite the finding 
that hippocampal lesions do not disrupt this 
form of learning. In excitatory conditioning the 
hippocampus may receive information that a 
CS has occurred and that it is closely followed 
by a relevant event, the UCS. This temporal 
sequence is coded in the hippocampus and this 
coding may account for the change in neural 
activity reported by Thompson and his co- 
workers during CS-UCS pairings. In fact, 
Thompson notes (cf. Berger & Thompson, 
1978b) that hippocampal-unit activity actually
forms a temporal model of the temporal form
of the behavioral NMR. Since the CS is paired
with a relevant event, the UCS, the hip-
pocampus should not act to tune it out.
Furthermore, although a hippocampectomized
animal loses the ability to tune out irrelevant
stimuli, this would have no effect on simple
acquisition, since tuning out an irrelevant
stimulus is not required. Thus, whereas the
hippocampus receives information about the
sequence of events transpiring during excita-
tory conditioning, it need not initiate action,
since the CS is relevant. Consequently, hip-
pocampal lesions have no effect on acquisition
of the CR. This is not to say that the hip-
pocampus is not critical to acquisition of the
CR under certain circumstances. For example,
animals with hippocampal lesions should be
more susceptible to the deleterious effects of
external inhibitors (distractors) on condition-
ing, and there are some preliminary data to
suggest that this is the case (Solomon & Moore,
1975, Experiment 2).

² I am grateful to John W. Moore of the University
of Massachusetts at Amherst for discussions on tem-
poral information processing in the hippocampus. See
Moore (1979), who proposes a neural model to account
for many of the phenomena discussed in this article.

Salafia et al.’s (1977) posttrial stimulation 
data also fit nicely into this scheme. Hip- 
pocampal stimulation could result in the tuning 
out of all stimuli, both relevant and irrelevant, 
and would thus have the effect of retarding 
conditioning. 

In summary, it may be that the hippocampus 
is capable of establishing the relationship (or 
to use Black et al.’s terminology, “maps” the 
relationship) between many kinds of stimuli. 
Certainly the seemingly redundant lamellar 
organization of the structure (e.g., Andersen,
Bliss, & Skrede, 1971) coupled with its ana-
tomical relationships to sensory and "motiva-
tional" systems of the brain suggests this
potential. What type of mapping occurs, how- 
ever, may depend on what animal is being 
investigated, what type of stimuli are available, 
and what type of problem the animal is asked 
to solve.


In his article, Solomon (1979) comments on
our spatial map theory of hippocampal func-
tion and points to what he sees as its inade-
quacies in dealing with a certain set of data.
Although he allows that "the spatial informa-
tion processing view of hippocampal function
accounts for much of the available data"
(p. 1272), he argues that "it cannot account
for the data from the classically conditioned
rabbit nictitating membrane response prepa-
ration" (p. 1272). After reviewing data from
both lesion and electrophysiological experi-
ments on the rabbit's nictitating membrane
response (NMR), he concludes that our
theory needs to be modified: In addition to
its spatial mapping function, the hippocampus
must also participate in a type of temporal
mapping. During NMR conditioning a tem-
poral map would encode the temporal relation-
ship between the conditioned stimulus (CS) or
CSs and unconditioned stimulus (US) and
participate in a process whereby irrelevant
stimuli ([...] reinforcement contingencies) are
ignored or "tuned out." He thus seeks to
broaden the information content of the map
and at the same time restrict its output
function to the inhibitory one of filtering
out sensory stimuli.

The research of J. O'Keefe was supported by the
Medical Research Council of England, and that
of L. Nadel was supported by the Natural Sciences and
Engineering Research Council of Canada. [...]
collaborated on the research described herein. This
reply to a critique of an article written with Abe Black
[...] Cerebral Functions Group, Department of Anatomy,
University College London, Gower Street, London
WC1E 6BT, England.

Although we are not against modification
and extension of the theory in principle (see,
for example, our discussion of semantic maps 
in O’Keefe & Nadel, 1978), we feel compelled 
to resist Solomon's particular modification. 
The reasons for doing so are set out in an 
earlier article on the distinction between 
theories and hypotheses (Nadel & O'Keefe, 
1974). In brief, the power of the theory as it 
now stands derives from its ability to make 
clear predictions that can be tested and judged 
to be correct or incorrect. Solomon's modifica-
tion would overextend the theory's explanatory
power at the expense of its predictive power.
Let Solomon say it: "In summary, it may be
that the hippocampus is capable of establishing 
the relationship (or to use Black et al.’s ter- 
minology, "maps" the relationship) between 
many kinds of stimuli . . .. What type of 
mapping occurs, however, may depend on 
what animal is being investigated, what type 
of stimuli are available, and what type of 
problem the animal is asked to solve" (p.
1278). In short, the new hybrid theory has
been reduced to an hypothesis, all of the im-
portant questions to be answered by ad hoc
postulates. Before allowing that to happen, 
we must examine carefully Solomon's asser- 
tions and the data on which they rest. We try 
to show that these data are only apparently in 
conflict with the cognitive map predictions. 

His article is divided into four parts, and we
keep that order in our comments. In the first
part (the section on The Spatial Information
Processing Hypothesis and Classical Condi- 
tioning and the section on Absence of Spatial 
Cues in the Rabbit NMR Preparation), Solo- 
mon discusses our spatial map theory and con- 
cludes that there is little or no role for the 
mapping system in classical conditioning of the 
rabbit NMR. We are at pains to deny this and 
point to several ways in which the map could be
involved and how these ideas can be tested. In
the second part of his article (the section on 
Lesion Studies), Solomon reviews experiments 
on the effects of hippocampal lesions on 
classical conditioning, especially that of the 
NMR. He concludes that lesions do not affect 
the acquisition of simple excitatory condition-
ing or the development of conditioned inhibi-
tion but that they do disrupt latent inhibition
and blocking. Here we look more closely at
the role of background cues (i.e., places) in
these paradigms and conclude that the data 
do not support Solomon's interpretations. In 
his third part (the sections on Electrophysi- 
ological Studies and Electrical Stimulation of 
the Brain), Solomon cites recent experiments 
that show changes in multiple-unit and single- 
unit recordings from the hippocampus during 
classical conditioning of the NMR. We have 
commented briefly on these studies elsewhere 
(O'Keefe & Nadel, 1978, pp. 190-194) and 
extend those comments here. In a final part 
(the section on Spatial and Temporal Informa-
tion Processing in the Hippocampus), Solomon
gives a resume of his own notion of a temporal 
map. We feel that the notion as presented has 
little explanatory power and constitutes an 
unjustified and unnecessary extension of the 
hippocampal cognitive map theory. 


Place Cues: Their Role in Classical 
Conditioning 


Solomon asserts that place learning plays 
little if any role in classical conditioning of the 
rabbit NMR because “the animal remains 
virtually motionless throughout the condition- 
ing session, and the conditioned stimuli (CS) 
and the unconditioned stimuli (UCS) are 
delivered in the same spatial location at all 
times” (p. 1273). Solomon may be forgiven 
for misunderstanding our position, since we 
have not emphasized the role of place learning 
in classical conditioning, but we are somewhat 
surprised that he has chosen to ignore the 
recent experimental and theoretical literature 
that points to the importance of background 
cues or places in classical conditioning.


¹ It is not clear whether Solomon intends the re-
striction of the output function to be applied only to
the enlarged map's temporal functions or to the spatial 
functions as well. We have presented the arguments 
against a general inhibitory function for the hippo- 
campus elsewhere (e.g., Nadel & O'Keefe, 1974; Nadel, 
O'Keefe, & Black, 1975) and will not repeat them here. 

2 One gets the impression that the term “background
cues” is used to refer to a collection of individual cues
that, although unidentified, do not differ in principle
from the specific foreground CSs. We have argued
that this view is only partially correct: In most experimental
situations the major role of the background

According to the cognitive map theory,
there are three different ways that the hippo- 
campus can be involved in learning and 
performance. 

1. Learning about places: Organisms build
maps of environments. An environment or 
specific places within it can be tagged as 
dangerous or as containing rewards such as 
food or water. 

2. Strategies using places: Using its map, 
the animal can identify its position within the 
environment as well as the distance and 
direction to other places. 

3. Attention to places: The mapping system 
can aid in learning about a stimulus in an 
environment either by directing the animal's 
attention to particular places in which that 
stimulus occurs or by assessing the stimulus 
as novel, that is, one that is not represented 
in the mapping system as occurring in that 
place in the environment. 

One way in which Pavlovian tasks differ 
from instrumental tasks is that they rule out
the second use of the mapping system as a
successful strategy for solving problems; by 
definition, nothing the animal does can affect 
the contingency between the CS and US. 
This does not mean that the animal won't try 
to use such place strategies; under certain 
circumstances they may actually conflict with 
what the experimenter is measuring as learning. 

Several experiments have pointed to the 
role of place factors in Pavlovian conditioning. 
Sheffield (1965) reported that during classical 
conditioning of the salivary response, dogs 
would often emit bursts of salivation during 
the intertrial interval:


These bursts are such an ever-present nuisance that the
experimenter, who is watching the dog, the polygraph,
and the tape that programs trials, is kept in a continual
state of anxiety lest a burst start just ahead of CS,
spoiling the record of what otherwise might be a clear-cut
CR. Also it is often difficult to tell whether a “CR”
was actually a response to the CS or whether it was a 
lucky burst, accidentally timed with the CS. Moreover, 
the bursts between trials look so much like the response 


cues is to identify environments or specific parts of 
an environment. These places have some properties in 
common with foreground CSs but differ in many im- 
portant respects (see O’Keefe & Nadel, 1978, pp. 
80-101). Here we assume that most of the effects 
attributed to background cues are due to their role in 
defining places.

to CS that one is obliged to raise the question of
whether the bursts also have a “CS” and if so, what
the nature of this unobserved CS is. (p. 314)


These bursts were maximal early in condition- 
ing, declining somewhat later in training. 
Sheafor (1975) has studied such pseudo- 
conditioned responses during classical condi- 
tioning of the rabbit’s jaw movement using a 
water US. He concluded that they resulted 
from an association of the background cues 
with the US and were not due to trace or 
temporal conditioning to the specific fore- 
ground CS. Part of his evidence was that these 
pseudoconditioned CRs could be extinguished 
by leaving the animal in the test situation 
without presentation of the CS or US. 

The most straightforward way to test for 
the development of a place hypothesis during 
classical conditioning is to carry out probe 
trials during which the animal is tested in a 
different environment or is freed from restraint
and put either in the usual testing
place or in a new nearby place. Zener (1937) 
was the first to do these latter tests and some 
of his results might be interpreted as support 
for place learning. When put unrestrained on
a new testing stand and presented with a 
CS signaling food, the dog left the new stand, 
went over to the usual stand and waited for 
the food to be delivered. There is some indica- 
tion that the animal would have moved from 
the new table to the old in the absence of the 
CS but was prevented from doing so by Zener. 
In response to a different CS that signaled 
acid in the mouth, the same animal first left
the new platform and returned to the usual
testing place but then turned around and 
walked away from it. 

In certain classical conditioning paradigms, 
this kind of place learning seems to conflict 
with conditioning to the experimenter-desig- 
nated CS. Rescorla and Wagner (1972) have 
suggested that a particular US has a limited 
amount of associative strength and that this
must be shared amongst all of the stimuli 
conditioned to it. The amount of conditioning 
to each stimulus depends on how good it is at 
predicting the US in relation to other stimuli 
in the situation. In the standard classical 
conditioning paradigm, this associative strength 
is, at least initially, shared between the background
and a compound consisting of the
background cues and the specific foreground
CS. Manipulations that alter the strength of
conditioning to the background cues should
have an inverse effect on conditioning to the
specific foreground CS and vice versa. Whether
one accepts the notion that there is a limited 
amount of associative strength to be shared 
(see Mackintosh, 1975, for an alternative 
view), it seems clear that spatial context can 
influence conditioning to specific foreground 
CSs. Several experiments provide support for 
this idea. 
In a conditioned emotional response (CER)
study, Dweck and Wagner (1970) showed that
the fear associated with the
background cues by exposing the animal to
the environment in the absence of the CS
and US during training increased conditioned
suppression to the specific
(1978) confirmed this finding in a somewhat
different situation, whereas Odling-Smee (1975)
showed the converse: Increasing the correla- 
tion between the specific CS and US reduced 
the aversiveness of the place in which conditioning
occurred, even though establishing a
perfect correlation between CS and US did not 
totally eliminate fear of the training environ- 
ment. Taken together, these studies support 
the notion that fear can be conditioned to the 
environment and suggest that the degree of 
such conditioning is influenced by the reliability 
with which the specific stimuli and the spatial 
context (i.e. background cues) predict the US. 
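
The shared-strength idea attributed above to Rescorla and Wagner (1972) can be illustrated numerically. The following Python sketch is illustrative only. It uses the standard Rescorla-Wagner update, in which each cue present on a trial changes by its learning rate times the difference between the strength the US supports and the summed strength of all cues present. It treats the background (place) cues as a single stimulus that is present on every trial and during nonreinforced exposure between trials. The learning rates, the number of blocks, and all function and variable names are assumptions chosen for illustration; none of them comes from the studies cited.

```python
def rw_update(strengths, present, us_present, rates, lam=1.0):
    """Apply one Rescorla-Wagner update to the cues present on a trial."""
    target = lam if us_present else 0.0
    # The prediction error is shared by every cue that is present.
    error = target - sum(strengths[cue] for cue in present)
    for cue in present:
        strengths[cue] += rates[cue] * error


def train(context_alone_per_block, blocks=10):
    """Alternate one reinforced context+CS trial with context-alone exposures."""
    rates = {"context": 0.1, "cs": 0.3}      # assumed learning-rate products
    strengths = {"context": 0.0, "cs": 0.0}
    for _ in range(blocks):
        rw_update(strengths, ["context", "cs"], True, rates)
        for _ in range(context_alone_per_block):
            rw_update(strengths, ["context"], False, rates)
    return strengths


baseline = train(context_alone_per_block=1)
extra_exposure = train(context_alone_per_block=5)
print(baseline, extra_exposure)
```

Under these assumptions, extra nonreinforced exposure to the context leaves less strength on the context and more on the specific CS after the same number of pairings. The sketch shows only what the limited-strength assumption implies; it does not bear on whether that assumption is correct.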

In addition to participating in conditioning in
the manner previously described, the mapping
system also can become involved through its 
misplace subsystem. This component detects 
discrepancies between the stored representa- 
tions of the spatial array of stimuli that occur 
at a particular place in an environment and 
those that the animal perceives at any given 
moment. Mismatches generate exploration 
that serves to incorporate the changes into the 
new representation of that place. We have
argued that the introduction of a new stimulus 
into an environment has two kinds of effects: 
One effect is due to its novelty, which generates 
a mismatch in the hippocampus, and the other 
is due to its noticeability, which depends on 
the properties of the sensory analyzers, how 
intense the stimulus is, how recently it or a
similar stimulus has been experienced, and
so forth. 

Unlike noticeability, the novelty of a stimulus
depends on the environmental context in
which it occurs. This being the case, the 
mapping system should be involved in those 
aspects of Pavlovian conditioning for which 
novelty is a critical factor. In latent inhibition
(LI), for instance, experience with a stimulus
and hence a reduction in its novelty retards its
subsequent associability with a US (e.g.,
Lubow, 1973). This loss of associability seems 
limited to the context in which exposure to the 
stimulus occurred. Lubow, Rifkin, and Alek 
(1976) looked at the effect of exposure to 
olfactory stimuli in the same or different en- 
vironments. LI was only obtained when the 
CS was preexposed in the same environment 
in which conditioning subsequently occurred. 
There was no retardation of conditioning to 
a CS that had been preexposed in a different 
environment. These data suggest that the 
hippocampal mapping system is involved in 
LI: By defining the context of occurrence, the 
map makes novelty possible. 

In sum, the hippocampal cognitive map 
theory predicts that the hippocampus is in- 
volved in classical conditioning, often in rather 
subtle ways. We can now turn to the changes 
that Solomon cites as inexplicable within the 
framework of the theory as it stands. 


No Change in NMR Acquisition After 
Hippocampal Lesions 


Solomon concentrates on the effects of 
lesions on various conditioning paradigms in- 
volving the rabbit NMR but also makes 
reference to other conditioning studies. He 
admits that hippocampal lesions do not affect 
acquisition of the NMR, which is in line with 
the mapping theory, but suggests that more
sensitive response measures might reveal subtle
changes in amplitude, latency, or interstimulus
interval (ISI) shifts. In view of Sheafor’s 
(1975) evidence that CRs can be elicited and 
influenced by background cues, we concur. It is 
not immediately apparent, however, how 
Solomon’s ideas would explain such effects.


Hippocampal Lesions, LI, and Blocking


We now turn to a closer examination of the 
conditioning paradigms in which hippocampal 
lesions do produce deficits and to the inter- 
pretation Solomon places on these data. We 
begin with an analysis of LI, briefly discuss 
blocking, and then move on to conditioned 
inhibition, which provides a critical test of our 
respective positions. Our argument throughout 
will be that (a) place learning does have a
role in certain Pavlovian procedures, and (b) 
the effects of a hippocampal lesion on an 
animal's performance in any given paradigm 
can be predicted from a knowledge of the 
role of place learning in that procedure. 

Solomon and Moore (1975) have shown that 
LI in NMR conditioning is disrupted by hippo- 
campal lesions. Solomon (1979) cites two 
other studies that have found a failure of LI 
in different paradigms as evidence of the 
generality of the effect (Ackil, Mellgren, 
Halgren, & Frommer, 1969, in two-way 
avoidance, and McFarland, Kostas, & Drew, 
1978, in taste aversion learning³).

Our theory already predicts an effect of hip- 
pocampal lesions on LI insofar as it is de- 
pendent on environmental context; there is 
no need to extend the theory in the way 
suggested by Solomon. Moreover, taken as it 
stands, the cognitive map theory suggests a
more comprehensive interpretation of this 
phenomenon. One LI study that Solomon 
failed to cite was reported by Olton and 
Isaacson (1968). Superficially similar to the 
Ackil et al. study cited by Solomon in that 
it used a two-way avoidance task, this study 
found that preexposure to the CS facilitated 
rather than retarded subsequent learning in 
intact rats. In neither study did preexposure 
influence learning in the lesioned rats. The 
failure to obtain preexposure effects in the 
lesioned animals is explicable; the opposed 
effects of CS preexposure in the intact rats in 
these two studies is less so. 

Analysis of the two-way task in terms of the 
role of place and other kinds of learning helps
to clarify this apparent discrepancy. The
normal intact animal has considerable difficulty 
in learning two-way avoidance. In Black, 
Nadel, and O'Keefe (1977) we argued that this 
is due to the fact that place hypotheses hinder
learning, since (a) they promote freezing in
inescapable, dangerous places, and (b) they 
retard movement back into a dangerous com- 
partment in which the animal has recently 
been shocked. In support of this view, it has 
been shown that manipulations that increase 
the discriminability of the two compartments 
of the shuttle box will exacerbate the intact 
animal's difficulties (see O'Keefe & Nadel, 
1978, p. 301). The robust facilitation of learning 
the two-way task that occurs after hippo- 
campal lesions are made offers added support; 
without the possibility of (maladaptive) place 
strategies, these rats acquire the task easily.
In contrast to this debilitating effect of place
learning, specific foreground CSs, especially 
if they act as directional cues that the animal 
can approach or avoid, will enhance learning.

The effects of preexposure on learning in the 
two-way task must be interpreted in terms of 
what is being preexposed and in terms of the 
role this component typically plays in learning. 
Preexposure to the place will reduce the likelihood
of place strategies being used in any
subsequent conditioning: By virtue of this
preexposure the place becomes a relatively
unreliable predictor of the US. Thus place preexposure
should enhance solution of the two-way
avoidance task. Preexposure to the
specific cues will reduce their role in conditioning
within that context because they will have
been incorporated into the hippocampal representation
of that environment, and there will
be no misplace output to identify them as
novel when they are reintroduced with the
US. To put it another way, they will have been
absorbed or embedded into the context and as
a consequence, they are unlikely to be rapidly
associated with the newly introduced US. Since 
such CSs normally enhance two-way avoidance
learning, their preexposure should act to
retard it.

3 McFarland et al. compared preexposed and non-preexposed
hippocampals and controls on their ability
to form an aversion to a walnut

One major difference between the Ackil et al.
(1969) and the Olton and Isaacson (1968)
studies was in their control treatments for preexposure.
In both studies, the experimental
group was preexposed to background cues
and specific foreground cues. However, this
preexposed group was compared to different
control groups in the two studies. Olton and
Isaacson used a no-exposure control, whereas
Ackil et al. used a place exposure control.
This can be schematized as follows:


                          Preexposed components

                     Place        Place + CS        Nothing

Olton & Isaacson       -          Experimental   >  Control
Ackil et al.         Control   >  Experimental         -


This layout shows the conditions used in the
two studies, and the signs indicate the results.
Preexposure to the place facilitates learning;
preexposure to both place and CS facilitates
learning somewhat less. This pattern of results
makes sense within cognitive map theory,
as does the absence of any preexposure effects 
in rats with hippocampal lesions. 

Another paradigm discussed by Solomon is
blocking. The basic procedure here involves
first conditioning to one CS (CS_A → US) and
then presenting an added CS in compound
with the pretrained CS during subsequent
trials (CS_A CS_B → US). The typical finding is
that conditioning to CS_B is weaker than would
occur in the absence of prior conditioning to
CS_A. It is often said that because CS_B fails to
predict anything not already predicted by
CS_A, it is tuned out, and little conditioning
occurs. There are defects in blocking in hippocampally
lesioned subjects, as Solomon notes
(Solomon, 1977; Rickert, Bennett, Lane, &
French, 1978). There is as yet no fully accepted
explanation of blocking. In particular,
the role of context in the development of 
blocking has yet to be determined. The 
present analysis predicts that it should have 
an important role, similar to that seen in LI. 
We treat this specific issue and provide further
discussion of several Pavlovian paradigms in
another article (Nadel & Willner, Note 2).


Hippocampal Lesions and Conditioned
Inhibition


A conditioned inhibitor (CS-) is defined as
a stimulus that as a result of certain prior 
experiences (specified later) (a) detracts from 
the efficacy of a CS known to elicit a condi- 
tioned response, and (b) itself conditions more 
slowly than in the absence of this prior experi- 
ence (Rescorla, 1969). Rescorla and Wagner 
(1972; Wagner & Rescorla, 1972) have speci- 
fied the conditions necessary to produce a 
CS-: A CS will become a CS- when it is 
negatively correlated with the US. That is, a 
stimulus will become an inhibitor when it 
signals a lower probability of US occurrence 
than the animal would otherwise expect. 

According to Solomon (1979), it is pre- 
cisely the fact that the CS- signals a change 
in the conditions of reinforcement that pre- 
vents it from being tuned out, and this in turn 
prevents hippocampal lesions from having any 
effect on the establishment of conditioned in- 
hibition. Solomon (1977) tested this notion in 
rabbits and found no differences between 
lesioned and control subjects. So far, these 
data appear to support Solomon's assertions. 
He also assumed that the spatial mapping 
theory predicts no general defects in this task. 
We agree with Solomon insofar as we feel that 
the hippocampus has no general inhibitory 
function. However, this does not mean that 
the hippocampus has no role in the learning 
process that underlies conditioned inhibition. 
To the extent that (spatial) context is im- 
portant in the conditioning of inhibition, we 
expect defects in this paradigm in lesioned 
subjects. There are in fact a variety of pro- 
cedures that will turn a CS into a CS-, and 
it is useful to look at these procedures with


4 Failures of blocking (“unblocking”) can occur when 
the added CS is correlated with some change in stimulus 
conditions. For example, changes in US intensity 
(Kamin, 1969), or in the number of USs (Dickinson, 
Hall, & Mackintosh, 1976) produce unblocking. Brief, 
nonreinforced presentation of the CS_A CS_B compound
also produces unblocking (Gray & Appignanesi, 1973). 
Interestingly, changes in the temporal duration (Kohler 
& Ayres, Note 1) or temporal patterning (Gray, 1978) 
of the CSs fail to produce unblocking. This set of results 
is not particularly congenial to a temporal mapping/ 
tuning out interpretation of the hippocampal role in 
blocking.

an eye towards the role that place learning
plays in each of them. 

Solomon (1977) generated conditioned in- 
hibition through the use of the classical 
Pavlovian procedure (Pavlov, 1927). In this 
procedure the animal receives trials in which 
CS_A is always followed by the US, interspersed
with trials in which a compound of
CS_A and CS_B is never followed by the US. As a
result of this training, the added CS_B acquires
substantial inhibitory properties (i.e., becomes
a CS-). In this preparation, an expectancy
of the US is generated by the presentation
of CS_A (the CS+). When the compound of
CS_A and CS_B is not followed by the US, this
expectancy is not confirmed. The added CS
becomes an inhibitor because it is correlated 
with the omission of an otherwise expected US. 
Background cues play little role in this case 
and hippocampal lesions accordingly have 
little effect. 
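
The Pavlovian procedure just described can be sketched with the same error-correcting rule that Rescorla and Wagner (1972) proposed. This Python sketch is a minimal illustration, not a reconstruction of Solomon's (1977) experiment. The learning rate, the number of trial blocks, and the names `rw_update` and `pavlovian_inhibition` are assumptions, and background cues are left out entirely, in keeping with the claim above that they play little role in this procedure.

```python
def rw_update(strengths, present, us_present, rate=0.2, lam=1.0):
    """One Rescorla-Wagner update applied to every cue present on the trial."""
    target = lam if us_present else 0.0
    error = target - sum(strengths[cue] for cue in present)
    for cue in present:
        strengths[cue] += rate * error


def pavlovian_inhibition(blocks=100):
    """Intermix CS_A -> US trials with nonreinforced CS_A + CS_B trials."""
    strengths = {"cs_a": 0.0, "cs_b": 0.0}
    for _ in range(blocks):
        rw_update(strengths, ["cs_a"], True)            # CS_A followed by the US
        rw_update(strengths, ["cs_a", "cs_b"], False)   # compound, US omitted
    return strengths


result = pavlovian_inhibition()
print(result)
```

In this sketch the strength of the added stimulus is driven below zero because it is present only when the expectancy generated by CS_A goes unconfirmed, so that the compound as a whole comes to predict no US.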

However, there are other paradigms gen- 
erating conditioned inhibition in which spatial 
context seems to be of greater importance, such 
as those in which the CS and US are explicitly 
unpaired or in so-called discriminative classical 
conditioning. In their analysis of conditioned
inhibition, Wagner and Rescorla (1972) argued
that the development of inhibition in these
by the context
(place) in which conditioning occurs. That is,
generated by the

we predict deficits in
these paradigms are

They then
baseline. In
normal rats the CS+ facilitated avoidance responding
(presumably through enhanced fear),
whereas the CS- suppressed responding. Hippocampal
rats, on the other hand, showed in-

other words, when places are important in the
development of conditioned inhibition, hippocampal
lesions produce profound defects. It is
important to note that tuning out plays no role
in any of these conditioned inhibition
paradigms, according to Solomon.⁵
Thus, the available data support Solomon's
contention that the hippocampus is involved
in certain Pavlovian procedures but fail to
support his assertion that this involvement
demands an enlarged view of hippocampal
function that goes beyond space into time.
Except for the untested case of blocking,
spatial context has been implicated as an important
factor in those conditioning paradigms
in which hippocampal lesion effects have been
demonstrated. Furthermore, a defect was seen
in a version of conditioned inhibition training
in which place learning is important but in
which the tuning out of irrelevancy is not.
In sum, then, the lesion data discussed by
Solomon offer no strong reason to abandon
the parsimonious view that the hippocampus
is merely a spatial mapping system. Rather,
the lesion data argue for a closer examination
of the role of spatial context in tasks that
have hitherto been thought of as relatively
free of such influences.


Hippocampal Recording During NMR 


Most recent studies on
Ranck, 1978; O'Keefe, 1976;
O'Keefe & Conway, 1978; O'Keefe & Dostrovsky,
1971; Olton, Branch, & Best, 1978) report
that many hippocampal cells fire preferentially
to one part of an environment (place cells). In
studies on the rabbit do not report
place cells but concentrate on the changes
in unit firing during classical conditioning

5 This analysis of the
Micco and Schwartz study
what previously seemed to us an
(O'Keefe & Nadel, 1978).

(Thompson’s group) or on the response of
hippocampal units to simple sensory stimuli 
such as flashing lights and pure tones (e.g.,
Vinogradova, 1970, 1975; see also the discussion
in Elliott & Whelan, 1978, pp. 173-175,
192, 197). There might be several reasons for 
this discrepancy. One is that the hippocampus 
has a different function in different species. 
Winson (1972) has produced a variation of
this argument with respect to hippocampal 
theta. This appears to be the most radical 
solution, and we agree with Solomon that it is
unlikely to be the case. More probably, the investigators
of rabbit and rat are looking at
different aspects of the cell responses or are 
looking at the same cells but interpreting their 
responses differently. Two concrete possibilities 
immediately spring to mind. One possibility 
is that the groups working on the rabbit 
record cells while the animal is restrained in a
small featureless box. If there are place cells 
in the rabbit hippocampus, this is the worst 
possible environment in which to find them, 
and even if they did have fields in these en- 
vironments, the testing procedures would not 
allow the experimenters to see this. A second 
possibility is that the theta cells in the rabbit
their firing rates during arousal
movement. Support for this idea can be
from the observation that theta occurs in
the rabbit in response to sensory stimuli
as well as during movement (e.g., Harper,
1971; Kramis,
Bland, 1975).

possibilities, O’Keefe (Note
3) has recorded units from fields CA1 and FD⁶
of the hippocampus of the freely moving rabbit.
results show that there are both
the rabbit hippo-
identical
to those recorded in the rat, whereas the theta
cells in the rabbit fire during arousal as well
as during movement, in contrast to the rat's
theta cells. Berger and Thompson (1978a)
the physiological
characteristics of the hippocampal cells that
were involved in conditioning. Most of the
units that were involved could be antidromically
activated by electrical shocks to the
fornix and
the histograms presented) appeared to
rates. On the other hand, units not involved
in conditioning could not be antidromically
activated from the fornix; some were ortho- 
dromically activated while the rest were not 
affected at all. This latter group “tended 
to have very low spontaneous rates, sometimes 
showing interspike intervals as long as 60-120 
sec” (p. 1574). The spontaneous rates suggest 
that this last group is composed of place cells, 
whereas the high firing cells involved in condi- 
tioning are primarily theta units. However, the 
finding that some of the units involved in 
conditioning can be antidromically activated 
from the fornix and are therefore probably 
pyramidal cells is not consistent with this 
analysis. There is evidence in the rat that the 
complex spike cells (into which category the 
place cells fall) and not the theta cells are the 
pyramidal cells (Fox & Ranck, 1975, 1977). 


Hippocampal Stimulation During NMR 


Finally, Solomon cites the stimulation 
studies of Salafia and his colleagues (Salafia, 
Chiaia, & Ramirez, 1979; Salafia, Romano, 
Tynan, & Host, 1977) as support for his tuning 
out hypothesis. Salafia found that stimulation 
of the hippocampus (and more recently the 
amygdala) after each CS-US pairing retarded 
conditioning of the NMR. Solomon thinks that
stimulation could result in the tuning out of all 
stimuli, relevant or irrelevant, and in this way 
retard conditioning. How sound is this reason- 
ing? Since hippocampal lesions have no effect 
on acquisition, Solomon must think that the 
stimulation is not producing a temporary
lesion but is actively blocking sensory trans- 
mission. But in Salafia’s studies the stimula- 
tion is given after both the CS and US have 
been presented and cannot have any role in 
tuning out either on that trial. The experi- 
ment that would support Solomon’s tuning out 
hypothesis would be one in which hippocampal
stimulation during or just prior to the CS
retarded conditioning, whereas stimulation at 
other times had no effect. Stimulation after 
the trial would be a control condition designed 
to rule out general stimulation effects, con- 
solidation effects, prograde effects, and so 
forth. In general, we suggest caution in the

6 CA1 = cornu ammonis 1, FD = fascia dentata.


Solomon has called attention
on classical conditioning that
we have not previously dealt with

lay outside the


which an episode occurred might
be indirectly the mapping system
(see O'Keefe & Nadel, 1978), this is a different
type of temporal coding from
temporal sequencing between stimuli invoked
by Solomon.

We have looked carefully at the conditioning
literature that Solomon cit
that with the exception 


the results ade- 
t can also handle


Type I Error Rate of the Chi-Square Test of Independence
in R X C Tables That Have Small Expected Frequencies


Drake R. Bradley, T. D. Bradley, Steven G. McGrath, and
Steven D. Cutcomb
Bates College

Sampling experiments are reported which show that the uncorrected chi-square
test of independence is exceptionally robust with respect to small expected frequencies
in R X C contingency tables. In 2 X 2, 2 X 3, 3 X 3, 3 X 4, and 4 X 4


criteria usually resulted from actual error rates being smaller, not larger, than
the nominal level. A distinction is made between accuracy of approximation
and control of the Type I error rate as considerations dictating the advisability
t. 


In 1949 Lewis and Burke published an
article entitled “The Use and Misuse of the
Chi-Square Test.” These authors argued that
social scientists frequently misused the chi-square
test and that the most common error
resulted from “the use of extremely small
theoretical frequencies” (Lewis & Burke, 1949,
p. 460). Rebuttals by Peters (1950), Pastore
(1950), and Edwards (1950) did little to
resolve the developing controversy, and recent
publications (Tate & Hyer, 1973; Tate &

The authors wish to acknowledge the excellent
service provided by the Dartmouth Time-Sharing
System, which was used to conduct the computer
simulations reported in this article.
Requests for reprints should be sent to Drake R.
Bradley, Department of Psychology, Bates College,
Lewiston, Maine 04240.

abilities in the population, the number of cells
in the contingency table, and the level of
significance employed. Consequently, it is
extremely difficult to formulate simple rules of
thumb that indicate when the approximation
it is not.
Cochran (1952) suggested that the approximation
is satisfactory if the actual Type I
error probabilities remain between .04 and .06
for tests conducted at α = .05 or between
.007 and .015 for tests conducted at α = .01.
Given any such specific criteria

accept Hays’s recommendation (1963, pp. 584,
597) that for applications involving one
degree of freedom, the minimum should be
set at 10, whereas for other applications a
minimum of 5 is sufficient. Under some
circumstances a mini
even fractional expectations may be permissible
according to some authors (Cochran,
1952, 1954; Good, 1961; Slakter, 1965, 1966,
Vessereau, 1958; Wise, 1963; Yarnold, 1970).
However, it has been recently suggested that
a minimum expectation of 20 is necessary to
ensure that the approximation will be accurate
under all circumstances (Tate & Hyer, 1973).
Unfortunately, the absence of a strong 
consensus on this matter makes it extremely 
difficult for the researcher to know if he or she 
is employing the chi-square statistic properly. 
A related issue concerns the use of a continuity
or similar correction to improve the
approximation. There has been a continuing
controversy over whether such corrections are 
necessary or desirable in all instances, and 
assuming that they are, over which method 
of correction provides the best approximation 
(Boschloo, 1970; Camilli & Hopkins, 1978; 
Conover, 1974; Garside, 1971, 1972; Garside 
& Mack, 1970, 1976; Grizzel, 1967; Larntz, 
1978; Mantel, 1974; Mantel & Greenhouse, 
1968; Miettinen, 1974; Nass, 1959; Plackett,
1964; Starmer, Grizzel, & Sen, 1974; Yates,
1934). Part of the controversy results from
differences in opinion as to what constitutes a
good approximation. Some authors insist that
if a correction is used, the actual Type I
error probability should be maintained at a
level less than or equal to α (Mantel & Greenhouse,
1968). Others argue that the correction
should simply minimize the absolute difference 
between the true and nominal values. The 
latter technique, although it permits some 
inflation in the Type I error rate, may often 
be more accurate than the former. There has
also been disagreement as to whether so-called
exact tests provide the appropriate standard
for evaluating the ability of approximate
tests to control the Type I error rate at the
nominal level (Starmer et al., 1974). This issue 
arises because the exact test may not provide 
a test with a predetermined significance level 
as high as the desired nominal value (Garside 
& Mack, 1976; Starmer et al., 1974) unless 
it is supplemented by randomization techniques
(Tocher, 1950). If so, continuity
corrections based on the exact test as the
standard may be unnecessarily conservative.
Indeed, a number of Monte Carlo studies 
have shown that the chi-square test is relatively 
robust with regard to violations of the min- 
imum expected frequency requirement, even 
when the test is uncorrected. Slakter (1966) 
found that the chi-square test was robust in
goodness of fit to uniform applications even 
with fractional expected frequencies. Lewontin 
and Felsenstein (1965) investigated the 2 X K
case in a uniform population and where both 
row and column marginal frequencies remained 
fixed and found that the chi-square test was 
robust over the range of values tested. Roscoe 
and Byars (1971) extended Slakter’s findings 
to a wider range of N (sample size) and K 
(number of categories) as well as to nonuniform 
distributions. In addition, they extended 
Lewontin and Felsenstein's results to con- 
tingency tables that had rows and columns in 
all combinations of 2 to 5, although they 
elected to fix only the row marginal frequencies. 
They found that the chi-square test was 
exceptionally robust in goodness of fit to 
uniform applications and in tests of indepen- 
dence with more than one degree of freedom. 
Finally, Bradley and Cutcomb (1977) and 
Camilli and Hopkins (1978) have shown that 
for selected sets of 2 X 2 tables in which
neither row nor column marginals were fixed, 
the chi-square test of independence is highly 
robust to violations of minimum expected cell 
frequency. The results of these various Monte 
Carlo studies suggest that the minimum 
expected frequency requirement can be relaxed 
considerably. 

In this article we report the results of a 
comprehensive set of sampling experiments 
that evaluate the Type I error rate of the 
chi-square test of independence in R X C
contingency tables. The simulations reported
here differ from previous numerical or Monte 
Carlo studies on R X C contingency tables 
(Camilli & Hopkins, 1978; Garside & Mack, 
1976; Kurtz, 1968; Larntz, 1978; Lewontin & 
Felsenstein, 1965; Roscoe & Byars, 1971; 
Starmer et al., 1974) in that (a) the sample 
sizes and marginal probability distributions 
that were selected range over all values likely 
to arise in practice, and/or (b) neither the 
row nor column marginal frequencies were 
fixed. The second feature is intended to model 
the most common application of chi-square to 
R X C tables; that is, N elements are randomly
sampled from a population and classified on 
each of two polychotomous variables, thereby 
producing a table in which the row and
column marginals may vary from one sample
to the next.
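
The sampling model just described can be illustrated with a minimal Monte Carlo sketch. It is not the authors' program: the function names (`sample_table`, `chi_square`, `type1_rate`), the seed, and the hard-coded table of critical values are assumptions made for illustration. Each trial draws N elements, classifies each one independently on the row and column variables (so neither set of marginal totals is fixed), computes the uncorrected Pearson chi-square, and counts rejections at the nominal .05 level. Cells whose expected frequency is zero are skipped, since a sampled margin can be empty.

```python
import random

# Upper 5% critical values of chi-square for a few small df
# (standard tabled values; extend the dict for larger tables).
CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 6: 12.592}


def sample_table(n, row_p, col_p, rng):
    """Classify n independently sampled elements on two variables."""
    rows = rng.choices(range(len(row_p)), weights=row_p, k=n)
    cols = rng.choices(range(len(col_p)), weights=col_p, k=n)
    table = [[0] * len(col_p) for _ in row_p]
    for r, c in zip(rows, cols):
        table[r][c] += 1
    return table


def chi_square(table):
    """Uncorrected Pearson chi-square for an R x C table of counts."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:  # bypass cells with zero expected frequency
                stat += (obs - exp) ** 2 / exp
    return stat


def type1_rate(n, row_p, col_p, trials=10000, seed=0):
    """Proportion of independent-sample tables rejected at nominal .05."""
    rng = random.Random(seed)
    df = (len(row_p) - 1) * (len(col_p) - 1)
    crit = CRIT_05[df]  # KeyError if df is not in the small table above
    rejections = sum(
        chi_square(sample_table(n, row_p, col_p, rng)) > crit
        for _ in range(trials)
    )
    return rejections / trials
```

A call such as `type1_rate(20, [0.9, 0.1], [0.9, 0.1])` estimates the actual error rate for a 2 X 2 table with both margins highly skewed. With 10,000 trials and a true rate near .05, the estimate carries a sampling error of roughly .002.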


Method 


The chi-square test of independence
(Hays, 1963, p. 591) was conducted on the resulting 


Results 


Due to space limitations, complete tables of 
the Type I error rate of chi-square in 2 X 2,
2 X 3, 3 X 3, 3 X 4, and

those conditions in which the errors of approx-
imation might be considered excessive. Note
that we have not employed Cochran's (1952)
suggested interval of .04-.06 in constructing
Table 1. Although the use of an asymmetrical
interval (.03-.06) was motivated partly by the
requirement to present the data in highly
condensed form (errors of approximation in
the .03-.04 region were fairly common for
N = 20), the asymmetry also parallels that
implicitly endorsed by most investigators,
namely, that errors of approximation of any
given magnitude are generally more tolerable
when they err in the direction of conservatism.


1 On occasion, a sampled table might contain one or
more expected frequencies equal to zero; if so, the
corresponding cells were bypassed in the computation
of the total value of chi-square to prevent division
by zero.
3 Some of the data discussed in this report are

Bradley, McGrath, and Bradley (1978), and were
presented at the 1977 Eastern Regional Meeting in
Chapel Hill, North Carolina, and the 1978 Annual
Meeting at San Diego, California, of the American
Statistical Association.


3 :5: manual expansion of th 


which agrees with the exact value within the limits of
sampling error (σ = √(p(1 − p)/K), where K = 10,000 trials).


Of course, the investigator must also be concerned
with maintaining adequate power in conducting tests
of independence in situations in which


nominal power. Bradley and Seely (1977) have devel-
oped power tables for the 2 X


agreement with the, 


The increased probability of making a Type I
error is usually of greatest concern, and for
this reason we have used the same upper limit
(.06) for our interval as that used by Cochran 
(1952). 

Several trends may be noted with respect to 
the data presented in Table 1. First, excessive 
inflation in the Type I error rate (p > .06) 
tends to be a problem when both marginal 
probability distributions of the R X C table 
are highly skewed (e.g., rows 1, 2, 4, 9, and 14). 
Conversely, excessive deflation in the Type I
error rate (p < .03) tends to be a problem
when one marginal distribution of the R X C
table is highly skewed and the other is rela-
tively uniform (e.g., rows 15, 17, 22, 27, and
36). Finally, marginal probability distributions 
that are uniform (or nearly uniform) on both 
rows and columns do not result in excessive 
errors of approximation, even when minimum 
expected frequency requirements (of 5 or 10)
are violated in all R X C cells of the table, as
often occurs for N = 20. [Theoretical expected
frequencies may be obtained by computing
P(R_i)P(C_j)N for each cell of the table.]
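
As a small illustration of the bracketed formula, the sketch below computes the theoretical expected frequency for every cell from the marginal probabilities and N, then counts how many cells fall below a chosen minimum. The function names `expected_frequencies` and `cells_below` are hypothetical helpers, not part of the original study.

```python
def expected_frequencies(row_p, col_p, n):
    """Theoretical expected count P(R_i) * P(C_j) * N for every cell."""
    return [[pr * pc * n for pc in col_p] for pr in row_p]


def cells_below(expected, minimum):
    """Number of cells whose expected frequency is below `minimum`."""
    return sum(e < minimum for row in expected for e in row)


# A uniform 3 X 3 table with N = 20: every expected frequency is 20/9,
# so a minimum-expectation rule of 5 is violated in all nine cells.
uniform = [1 / 3, 1 / 3, 1 / 3]
table = expected_frequencies(uniform, uniform, 20)
violations = cells_below(table, 5)
```

This is the situation the text describes for N = 20: the usual rule of thumb is violated everywhere, yet the simulations showed no excessive error of approximation when both margins were uniform.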

The error rates represented in Table 1
pertain to tests conducted at α = .05. As
noted earlier, simulations were also conducted
for α = .01 and .10 for 2 X 2 tables. Although
the trends discussed previously were clearly
evident in the error rate data for α = .01
(N > 20), they were less apparent in the data
for α = .10.⁵


Discussion 


These results might or might not imply a 
“liberalized” policy with regard to using the 
chi-square test on tables that have small 
expected frequencies. Since errors of approx-
imation can be fairly substantial, with actual
error rates ranging from .0187 to .0891 for
N > 20 and α = .05 (rows 17 and 4 of Table
1), some investigators might elect to maintain
a conservative policy with respect to using
the chi-square test (see Tate & Hyer, 1973). 
However, traditional rules of thumb based 
on minimum expected frequency, without 
regard to the marginal distributions, do not 
provide selective protection against errors of 
approximation where such protection is needed
most (Table 1). That is, these rules of thumb
will often prohibit the use of the chi-square test
in situations in which it provides a satisfactory 
approximation. Furthermore, it can be argued 
that accuracy of approximation per se is not 
the central issue in determining the advisability 
of using the chi-square test. Rather, the key 
issue for many investigators is the ability of 
this test to control the Type I error rate at or
below some acceptable upper limit, α′, relative
to the nominal level, α.

The sampling experiments reported here
show that for the large majority of applica-
tions likely to arise in practice, the actual
Type I error rates will not exceed α′ = .06
for tests conducted at the nominal level of
α = .05.⁶ This is true without any correction
for continuity and regardless of the size and 
number of small expected frequencies in the 
R X C table. For the few exceptions that are
noted in Table 1 (p > .06), in which row and
column probabilities are both highly skewed, 
the investigator can simply select a more 
conservative alpha level for conducting the 
test. Since the maximum inflation in error 
rate for N > 20 is less than double the nominal
value, using an adjusted level of α/2 should
suffice in most cases for the provision of 
adequate protection at the original level 
desired. 

The preceding arguments do not, of course, 
imply that when the investigator consults a 
table of chi-square to determine the specific 
level of significance of an outcome, the tabled 
value will provide an accurate approximation 
to the exact multinomial probability. Tate 
and Hyer (1973) are correct in noting that 
these values can differ substantially. Never- 


5 To investigate in more detail the effect of small
sample size on errors of approximation, additional
sampling experiments were conducted at α = .05 for a
2 X 2 table with marginal probabilities of .8, .2 and .9,
.1. Actual Type I error rates were obtained for all
sample sizes between and including N = 4 and N =
40. The Type I error rates ranged from .0203 (N = 7)
to .0589 (N = 20).
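
For a 2 X 2 table with N this small, the Type I error rate can also be computed exactly rather than estimated by sampling. The sketch below is an alternative to the sampling experiments described in this footnote, not the procedure the authors used; the names `chi_square_2x2` and `exact_type1_rate` are hypothetical. It enumerates every possible table under the multinomial model with independent margins, sums the probability of those tables whose uncorrected chi-square exceeds the tabled .05 critical value for 1 df, and bypasses cells with zero expected frequency. No claim is made that it reproduces the figures reported above.

```python
from math import comb

CRIT_05_DF1 = 3.841  # tabled upper 5% point of chi-square with 1 df


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    stat = 0.0
    for obs, row_tot, col_tot in cells:
        exp = row_tot * col_tot / n
        if exp > 0:  # bypass cells with zero expected frequency
            stat += (obs - exp) ** 2 / exp
    return stat


def exact_type1_rate(n, row_p, col_p, crit=CRIT_05_DF1):
    """Exact rejection probability under independence, margins not fixed."""
    p = [row_p[0] * col_p[0], row_p[0] * col_p[1],
         row_p[1] * col_p[0], row_p[1] * col_p[1]]
    rate = 0.0
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                d = n - a - b - c
                # multinomial coefficient n! / (a! b! c! d!)
                coef = comb(n, a) * comb(n - a, b) * comb(n - a - b, c)
                prob = coef * p[0] ** a * p[1] ** b * p[2] ** c * p[3] ** d
                if chi_square_2x2(a, b, c, d) > crit:
                    rate += prob
    return rate
```

Calling `exact_type1_rate(n, (0.8, 0.2), (0.9, 0.1))` for n from 4 to 40 would trace how the actual error rate changes with sample size for the marginal distribution discussed in this footnote.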

6 For α = .01 (2 X 2 tables) a parallel conclusion is
reached, namely, that for the large majority of applica-
tions likely to arise in practice, the actual Type I
error rates will not exceed α′ = .015 (Cochran's upper
limit) for tests conducted at the α = .01 level. As in
Table 1 (row 1), the only exception to this general-
ization was the .9, .1 and .9, .1 marginal probability
distribution.
tested at some maximum Specifiable level of оѓ the power of the chi-square test о indepen
i isi to Contingency tables with sm;
making a decision about the presence of wae Ge les. Развод S
estimating exact cumulative multinomial prob- Cochran, W. G. Th
A Review and
Over the past 10 to 15 years, the role of the
behavioral psychologist who works with
children has gradually changed from that of
direct change agent to that of consultant.
More and more frequently, parents are trained
as behavior therapists for their child’s prob-
lems. Recent reviews indicate that this is an
effective and efficient treatment approach
(Berkowitz & Graziano, 1972; Graziano,
1977; Johnson & Katz, 1973; O'Dell, 1974).
In the field of education, a similar shift has
occurred. There is now substantial experi-
mental evidence which indicates that teachers
themselves can function as behavior modifiers
in the classroom and can deal with a wide
variety of problems (O'Leary & O'Leary,
1976, 1977).

More recently, an increasing number of
studies have tried to join the teacher and
parent in a cooperative effort to reduce
problems in the school setting. The present
article reviews the current research on the use
of home-based reinforcement programs de-
signed to modify classroom behaviors. Basic-
ally, with this procedure the teacher is
responsible for specifying the classroom rules,
for determining rule violations, and for
communicating these to the parent. At home
the parent is responsible for consistently
dispensing rewards and sanctions to the child,
based on the teacher’s report.

For purposes of presentation, the articles
reviewed have been organized into two areas,
based on the classroom behaviors that were
targeted for treatment. The first section
reviews the research on home-based reinforce-
ment programs designed to modify disruptive
behaviors in the classroom, and the second
section reviews those designed to change
academic behaviors in the classroom. Following
the presentation of the current research, the
third section of the article examines the
methodology used in the studies to evaluate
the reported results and to determine future
areas of research.

This article is an expanded version of part of an
article by Atkeson and Forehand entitled "Parents As
Behavior Change Agents With School-Related Prob-
lems," which appeared in Education and Urban Society
of Mental Health
MH28859.01.
Requests for reprints should be addressed to Rex


Disruptive Behaviors in the Classroom


Although specific behaviors labeled as
disruptive may vary from teacher to teacher,
in general they are behaviors that the teacher
disruptive to the learning process in
his or her classroom; these may include
making noise, talking without permission

Unfortunately, these behaviors often demand
a great deal of the teacher’s time. In fact, when
teachers were asked to identify classroom
behaviors to change, 80% chose to reduce the
occurrence of a “bad” behavior (O'Leary &
O’Leary, 1977a).

Teachers have successfully used a number 
of behavioral techniques to decrease disruptive 
behaviors in the classroom. These include (a) 
increasing incompatible behaviors (e.g., on- 
task behavior, sitting still, correct academic 
performance) through the use of contingent 
tangible rewards and/or social praise, (b) 
punishing disruptive behaviors (e.g., with
loss of privileges and/or brief isolation of the
child), and (c) dispensing tokens (e.g., check
marks or chips) for following classroom rules
or for meeting a specified level of academic
performance, redeemable for backup reinforcers
in the classroom (O’Leary & O’Leary, 1977b).

Although highly effective, the above procedures
do have some disadvantages. To be successful,
each procedure may require a great deal of
time and effort on the part of the teacher and
behavioral consultant (Schumaker, Hovell, &
Sherman, 1977). Often the teacher must alter
her/his own teaching style to reduce the
misbehavior of only a few students (Schumaker
et al., 1977). In addition, the tangible rewards
or backup reinforcers available are often
limited in the classroom (Ayllon, Garber, &
Pisor, 1975; Schumaker et al., 1977). In
contrast, parents often have access to a wide
variety of privileges (e.g., allowances, TV
time, movies, skating) not available to most
teachers (Ayllon et al., 1975; Bailey, Wolf, &
Phillips, 1970; Karraker, 1972).

In an attempt to lessen the above disadvant-
ages, psychologists have used some variation
of a home-based reinforcement program to 
reduce the disruptive behavior of one or more 
children in a classroom. In these studies the 
content or degree of specificity of the teacher’s
report to the parents has varied from global
to detailed. Some teachers were asked to
send a note stating only whether the child 
was good (Ayllon et al., 1975; Heaton, Safer, 
Allen, Spinnato, & Prumo, 1976). More 
frequently, the note home listed the classroom 
rules and whether the child followed each of 
the rules (Hawkins, Sluyter, & Smith, 1972, 
Experiment 4; Lahey et al., 1977; Schumaker
et al., 1977). Although both the global and
specific reports to the home produced the 
desired change, there have been no studies 
that directly compared the two approaches. 
Each, of course, has its own advantages and 
disadvantages. The more global report may be 
somewhat quicker for the teacher to complete 
but may also be less objective. Stating each 
classroom rule on the note communicates 
more information to both the child and the 
parents. Three studies (Bailey et al., 1970; 
Clark, 1972; Kirigin, Bailey, Phillips, Fixen, 
& Wolf, Note 1) provided a compromise 
between global and detailed reports. At the 
beginning of the program, the teacher listed 
the classroom rules on the board so that they 
were clearly stated for the child. The daily 
report home, however, indicated only whether 
the student obeyed class rules. 

In most studies, the teacher-parent com- 
munication occurred daily. Three studies, 
however, sent notes home only when the child 
was good (Ayllon et al., 1975; Hawkins et al., 
1972, Experiment 4; Heaton et al., 1976), 
the rationale being that a note communicating
misbehavior might never reach home (Ayllon 
et al., 1975). Only one study limited the 
parent-teacher communication to a weekly 
contact (Coleman, 1973). Although this was 
successful at maintaining the student’s good 
behavior, the teacher had previously used 
other behavioral techniques to establish the 
desired behaviors in the classroom before 
instituting the weekly communication home. 
One would expect that the more frequent 
the communication between school and home, 
the more rapid the behavioral change. 

Several studies did report that some students 
lost their notes before reaching home. The 
approach by Ayllon et al. (1975), Hawkins 
et al. (1972, Experiment 4), and Heaton et al. 
(1976) of sending notes home only when the 
report is good is one way to circumvent this 
problem, since such a procedure should 
increase the child's motivation for carrying 
the note home. However, this technique does 
not guarantee that a note will reach home. 
If the rewards dispensed at home are not 
potent enough, the note's importance to the 
child is diminished, and the note may be lost 
(Schumaker et al., 1977). Of course, the parents 
can be required to call the teacher to obtain
the necessary information if the note fails to
reach home (Kroth, Whelan, & Stables, 1970). 


did not include sanctions at home for mis-
behavior at school (Hawkins et al., 1972,
Experiment 4; Lahey et al., 1977). When

and Alexander (1976), 3 successive “undesir-
able” days resulted in a 1-day suspension from
school, a rather extreme punishment in our
opinion.

In the study by Lahey et al. (1977), con-
tingent parental praise at home, implemented

effects of
praise, tangible rewards, and sanctions when

has been used successfully with
children of all ages, from elementary school

The students in the studies

behind in academic skills (Ayllon et al., 1975;
Schumaker et al., 1977), had previous records

1976),
emotionally disturbed

and
regular classrooms (Bailey et al., 1970; Martin,
Tharp, & Thorne,
; Kirigin et al., Note 1).

toward materials, completing
classwork neatly and correctly, and achieving

(i.e., grades) appropriate to potential. As
with disruptive behaviors, teachers have
successfully used a number of behavioral

teacher-parent communication
(e.g., Bailey et al., 1970; Cohen,
Keyworth, Kleiner, & Libert, 1971; Lahey
et al., 1977; Schumaker et al., 1977). 

As might be expected from the large number 
of academic behaviors described earlier, the 
studies reviewed differed widely in their 
criteria for “good academic behavior.” Some
studies focused on behaviors that are con- 
sidered conducive to learning (e.g., orienting 
eyes and head to work, looking at teacher, and 
responding to questions); a good teacher-to- 
parent report in this case indicated that the 
student had engaged in specific appropriate 
academic behaviors during class (Bailey et al., 
1970; Coleman, 1973; Kirigin et al., Note 1).
Other studies chose completion of classwork as
the target for improvement and required the
student to complete all classwork to receive
a good report (Cantrell, Cantrell, Huddleston,
& Wooldridge, 1969, Case 1; Harris, Finfrock,
Giles, Hart, & Tsosie, 1975; Lahey et al.,
1977). Several studies required not only
classwork completion but also a certain level
of achievement for a good report (Hawkins
et al., 1972, Experiments 1-3; Stuart, 1971;
Kirigin et al., Note 1; Kirigin et al., Note 2).

Although potentially effective, the use of a
certain level of achievement on classwork as
the criterion for good academic behavior at
school and as the basis for parental con-
sequences at home can produce certain
problems. If the level of achievement is set
too high, the student may be unable to meet
the criterion. As a consequence, he or she has
been placed in a "can't win" situation and, no
matter how motivated initially, may never
be able to receive any of the home rewards.
Of course, a teacher-parent communication
of this type would be unlikely to bring about
the desired changes in the student's academic
behavior.

Several studies have successfully handled
the problems associated with the use of achieve-
ment level in the teacher-parent communica-
tion. In the study by Kirigin et al. (Note 2),
two of the three female delinquents had no
difficulty when good academic behavior and
subsequent rewards at home were based on
number of problems correctly done. In fact,
the two girls improved their daily grade from
a D to a B and a D to a C, respectively. The
third female delinquent, however, showed no
change in her academic behavior, even after
the teacher-parent communication had been
in effect for a week. When tutoring was 
included in the program, her accuracy on 
daily assignments quickly improved from an 
F to an A. 

The home-based reinforcement program 
used by Hawkins et al. (1972, Experiments 
1-3) provides another possible solution for 
using a criterion of achievement level with 
students performing below grade level. In 
their study the initial accuracy criterion for 
good academic behavior was placed at a level 
commensurate with the student's current 
performance. Each day the criterion was 
gradually adjusted upward with remarkable 
success. 

As with disruptive behaviors, the content or 
degree of specificity of the teacher's report to 
the parents varied from global to detailed. 
Usually it was global, with teachers indicating 
only whether the student “studie[d] the whole 
period” (Bailey et al., 1970), “did very well” 
(Hawkins et al., 1972, Experiments 1-3;
Heaton et al., 1976; Karraker, 1972) or
“completed all his work” (Lahey et al., 1977). 
The study by Cantrell et al. (1969, Case 1) 
is an example of the other extreme. The 
teacher-parent communication indicated 
whether the student's class assignments were 
completed, were well-done, and had no more 
than two careless errors, whether the student 
listened to and complied with directions, and 
what daily grades were earned by the student. 
As with disruptive behaviors, there have been 
no studies that systematically examined the 
importance of degree of specificity of the 
teacher's report on the student's academic 
improvement. 

In most studies that dealt with academic 
behaviors in the classroom, the teacher-parent 
communication occurred daily (e.g., Bailey 
et al., 1970; Karraker, 1972; Kroth et al., 1970; 
Schumaker et al., 1977; Kirigin et al., Note 1, 
Kirigin et al., Note 2). Some studies sent notes 
home only when the student displayed "good" 
academic behavior (Hawkins et al., 1972,
Experiments 1-3; Heaton et al., 1976); others
limited the teacher-parent contact to once a 
week (Coleman, 1973; Harris et al., 1975; 
McKenzie, Clark, Wolf, Kothera, & Benson, 
1968). Again, there have been no studies that 
examined the effect of the frequency of teacher-parent
contact on the effectiveness of the home-based
reinforcement programs.

A recent study did compare the effects of 
fixed-time and variable-time teacher-parent 
communications on the academic behavior of 
26 third-grade students (Saudargas, Madsen, 
& Scott, 1977). Under the fixed-time condition, 
each student received a report for his/her 
parents on Friday that indicated the quantity 
and quality of her/his work completed that 
week. Under the variable-time condition, the
same information was communicated to
parents, but seven to nine students were 
randomly selected each day to receive a report 
to take home. The results show that more 
assignments were correctly completed when 

the students were unsure whether they would 
receive a report that day; that is, their 
academic output was higher under the variable-
time condition. A similar study comparing
daily to weekly teacher-parent communica-
tions is certainly needed.

The rewards and sanctions received by
students at home for good academic behavior
at school were similar to those in the previous
section. Nearly all of the studies used home
privileges (e.g., allowance, TV time, snacks,
late bedtime) to reward a good teacher report
(e.g., Bailey et al., 1970; Karraker, 1972;
Schumaker et al., 1977). Two studies limited
the rewards received at home to allowances
(Coleman, 1973; McKenzie et al., 1968).
This is perhaps simpler for parents to operate
or at least less disruptive to family routines in
that parents do not have to monitor such
activities as TV time, bedtime, and so forth.
Sanctions, when included, were limited to loss
of privileges and were always used in combina-
tion with a reward system. As with the
consequences for disruptive behaviors, more
studies are needed to examine the effectiveness
of different reward and punishment programs
in changing academic behaviors.

Home-based reinforcement programs have 
been successful in increasing the academic 
behavior of a wide variety of students. As 
might be expected, the students in the studies 
reviewed were frequently described as under- 
achievers or as poorly motivated (Hawkins 
et al., 1972, Experiments 1-3; Karraker, 1972). 
Often the students were behind more than 
1 year in several academic subjects (e.g.,
Schumaker et al., 1977). Of particular interest
is the study by McKenzie et al. (1968). The
students in their study had all been labeled as
having learning disabilities.

Home-based reinforcement programs have 
also been used to increase academic behavior 
with students whose major difficulties may not 
be academic achievement. Several studies 
included students who had been labeled 
emotionally disturbed by teachers and/or 
mental health professionals (Coleman, 1973; 
Kroth, 1970). Other types of students included 
those considered to be moderate to severe 
discipline problems at school (Heaton et al., 
1976; Schumaker et al., 1977; Strober &
Bellack, 1975) and those considered to be
delinquents by the community (Bailey et al.,
1970; Harris et al., 1975; Martin et al., 1968;
Stuart, 1971; Kirigin et al., Note 1; Kirigin
et al., Note 2).


Methodological Evaluation 


All of the studies reviewed present results
that support the position that home-based
reinforcement programs provide an effective
treatment approach to modify both disruptive
and academic behaviors in the classroom. This
type of treatment has the added advantages of
minimizing the time required by the teacher
for implementation and involving the parents
as change agents for their child's classroom
behaviors. But before a complete endorsement
of this treatment approach can be made, a
more rigorous assessment of the research is
needed to evaluate the validity of the results
and to delineate areas for further investigation.
To organize the critique of the studies re-
viewed, six categories (i.e., adequate design,
systematic variation of treatment, multiple-
outcome measures, follow-up, school program
monitored, and home program monitored)
that were deemed critical for sound research
were determined, and the articles were
evaluated in terms of whether they met the
requirements of each category. Four additional
descriptive categories (i.e., sample size, grade,
classroom setup, and target behavior) were
included to assess the applicability of the
results to different populations and settings.
A summary of this evaluation is presented in
Table 1.


Descriptive Categories


Regarding the descriptive categories, the
number of subjects participating in the
experiments varied from 1 to 124. The mean,
median, and mode were 10.8 subjects, 2
subjects, and 1 subject, respectively. Subjects
were from the primary grades (kindergarten
through 6th) in 45% of the experiments and
from the secondary grades (7th-12th) in the
remaining 55% of the investigations. Sixty-
eight percent and 32% of the experiments
occurred in regular and special (e.g., remedial,
emotionally disturbed, learning disability)
classrooms, respectively. In 39%, 19%, and
42% of the investigations, academic (e.g.,
listening to presentations by teacher, asking
and answering questions, working with eyes
and head oriented towards materials, complet-
ing classwork, achieving, making good grades),
disruptive (e.g., making noise, talking without
permission, physically disturbing other chil-
dren, getting out of one's seat without permis-
sion), and combined academic and disruptive
behaviors, respectively, were targeted. These
data indicate that most of the studies have
involved few subjects; however, the programs
have been implemented in all grades, in
regular as well as special classrooms, and with
both academic and disruptive behaviors.


Adequate Design 


A major concern in evaluating the soundness
of the studies reviewed is the adequacy of the
experimental design that was used to examine
treatment effects. The use of any of three types
of designs was considered an adequate basis
for the conclusion that the results were due
to the experimental treatment. These were
(a) ABA design (or some variation thereof
with baseline and reversal), (b) multiple
baseline design, and (c) group design with
appropriate control group. If the experimental
design of a study was one of the three above
and there was not the confound of multiple
interventions (e.g., home- plus school-based
programs), the design was considered rigorous
enough for the conclusion that the results
were indeed due to the home-based reinforce-
ment treatment program and not due to
alternative factors.

Only 63% of the studies reviewed had
adequate designs without multiple interven- 
tions (Table 1). Of these, the most frequently 
chosen design (10 of 17 or 58.8%) was some 
variation of the ABA design (e.g., Ayllon
et al., 1975; Bailey et al., 1970). Only one 
study (Harris et al., 1975) selected a group 
design with an appropriate control group to 
examine the effects of home-based treatment 
programs. The studies with inadequate designs 
were usually studies with baseline and treat- 
ment data but no treatment reversal (e.g., 
Cantrell et al., 1969; Hawkins et al., 1972; 
Strober & Bellack, 1975). One study was 
considered inadequate because the home-based 
reinforcement program was confounded by 
being part of a larger multiple intervention 
treatment program. Conclusions about the 
effectiveness of home-based reinforcement 
were reached in spite of the confound (Cole- 
man, 1973). 


Systematic Variation of Treatment 


A second criterion that was selected to 
evaluate the reviewed studies is the inclusion 
of systematic variation of treatment variables 
in the experimental design (i.e., component 
analysis). This is necessary if one is to differen- 
tiate which of the many variables that make up 
a treatment program are actually responsible 
for the observed changes (e.g., social vs. 
tangible rewards). Only when this type of 
information has been determined can a treat- 
ment program be refined to maximize its 
effectiveness while minimizing or eliminating 
unnecessary features. 

Of the studies reviewed, 39% did include 
some systematic examination of the contribu- 
tions of different treatment components, 
although their manipulations were by no means 
exhaustive. Two studies (Ayllon et al., 1975; 
Bailey et al., 1970) investigated the effects
of contingent and noncontingent reports 
home. In the noncontingent condition, the 
students received “good” reports and sub- 
sequent home-based rewards regardless of their 
school behavior. No treatment effects were 
observed until a good report was made 
contingent on actual school behavior. It 
appears that contingent consequences for
target behaviors are necessary for change in
those behaviors. Two other studies (Hawkins
et al., 1972; Karraker, 1972) examined a
somewhat similar variable, feedback versus
feedback plus home-based consequences.
Teacher feedback to the child and his/her
parents concerning the child's school behavior
was not effective unless the feedback was linked
with consequences at home. Other treatment
components that have been studied are
schedule of report home (Saudargas et al.,
1977), types of home-based consequences
(Schumaker et al., 1977), and types of be-
haviors targeted for treatment (Kirigin et al.,
Note 1).


Multiple-Outcome Measures


Another criterion deemed necessary to 
evaluate treatment effectiveness is the inclusion 
of multiple-outcome measures. Different out- 
come measures frequently lead to different 
conclusions (Forehand & Atkeson, 1977). To 
provide a complete evaluation of the effective- 
ness of a home-based program, one should not
only measure i 


output (e.g., 
but should 


parents’ (Lahey et al., 1977) perception of the 
child. The remaining studies that used
multiple-outcome measures limited their as-
sessment of treatment effect to two measures
of the child: (a) changes in the child's class-
room behavior and (b) changes in her or his
academic output.


Follow-Up 


The fourth methodological criterion is the
inclusion of follow-up measures to determine
enough to adequately assess temporal general-
ity. For example,
treatment effects following termination bui
& Bellack,
1975), and when follow-up data were presented,


reinforcement programs produce lasting effects.
variables that might
enhance temporal generality (e.g., fading of
treatment) need to be systematically examined.


School Program Monitored and Home Program
Monitored


contact with the parents (e.g., Ayllon et al.,
iodic conferences (e.g.,
Cohen et al., 1971; Heaton et al., 1976; Kroth
et al., 1970; McKenzie et al., 1968), and


ment setting (e.g., Bailey et al., 1970; Lahey
, 1977). For


In the school setting 39% of the studies 
reviewed had observers present. In contrast,
none of the studies reviewed monitored
treatment implementation in the home with 
observational data. This lack of control makes 
it difficult to infer conclusively that the 
behavioral changes observed in the classroom 
were in fact due to the treatment and not to 
other factors in the home. Although this 
omission in experimental design probably 
arises from the many difficulties associated 
with placing trained observers in the home,
researchers cannot ignore the necessity for 
some type of assessment of program imple- 
mentation in this setting. One solution to this 
dilemma is currently being examined by 
Bostow (Note 3). In his study, parents are 
instructed to audiotape the parent-child 
interaction when the child receives her/his 
home consequences that are based on the 
teacher communication. 




Conclusions 


The need to involve parents in the school- 
related problems of their children is evident. 
Parents are reported to be experiencing a loss
of influence over their children as schools are 
assuming roles once reserved for parents 
(Woodward, 1978). Home-based reinforcement 
offers one way in which parents can be in- 
corporated into programs designed to manage 
the school difficulties of their children. The 
procedure permits parents to receive regular 
feedback concerning their child's school be- 
havior. Furthermore, home-based reinforce- 
ment encourages frequent teacher-parent com- 
munication. From the school's perspective, 
home-based reinforcement eliminates many 
of the ethical issues associated with school 
behavior modification programs, since neither 
disciplinary procedures nor tangible positive 
reinforcement have to be used in the school.
Furthermore, the procedure circumvents the
time-consuming and difficult process of setting
up a behavioral classroom program.

In all of the studies reviewed, the conclusion
reached by the respective authors was that
home-based reinforcement was effective in
changing classroom behavior. It is encouraging
to note that this conclusion was consistent
across a wide range of grades, regular and
special classrooms, and academic and disrup-
tive behaviors. The replication of the effective-
ness of home-based reinforcement across this 
variety of ages, settings, and behaviors attests 
to the general impact of the procedure. 

Unfortunately, an analysis of the method- 
ology used in the studies that examined home- 
based reinforcement yields a less positive 
picture. Appropriate designs have been used 
in less than two-thirds of the studies. Further- 
more, monitoring of the school and home 
programs has rarely occurred. Behavior modi- 
fiers typically have prided themselves in being 
well grounded in experimental methodology. 
Indeed, articles and books have been devoted 
to the topic (e.g., Birnbrauer, Peterson, & 
Solnick, 1974; Hersen & Barlow, 1976). 
Unfortunately, in the case of home-based 
reinforcement programs, the methodology 
often has been less than adequate. Other 
aspects of assessment that are receiving in- 
creasing emphasis in behavior modification, 
such as multiple-outcome measures (e.g.,
Atkeson & Forehand, 1978; Turkat & Fore-
hand, in press) and follow-up measures (e.g., 
Forehand & Atkeson, 1977), have been ignored 
in the home-based reinforcement studies. 
Obviously, unless adequate methodologies are 
employed, our conclusions about the effective- 
ness of home-based reinforcement procedures 
will be limited.


Sex Roles and Psychotherapy: A Current Appraisal


Bernard E. Whitley, Jr.
University of Pittsburgh 


This article reviews the current status of research on the effects of sex role
stereotypes on mental health judgments. Studies in this area have addressed
three questions: (a) Are there different, sex-role-related standards of mental
health for men and women? (b) Do violations of sex role norms result in ad-
verse mental health judgments? (c) Do therapists set sex-role-related goals for
their clients? It is concluded that sex role stereotypes are strong mental health
cues for nonprofessionals, with violations of sex role norms leading to adverse
mental health judgments, but that whereas professionals share the sex role
stereotypes of their lay contemporaries, the professionals are unaffected by
them in making mental health judgments and in setting therapeutic goals. This
discrepancy between stereotypes and behavior may be due to any of three
factors: the methodological limitations of the studies, actual differences in men-
tal health between men and women, or normal attitude-behavior discrepancies.


One of the more popular villains in con-
temporary psychology is the psychotherapist
as a covert (if unwitting) agent of social con-
trol and the status quo (e.g., Hurvitz, 1973;
Leifer, 1970; Szasz, 1961). This thesis has been
most vigorously expounded by a number of
feminist writers (e.g., American Psychological
Association, 1975; Chesler, 1972; De Beauvoir,
1949/1967; Tennov, 1975) who have charged
that therapists define mental health in terms
of sex role stereotypes and impose those stereo-
types on their clients under the guise of
therapy, thereby inhibiting rather than facili-
tating mental health. If true, these charges are
indeed serious, for research has shown that
overadherence to stereotyped sex roles is as-
sociated with psychopathology (e.g., H. Gold-
berg, 1976; Gove, 1972; Gove & Tudor, 1973)
and low self-esteem (e.g., Bem, 1977; Spence,
Helmreich, & Stapp, 1975).

These allegations of sex role bias were sup-
ported only by intuition and case histories until
the publication of a study by Broverman,
Broverman, Clarkson, Rosenkrantz, and Vogel
(1970). Broverman and her colleagues found
that psychotherapists shared the sex role 
stereotypes of their society and that they in- 
cluded those stereotypes as part of their 
definition of mental health, thus providing 
some evidence of therapeutic bias. They did 
not, however, investigate the extent to which 
this attitudinal bias resulted in discriminatory 
behavior. Subsequent investigation of such bias 
and behavior has generally found no such dis- 
crimination and casts doubt on the generality 
of the bias. This article reviews that subsequent 
research. 


Questions 


Within this research, three questions have 
been of special importance: (a) Are there 
different, sex-role-related standards of mental 
health for men and women? (b) Do violations 
of sex role norms result in adverse mental 
health judgments? (c) Do therapists set sex- 
role-related goals for their clients? Although 
this article reviews the current status of re- 
search on these questions, it should be noted 
that this review only considers research on sex 
roles in mental health judgments, not on simple 
sex-of-client effects. Research on the latter's 
relationship to mental health judgments, test-
ing, and treatment has been briefly reviewed
by Abramowitz and Dokecki (1977) and
Zeldow (1978).


Table 1
Studies That Investigated Differential Mental Health Standards

Study                                 Instrument  Findings  Remarks

Mental health professionals as subjects
Broverman, Broverman, Clarkson,
  Rosenkrantz, & Vogel (1970)         BSRQ        +         Dichotomous scoring
Fabrikant, Landau, & Rollenhagen
  (1973)                              ACL         +
                                      FR          -
Fabrikant (1974)                      ACL         +
                                      FR          -
Johnson (1974)                        BSRQ        +
Anderson (1975)                       BSRQ        +
Maxfield (1976)                       BSRQ        +/-       + for dichotomous
                                                            scoring only
Aslin (1977)                          BSRQ        +/-       + for males rating
                                                            females only

Mental health trainees as subjects
Terrill (1972)                        BSRQ        +
Maslin & Davis (1975)                 BSRQ        -/+       + for males rating
                                                            females only
Harris & Lucas (1976)                 BSRQ        -/+       + for males rating
                                                            females only

Other subject groups
Nowacki & Poe (1973)                  BSRQ        +         College students
Kravetz (1976)                        BSRQ        -         College women only

Note. BSRQ = Broverman et al. (1970) sex role questionnaire; ACL = Adjective
Check List; FR = free response; + = supports hypothesis; - = does not support
hypothesis.


Subject Populations 


Researchers have used three populations of
subjects in studying the relationship of sex


Stereotypes As Standards of Mental Health


Twelve studies have tested the hypothesis
that there are different standards of mental
health for men and women, which are based on
Subjects in


these studies have tended to support the
hypothesis, but
their validity,


Most cited, stud
to mental health standards


1 Not included in this review is a study by Shapiro
(1977) that used subjects from a population that the
author described as potentially biased.




by Broverman et al. (1970). The psychiatrists,
psychologists, and social workers who par-
ticipated in the study responded to a modified
version of a sex role stereotype questionnaire
developed by Rosenkrantz, Vogel, Bee,
Broverman, and Broverman (1968). The
modified questionnaire consisted of 122 bipolar
adjective pairs, of which 27 had been deter-
mined to be stereotypically masculine and 11
to be stereotypically feminine. Subjects com-
pleted the questionnaire by indicating “the
pole to which a mature, healthy, socially com-
petent” adult, man, or woman would be closer
(Broverman et al., 1970, p. 2). Masculine and
feminine health scores were computed relative
to the traits assigned to the adult, and signifi-
cant differences were found between the mean
male and female health scores.

Replications of the Broverman et al. study.
Fabrikant (1974) and his colleagues (Fabrikant,
Landau, & Rollenhagen, 1973) attempted to
replicate the Broverman et al. (1970) study
using an independently developed checklist
and free-response items. These studies found
sex-role-related mental health standards on the
checklist but not on the free-response items.
The results of studies that used continuous
scales are mixed. Although Anderson (1975)
and Johnson (1974) found stereotyped mental
health standards using the Broverman et al.
(1970) sex role questionnaire (BSRQ), as did
Aslin (1977) for male therapists rating women,
Maxfield (1976) did not.


Professional Trainees

Attempts to identify stereotypic mental
health conceptualizations among professional
trainees have also met with mixed results.
Terrill (1972) elicited such stereotypes using a
continuous-scale BSRQ with counselor train-
ees, but Maslin and Davis (1975) and Harris
and Lucas (1976) found few such stereotypes
with counselor and social work trainees, re-
spectively. Their only statistically significant
difference was between male conceptualiza-
tions of females and the other subject-stimulus
combinations.


Other Subject Populations 


Nowacki and Poe (1973) investigated the
mental health concepts of college students
using a continuous-scale BSRQ. Differences
were found for the concepts of mentally
healthy man and woman; differences among
male subjects were more extreme. Kravetz
(1976) employed a continuous-scale BSRQ
with a sample of college women. No differences
were found between the concepts of mentally
healthy man and woman on 73% of the “stereo-
typic” items, nor were there overall differences.
Self-reported membership of some of the sub- 
jects in the women’s liberation movement had 
no effect on the results. 


Moderating Variables 


Variables that have been postulated to in- 
teract with the sex of the target to produce 
stereotyping of mental health concepts include 
the judge’s sex, personal style, and role (thera- 
pist vs. nontherapist). 


Sex of Judge 


As noted above, several studies have found 
sex of subject effects in conceptualizations of 
mental health (Aslin, 1977; Harris & Lucas, 
1976; Maslin & Davis, 1975; Nowacki & Poe, 
1973; also Delk & Ryan, 1977). The usual 
result is that men tend to stereotype to a
greater degree than do women, particularly 
when men rate women. 


Personal Style 


A-B therapist status. Delk and Ryan (1975, 
1977) have also studied the effects of A-B 
status on stereotyping. Among therapists, As 
are more successful at treating schizophrenics, 
Bs are more successful at treating neurotics, 
and As tend to attribute more feminine charac- 
teristics to themselves than do Bs (Delk & 
Ryan, 1977). The studies found that As also 
tend to stereotype more than do Bs. 

Personal stereotypes. Comparing therapists’ 
personal sex role stereotypes with the cultural 
stereotypes, Billingsley (Note 1) found four 
types of therapists: (a) those whose personal 
stereotypes accurately reflected the cultural 
stereotype but included a large number of 
other beliefs, (b) those whose personal stereo- 
types were largely inaccurate compared to the 
cultural stereotypes and included a moderate
number of other items, (c) those who had a
moderately accurate personal stereotype and
included a moderate number of other items,
and (d) those who had an inaccurate personal
stereotype with few outside items. Of the four
groups, the third differentiated more between
mentally healthy men and women, as measured
by a continuous-scale BSRQ, than did the
other three groups. However, the significance
of the differences was not reported.


Subject Population 


Delk and Ryan (1977) found that mental 
patients tended to differentiate between the 
traits assigned to mentally healthy men and 
women more than did students, who did so to a
greater extent than did therapists. The differ-
ence between the patient group and the others
may have been due to education as well as
patient status, since they were mostly from the
lower and lower-middle socioeconomic classes.


Methodological Limitations 


Measurement of Sex Role Stereotypes 


The goal of stereotype measurement is to
determine the “set of beliefs a


In the
subjects describe the
or woman using their
own words, whereas in the forced-choice situa-
tion, they describe the mentally healthy man
or woman using a list of traits provided for
them, some of which are sex role stereotypic.
The forced-choice response format maximizes
stereotypic responses, since the subjects are
allowed to respond only with the items listed
on the questionnaire, regardless of whether
they would ordinarily use them (cf. Lunneborg,
1970). Frieze (Note 2), for example, found that
college men and women both gave primarily
nonstereotypic responses to stimuli such as
“I believe that most women. . . .” This phe-
nomenon was also described by Fabrikant
(1974; Fabrikant et al., 1973), who found less
stereotyping by therapists with free-response
items than with an adjective checklist.

The phrasing of the
items on a stereotype questionnaire as either
specific behaviors or general traits can affect
stereotype measurements. Komarovsky (1976)
i (1966) found that male


motherhood, and work out-
side the home. Thus the use of trait scales could
reduce measured stereotyping.

Use of the BSRQ. Overreliance on the
BSRQ itself is a major limitation of the studies
described. As a


Q was used in 10 of the
it hardly represents a
consensual definition of sex roles. Sex role in-


Spence et al. (1975), or a
combined questionnaire could be different from
one using the BSRQ.


Scoring the Stereotype Measures


Forced-choice questionnaires can be scored
on either a dichotomous or a continuous scale.
Using a dichotomous scale, subjects can report
whether they think a trait is characteristic of
a man or a woman but not the degree to which
it is characteristic. This scoring system was
used in the Broverman et al. (1970) and adjec-
tive checklist portions of the Fabrikant (1974;
Fabrikant et al., 1973) studies. This procedure
tends to inflate the extremity of scores by elimi-
nating the possibility of a qualified or neutral
rating. Investigating the implications of a
dichotomous versus a continuous scale, Max-
field (1976) found a marked reduction in the
rating of male-female differences when they
were measured on a continuous-scale version
of the BSRQ, with no statistical difference on
59% of the stereotypic items. It would thus
appear that the Broverman et al. (1970) and
Fabrikant (1974; Fabrikant et al., 1973) check-
list findings should be interpreted with a great
deal of caution.

Even when continuous scales are used, there
are problems of interpretation, since most sex
differences found on individual items of mental
health questionnaires are of degree rather than
kind (e.g., Johnson, 1974; Kravetz, 1976;
Maslin & Davis, 1975; Maxfield, 1976). That
is, given a bipolar adjective continuum, men
and women are usually rated on the same side
of the neutral point. Although the pictures of
the mentally healthy man and woman may
differ statistically under these conditions, the
conceptual differences to the respondent may
not be so great.
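
The scoring argument above can be illustrated with a small numerical sketch. It is not drawn from any of the reviewed studies: the 7-point item, the ratings, and the function names are all hypothetical, chosen only to show how forcing graded ratings to the nearer pole can enlarge an apparent male-female gap.

```python
from statistics import mean

# Hypothetical 7-point bipolar item (1 = one pole, 7 = the other).
SCALE_MIN, SCALE_MAX = 1, 7
NEUTRAL = (SCALE_MIN + SCALE_MAX) / 2  # midpoint of the scale, 4.0

def dichotomize(rating):
    """Force a graded rating to the nearer pole, as dichotomous scoring does.

    A rating exactly at the neutral point has no dichotomous equivalent;
    the illustrative data below deliberately avoid that case.
    """
    return SCALE_MAX if rating > NEUTRAL else SCALE_MIN

def mean_gap(man_ratings, woman_ratings):
    """Difference between the mean rating of the two target concepts."""
    return mean(man_ratings) - mean(woman_ratings)

# Invented ratings from five judges: mild, qualified differences.
healthy_man = [5, 5, 5, 3, 5]
healthy_woman = [3, 3, 5, 3, 3]

continuous_gap = mean_gap(healthy_man, healthy_woman)
dichotomous_gap = mean_gap(
    [dichotomize(r) for r in healthy_man],
    [dichotomize(r) for r in healthy_woman],
)

print(f"continuous gap:  {continuous_gap:.1f} scale points")
print(f"dichotomous gap: {dichotomous_gap:.1f} scale points")
```

In this invented example the same judgments yield a gap three times as large once qualified ratings are pushed to the poles, which is the direction of distortion the text attributes to dichotomous scoring.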


Summary 

Although the findings of the studies that 
tested the hypothesis that there are different, 
sex-role-related standards of mental health for 
men and women are generally positive, meth- 
odological shortcomings cast doubt on their 

validity. At best, it
equivalent measures
professionals share the sex role stereotypes of
their lay contemporaries.


Effect of Violations of Sex Role Norms on 
Judgments of Mental Health 


If the hypothesis that there are different, sex-
role-related standards of mental health for men
and women is accepted, then a second hypothe-
sis follows: The exhibition of cross-sex-role
behavior leads to adverse judgments of mental 
health. This hypothesis has been tested in the 
24 studies listed in Table 2. Research in this 
area has been of two types: analogue studies, in 
which subjects base their judgments on ficti- 
tious cases specially constructed to coincide 
with the independent variables of the study, 
and field studies, in which actual therapist- 
client relationships are examined. Generally, 
the hypothesis has been supported for non- 
professionals but not for mental health 
professionals. 


Analogue Studies 


In analogue studies, subjects are typically 
presented with a written description of a
stimulus person whose sex role orientation is 
varied between judges. These stimulus persons 
are then rated on adjustment, mental health, 
or similar measures. There are three major sub- 
categories of analogue studies: those that use 
only female stimulus persons, those that use 
both male and female stimulus persons and in 
which only one or two characteristics are varied 
to manipulate sex role orientation, and those 
that use both male and female stimulus persons
and in which several characteristics are varied. 


Mental Health Professionals As Subjects 


Female stimulus persons only. The four 
studies that used only female stimulus persons 
manipulated sex role orientation by varying 
one trait: Thomas and Stewart (1971), 
Abramowitz et al. (1975), and Hill, Tanney, 
Leonard, and Reiss (1977) used career choice 
as the critical trait, whereas Abramowitz, 
Abramowitz, Jackson, and Gomes (1973) used 
political activism. The results of these studies 
were generally negative, with the exception 
that Abramowitz et al. (1973; Abramowitz et 
al., 1975) found that subjects with more tradi- 
tional attitudes tended to rate nontraditional 
stimulus persons lower on adjustment than did 
less traditional subjects. 

Male and female stimulus persons, single trait.
In these studies, sex role orientation of the 
stimulus persons was manipulated by varying 
the traits of independence and achievement 
(Pringle, 1973) and active versus passive
personal style (Chasen, 1975; Chasen &
Weinberg, 1975; Fischer, Dulaney, Fazio,
Hudak, & Zinotofsky, 1976). None of these
studies supported the hypothesis that cross-
sex-role behavior leads to adverse judgments
of mental health.

Male and female stimulus persons, multiple
traits. Once again, the hypothesis was gen-
erally not supported (Berland, 1976; Bilick,
1973; Gomes & Abramowitz, 1976; Maxfield,
1976). However, Tribich
disturbed than a
same manner and
other “masculine” behaviors. In addition,
Feinblatt and Gold (1976) found that al-
though graduate students in clinical and school
psychology did not rate the severity of a
problem greater in cross-sex-role children, they
did make a stronger recommendation for treat-
ment and a bleaker prediction for future ad-
justment if the behavior continued.


Table 2
Studies That Investigated Mental Health Judgments

Study                                   Findings  Remarks

Analogue studies
Mental health professionals as subjects
Thomas & Stewart (1971)
Abramowitz, Abramowitz, Jackson,
  & Gomes (1973)                        -/+       + for subjects with traditional attitudes
Bilick (1973)                           -
Pringle (1973)                          -
Abramowitz et al. (1975)                -/+       + for subjects with traditional attitudes
Chasen (1975)                           -
Chasen & Weinberg (1975)
Berland (1976)
Fischer, Dulaney, Fazio, Hudak,
  & Zinotofsky (1976)
Gomes & Abramowitz (1976)               -
Maxfield (1976)                         -
Hill, Tanney, Leonard, & Reiss (1977)   -
Tribich (1977)                          -/+       + for males responding to a crisis in a
                                                  feminine manner

Mental health trainees as subjects
Feinblatt & Gold (1976)                 -/+       - for diagnosis, + for two prognoses

Other subject groups
Burhenne (1972)                         +         College students
Coie, Pennington, & Buckley (1974)      +         College students
Costrich, Feinstein, Kidder, Marecek,
  & Pascale (1975)                      +/+       College students; two experiments
Derlega & Chaikin (1976)                +         College students
Feinblatt & Gold (1976)                 +         Parents
Zeldow (1976)                           -/+       College students; + for males rating
                                                  females
Tribich (1977)                                    males only

Field studies
Collins & Sedlacek (1974)
Feinblatt & Gold (1976)
Levy & Doyle (1974)                     +         Interview
Cowan (1976)                            -

Note. + = supports hypothesis; - = does not support hypothesis.


Other Subject Populations 


Male and female stimulus persons, single trait. 
Studies that used college students as subjects 
found evidence in support of the hypothesis. In 
two experiments that used the traits active and 
passive, Costrich, Feinstein, Kidder, Marecek,
and Pascale (1975) found that cross-sex-role
stimulus persons were rated as being in greater
need of therapy than their in-role counterparts,
and Derlega and Chaikin (1976) found lower
adjustment ratings for violators of self-dis-
closure norms. In a complex study of variables
that affect mental health judgments, Coie,
Pennington, and Buckley (1974) found mar-
ginally greater ratings of mental disturbance
for female stimulus persons who reacted ag-
gressively to stressful situations and for male
stimulus persons who reacted to the same situa-
tions with somatic complaints.

Male and female stimulus persons, multiple
traits. Using college students as subjects,
Burhenne (1972) and Israel, Raskin, Libow,
and Pravder (1978) found lower mental health
ratings for cross-sex-role stimulus persons.
Zeldow (1976), however, found sex role effects
only for male subjects rating female stimulus
persons. Feinblatt and Gold (1976) had a
sample of parents rate the same cases as did
the graduate students in another part of their
study and found ratings of greater problem
severity and less future adjustment for cross- 
sex-role children. Tribich (1977) also had non- 
professional adults rate the same cases as did 
his therapist subjects and again found that 
feminine behaviors in men led to more severe 
mental health ratings. 


Moderating Variables


Generally speaking, sex of
affected judgments of mental
health in rela
(Bilick, 1973; Chasen, 1975; Chasen & Wein-
berg, 1975; Derlega & Chaikin, 1976; Feinblatt
& Gold, 1976; Gomes & Abramowitz, 1976;
Maxfield, 1976; Tribich, 1977). However,
women therapists tend to be slightly more ac-
cepting of counterstereotypic behavior than
men (Chasen, 1975; Chasen & Weinberg,
1975) and more lenient in their ratings of all
stimulus persons (Abramowitz et al., 1975;
Harris & Lucas, 1976; Maxfield, 1976). Israel
et al. (1978), however, found female college 
students to make more severe ratings than male 
students. 

Attitudes. Abramowitz and his colleagues
(1973; Abramowitz et al., 1975), working with
female stimulus persons only, found that
counselors with more traditional attitudes
tended to attribute less adjustment to women
with deviant career goals and political atti-
tudes. Other studies (Chasen, 1975; Gomes &
Abramowitz, 1976), however, have failed to 
replicate the attitude-sex role interaction for 
stimulus persons with deviant and conforming 
sex role traits. Although these differences may 
be due to differences in population (counselors 
vs. psychologists), the differences in stimulus 
material are also striking: The counselors in 
the first studies rated behaviors, whereas the 
psychologists in the attempted replications 
rated traits. 

Situations. Although Coie et al. (1974) 
found marginal support for the use of sex role 
stereotypes in mental health judgments made 
by college students, they also found that the 
situations that elicited the judged behaviors 
and the situation-behavior interaction were 
more important cues than the stereotyping of 
the reaction. For example, 


relatively little disorder was attributed for social with- 
drawal compared to aggression in the context of exam 
pressure, whereas social withdrawal was seen as evi-
dencing at least as much disorder as aggression in the
context of rejection. (p. 563) 


This finding suggests that situational context 
as well as behavioral description may be an 
important factor in the experimental materials. 


Field Studies 
Archival Studies 


Two of the field studies of sex roles and 
judgments of mental health are archival. That 
is, they examined the records of a child 
guidance clinic (Feinblatt & Gold, 1976) and 
a university counseling center (Collins &
Sedlacek, 1974) for patterns of referrals related
to sex role behavior. It should be noted that
these studies examined requests for treatment
(made directly by college students or by
parents for children), not acceptance for treat-
ment, and thus they reflect nonprofessional
judgments of mental health. Collins and
Sedlacek's analysis of screening interviews
found that male students tended to seek
counseling for vocational, educational, and
underachievement problems, whereas female
students sought counseling for emotional and
social conflicts. Feinblatt and Gold found that
33% of the children seen at a guidance center
were referred for sex-role-related behaviors
and that 89% of these were for cross-sex-role
behaviors. Thus failure to meet behavioral
sex role expectations may be seen as a mental
health problem by nonprofessionals.


Surveys 


Cowan (1976) surveyed a sample of pri- 
marily male consulting 
them complete the BSRQ 
and female patient, “indicating the extent to 
which they think one of 
the greater problem” 
female clients were rated as being too feminine 


were not viewed in sex role 

terms as measured by the BSRQ. A measure- 
ment qualification arises from the fact that the 
psychologists, in response to another question,
reported that sex role expectations underlay
the problems of their male and female clients
to the same extent, a finding that may reflect
the inadequacy of the BSRQ as a definition of
sex roles. 


Interviews 


On the other hand, Levy and Doyle (1974) 
found a slight tendency for staff members at a
drug abuse rehabilitation center to see resi-
dents' problems in sex role terms. These data,
however, were scored dichotomously and reflect
the views of nonprofessionals as well as pro-
fessional staff, both conditions tending to in-
crease the degree of stereotyping found.


Methodological Limitations


to disconfirm the hypothesis that violations of 
sex role norms result in adverse mental health 
judgments made by professionals and to sup. 
port it for judgments by nonprofessional 
judges. There are, however, aspects of the 
stimulus material and dependent variables in 
the analogue studies that merit a closer look, 


Independent Variable M. anipulation 


Specific sex-role-deviant behaviors seem to 
lead to more adverse judgments than trait 
lists. For example, Abramowitz et al. (1973; 
Abramowitz et al., 1975) found an interaction 
between counselor attitudes and stimulus be- 
havior that led to lower adjustment. ratings, 
Whereas studies that used traits did not 
(Chasen, 1975; Gomes & Abramowitz, 1976). 
With nonprofessional subjects, Derlega and 
Chaikin (1976) found a much stronger effect 
using a specific behavior than did the other 
investigators who used traits. In 
these, behavioral 
and dimension 
For example, 
Covering a wide variety of behaviors, whereas 
the description 
of stairs" gives a More precise measure of 


Gold (1976) found that 
Psychology graduate stu- 
ted children Е 


role behavior 


The weight of experimental evidence : 


showing in- and cross-sex-


and so needs treatment. This situation suggests
the need for multivariate dependent measures
of mental health that assess present perceptions
and future expectations.


Summary 


Mental health professionals are relatively uninfluenced
by violations of sex role expectations
in making mental health judgments. On the
other hand, nonprofessional judges are affected
by such violations when making similar decisions.
These therapist-lay differences may
be due to the fact that therapists stereotype to
a lesser degree than do other groups (Delk &
Ryan, 1977) or to the fact that the most
common nontherapist subject population consists
of college students. The latter explanation
is supported by Tribich's (1977) finding of

in the mental health
judgments of either therapists or adult white-collar
workers. It is also possible that the
source of the differences lies in the experimental
procedures and that the use of behavioral
rather than trait descriptions and the
use of multivariate dependent measures could
affect the findings among clinicians.


Treatment Goals 


Five studies have investigated the hypothesis
that treatment goals tend to be sex role
related (Table 3). These studies have generally
failed to support this contention, although
some specific goals may be sex typed.


Analogue Studies 


Billingsley (1977) and B. J. Goldberg (1976) 
compared the treatment goals set for clients 
who varied in sex and problem. Neither study 


masculinity or femininity of goals depended


type of P! f 
recommended for potentially 


(Billingsley, 1977). Such goals suggest that the
ideal person may be seen as a mixture of the
masculine and feminine and that both extremes
are undesirable.


Table 3
Studies That Investigated Therapy Goals

Study                       Findings

Analogue studies
  Pringle (1973)            A
  B. J. Goldberg (1976)     E
  Billingsley (1977)        —
Field studies
  Fabrikant (1974)          —
  Levy & Doyle (1974)       +

Note. + = supports hypothesis; — = does not support hypothesis.


Investigating counselor reactions to high 
and low dependence and achievement in male 
and female clients, Pringle (1973) found that 
both male and female counselors expressed a 
greater desire to change the behavior of low- 
achieving male clients than that of low- 
achieving female clients. In addition, male 
counselors recommended less change for de- 
pendent women and all high achievers than 
did female counselors. 


Field Studies 


Fabrikant (1974) surveyed patients in psy- 
chotherapy about their perceptions of their 
therapists’ sex role goals for them. Only per- 
ceptions of male therapists were reported, and 
these perceptions generally failed to support 
the hypothesis of sex-role-related treatment 
goals. A sizeable minority (27%) of female 
patients, however, endorsed the statement that 
“male therapists encourage female patients to 
follow the role of wife/mother” (p. 101). On 
the other hand, they were not asked if their 
therapists discouraged such traits as inde- 
pendence and assertiveness. 

Reporting on a drug rehabilitation program,
Levy and Doyle (1974) state that “a stable
relationship with a member of the opposite sex
is important for a woman to complete the
program while more credence is given to a
male’s realistic job plans” (p. 430). They
neglect, however, to present any data relevant
to the degree of importance of these two factors
in making rehabilitation judgments, so no firm
conclusions can be drawn.


Summary


Relatively little research has been conducted 
investigating treatment goals as a function of 


erally negative. There are indications, however,
that reactions 
(Pringle, 1973) 
(Fabrikant, 1974) may be sex typed. This 


Although adherence to a stereotype is generally
not used as a treatment goal, it is possible
that therapist-encouraged sex roles may
have beneficial effects on some clients. Delk
and Ryan (1977) note, for example, that Type

therapists, who stereotype the most, are
also the most successful with schizophrenics.
They suggest that these therapists provide
structure and a “definitive role model of
culturally valued and socially acceptable behavior”
(p. 258) for a patient group characterized
by ambivalence, confusion, and need
for structure.


Conclusions 


Nonprofessionals 


as indicators of poor mental health and adjustment.
When nonprofessionals evaluate reactions
to stressful situations, however, it would


important mental health cue than the reaction’s
sex role appropriateness.


Mental Health Professionals 


Although the evidence concerning differential
mental health standards for men and
women indicates that clinicians share the sex
role stereotypes of their lay contemporaries,
there is little evidence that these stereotypes
affect professional judgments or treatment
goals. Although this apparent contradiction
between studies of clinicians’ attitudes and
their behavior may be primarily due to the
methodological limitations of the studies, as
in press), there are two other possible
explanations.
First, Gove (in press) has noted that


Second, even given stereotyped attitudes, a 
contradiction does not necessarily arise. Social 


roles than by their sex role stereotypes alone. 
In the area of fi 


The apparent exception to this rule concerns
children. As previously noted, Feinblatt and
Gold (1976) found that although therapist
trainees were somewhat accepting of cross-sex-role
behavior in children, they see a bleak
future for them. This may reflect concern that
the behaviors could lead to later homosexuality,
transsexuality, or social rejection (cf.
Green, 1974; Rekers, Rosen, Lovaas, &
Bentler, 1978).

Thus the influence of sex roles on psychotherapy
appears to be more limited than some
critics charge. This does not mean that the 
critics are necessarily wrong; it is more likely 
that therapists’ attitudes have changed. In- 
deed, until recently, sex role congruence and 
mental health have been closely identified 
in the professional literature (e.g., Garai,
1970; Hurlock, 1974), and the change from
this position may be attributable to the
success of the women's movement in raising
the consciousness of the psychotherapeutic
establishment.


What Is Biofeedback?


and training pro-
of advertisements
medical and psy-
pain syndromes,
exhaustive coverage

techniques that use biophysiological instrumentation
to provide patients with information
about changes in bodily functioning of
which the person is usually unaware. The

them to voluntarily control some aspect of
their physiology that purportedly is causally
linked to the pain experienced.

A number of important assumptions un- 
derlie the clinical use of biofeedback. One 
assumption is that the etiological variables 
and the pathophysiology of the pain to be 
controlled are known and can be subjected 
to voluntary control. A second assumption is 
that learned control of a bodily response is 
facilitated by information about activity in 
the relevant organ system. A third assump- 
tion is that through the biofeedback training, 
the patient will be able to recognize some of 
the situational factors that are related to the 
maladaptive physiological responding. A final 
assumption is that the skills learned during
the biofeedback training will generalize to
situations in the patient’s natural environment,
and as a result, these skills will be maintained
over time and settings. That is, the
patient will be able to engage in conscious control
of relevant physiological responses outside
the clinic or laboratory setting. Since most
of the biofeedback training studies concerned
with pain regulation have used patients with
headaches (tension, migraine), this review first
examines this literature and then examines
other clinical pain populations.


Muscle Contraction Headache 


The exact etiology of muscle contraction
headache is unclear (Bakal, 1975). There is,
however, a consensus that muscle contraction
headaches (a) are an individual's response to
psychological stress (American Medical Association,
1962; Dalessio, 1972; Martin, 1966)
and (b) may result from excessive and sustained
contraction of the frontalis (forehead),
scalp, or neck muscles (Bakal, 1975; Martin,
1972). The traditional treatment of muscle
contraction headaches usually entails symptomatic
medication, for example, tranquilizers,
muscle relaxants, or analgesics, and occasionally
psychotherapy.

In 1954 Sainsbury and Gibson reported 
that the resting levels of frontalis electro- 
myographic (EMG) activity were higher in 


than in normals. In msk 
Stoyva also demonstrated an association between
frontalis EMG activity and tension in
varying scalp and neck muscles. This association
provided the basis for Budzynski,
Stoyva, and Adler's (1970) suggestion that
biofeedback would facilitate a patient's ability
to attain “deep levels of relaxation” (p. 206)
and would subsequently enable the client to
“consciously control muscular tension” (p.
206). This led to a series of biofeedback
studies (Budzynski et al., 1970; Budzynski,
Stoyva, Adler, & Mullaney, 1973) in which
tension headache patients were given frontalis
EMG feedback.

In their initial report, Budzynski et al. 
(1970) provided five muscle contraction head- 
ache patients with frontalis EMG biofeedback 
training. Surface electrodes attached to the 
forehead provided information that was fed 
back by way of a tone, with the frequency 
of the tone proportional to the integrated 
EMG activity. Patients were instructed to 
attempt to keep the tone at the lowest fre- 
quency possible and were not provided with 
instructions as to how this might be accom- 
plished. In addition, the patients were in- 
structed to practice at home twice a day the 
skills that they had acquired during the 
feedback training. The amount of training 
received by the patients varied from 4 to 13 
weeks, with two 30-minute sessions each week. 
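
The feedback loop described above can be illustrated with a short sketch. It is
only an illustration and not the apparatus used by Budzynski et al.: the function
names, the rectify-and-average form of "integration," the sample values, and the
scaling constant `hz_per_microvolt` are all assumptions chosen for the example.

```python
def integrated_emg(samples_uv):
    """Average rectified amplitude of one window of EMG samples (microvolts).

    Rectify-and-average is one simple stand-in for "integrated EMG activity";
    the original apparatus may have integrated the signal differently.
    """
    if not samples_uv:
        raise ValueError("need at least one sample")
    return sum(abs(s) for s in samples_uv) / len(samples_uv)


def feedback_tone_hz(level_uv, hz_per_microvolt=100.0):
    """Tone frequency proportional to the integrated EMG level.

    hz_per_microvolt is a hypothetical scaling constant, not a published value.
    """
    return hz_per_microvolt * level_uv


# Two made-up windows of frontalis EMG samples, in microvolts.
tense_window = [8.0, -10.0, 12.0, -10.0]
relaxed_window = [2.0, -2.0, 2.0, -2.0]

tense_tone = feedback_tone_hz(integrated_emg(tense_window))
relaxed_tone = feedback_tone_hz(integrated_emg(relaxed_window))

# A patient who relaxes the frontalis hears the tone drop in pitch.
print(f"tense: {tense_tone:.0f} Hz, relaxed: {relaxed_tone:.0f} Hz")
```

Under these assumed values, lower muscle activity yields a lower tone, which is
the only cue the patient is asked to follow when trying to keep the tone at the
lowest frequency possible.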

Budzynski et al. (1970) reported a steady
decline in headache intensity and duration
and EMG activity over the course of the
training. A follow-up was conducted 3 months
after the conclusion of the training. Muscle
contraction headaches were eliminated in two
patients and were reduced markedly in a third.
For the remaining two patients, headaches 
returned shortly after the end of the feedback 
training. 

Although the Budzynski et al. (1970) study 
demonstrates that patients can learn to reduce 
headache intensity as well as frontalis EMG 
activity, the effects cannot necessarily be 
concluded to be a direct result of the EMG 
feedback training. Because of the preliminary 
nature of the investigation, Budzynski et al. 
did not include any group to control for 
placebo or expectancy effects. Without such
controls it is impossible to separate any
active effect of the EMG training from nonspecific
treatment effects associated with the
impressive biofeedback “ritual” (Miller, 1974;
Shapiro & Surwit, 1976). Since the biofeedback
treatment in the Budzynski et al. study
involved such components as EMG training,
home practice, expectancy of alleviation of
headaches, self-monitoring of stress-inducing
situations, and a format that engendered a sense
of control (cf. Epstein & Blanchard, 1977;
Glass & Levy, in press), it is difficult to
attribute improvement to the EMG feedback
component. Another limitation of the Budzynski
et al. study is the use of only a 3-month


further research, 

In a better controlled study of biofeedback 
for the control of muscle contraction headache, 
Budzynski et al, 


potential importance 
Practice in the biofeedback regimen. 

Budzynski et al. (1973) contacted four of
the six patients in the EMG feedback group
18 months after the completion of training.
Three of the four patients reported that their
headaches remained at very low levels, with
the fourth subject indicating that she had
received some relief from headache activity
following the training. This maintenance of
improvement over such an extended period
is impressive. Since

feedback and no-feedback 


the training or of spontaneous fluctuations of
Budzynski et al. did not
provide any information as to whether the
biofeedback subjects demonstrated reductions
at the 18-month follow-up
concordant with the reduction in headache
incidence.
Although the second Budzynski et al. study
Epstein, Hersen, &
, 1974; Haynes, Griffin, Mooney,


biofeedback training that 
gestions for home practice,

All of the studies reviewed so far provide
at least evidence that frontalis
can reduce muscle con-
incidence. An important
issue, however, is how useful biofeedback is

several investi- 
& Abel, 1976; 
1973; Tasto & 


sufficient to significantly reduce muscle contraction
headaches. Another approach to alleviate
muscle contraction headaches has
recently been used successfully by Holroyd,
Andrasik, and Westbrook (1977). The therapeutic
regimen used by Holroyd et al. was
designed to train patients in a number of
stress-coping skills to enable them to cope
more adaptively with environmental and individual
stylistic responses that instigated muscular
tension. Further research should determine
whether biofeedback techniques, which
require sophisticated and expensive psychophysiological
apparatus, are any more efficient
or effective than more readily available and
less expensive interventions.

Several recent studies (Chesney & Shelton,
1976; Cox, Freundlich, & Meyer, 1975;
Haynes et al., 1975; Hutchings & Reinking,
1976; Holroyd et al., 1977) have addressed
this issue by comparing the efficacy of frontalis
EMG biofeedback with relaxation training or
stress coping training. The data reported are
equivocal but do not in general support the
contention that biofeedback training is any
more effective than various relaxation or
cognitive control techniques.

In the Hutchings and Reinking (1976)
study, three groups received EMG biofeedback
training alone, relaxation training alone, or a
combination of biofeedback plus relaxation
training. All three groups were instructed to
practice at home twice daily the skills that
they had acquired. The groups who received
the EMG training demonstrated significant
reduction in headaches during a 28-day follow-up
period compared to the relaxation
training group. The EMG feedback and combined
EMG feedback plus relaxation groups
did not differ significantly in the incidence
training. Although
patients in all three groups revealed reductions
in frontalis muscle action potential following
treatment, no significant between-group differences
were obtained, and no correlational
data were reported between the amount of
reduction in headache activity and the degree
of EMG reduction. These data do suggest
that biofeedback training is more effective
than relaxation training; however, they also
question the significance of frontalis EMG
control for the reduction of muscle contraction
headache.

Studies by Cox et al. (1975), Haynes et al.
(1975), and Chesney and Shelton (1976) provide
evidence that contradicts the Hutchings
and Reinking (1976) findings. Although Cox
et al. and Haynes et al. both report that
EMG biofeedback was effective in alleviating
headaches, groups who received the EMG
training did not differ significantly from
groups who received relaxation training. In
the Chesney and Shelton (1976) study, only
the relaxation training group demonstrated
significant reductions in headache incidence;
the EMG biofeedback group reduced headache
incidence no more than a no-treatment control
group.

In a recent study, Holroyd et al. (1977) 
used a therapeutic intervention that focused 
on altering maladaptive cognitive responses 
that were assumed to mediate the occurrence 
of muscle contraction headaches. Patients 
were provided with a rationale for treatment, 
which emphasized the function of specifiable 
maladaptive cognitions in the creation of sub- 
sequent disturbing emotional and behavioral 
responses (based on Beck, 1976 and Meichen- 
baum, 1977). Patients were encouraged to 
attribute their headaches to relatively specific 
cognitive self-statements rather than to situ- 
ational or complex internal dispositions. A va- 
riety of stressful situations were identified, 
and patients were taught to focus (a) on the 
situational cues that trigger tension and 
anxiety for them, (b) on their response to 
these cues, (c) on their thoughts before be- 
coming tense and after the development of 
tension, and (d) on the way in which these 
cognitions contributed to the tension headaches.
Following this sequence, patients were
instructed to deliberately interrupt the sequence
of thoughts preceding their emotional
response at the earliest possible point and to
engage in cognitive control techniques incompatible
with further stress and tension (e.g.,
cognitive reappraisal, attention deployment,
fantasy).

This cognitive control regimen was employed
with 40 tension headache patients who
were compared to patients who received
either biofeedback or no specific treatment.
Training consisted of 8 biweekly sessions with
a 15-week follow-up. At the termination of
treatment and at follow-up, only the cognitive
control group demonstrated substantial improvement
on frequency, duration, and intensity
of headaches. Interestingly, only the
biofeedback group demonstrated significant
reductions in EMG activity. This latter
finding raises the question of the assumed
causal relationship between frontalis EMG
activity and muscle contraction headaches.
Hutchings and Reinking (1976) also questioned
the contribution of frontalis muscle
tension in the development of muscle contraction
headaches. The questionable relationship
between EMG activity and experience
of tension headaches is underscored further
by Holroyd et al.'s (1977) data and the ob-


Abel, 1977; Holroyd et al., 1977). These


Budzynski et al. (1973). Cox et al. suggest
that the discrepancies in the various studies 


EMG levels in a method analogous to that 


cant relationship between frontalis EMG and 
headache incidence, thus questioning Cox 
et al.'s explanation. 

Other interpretations are possible. First,
the etiology of muscle contraction headache
may not result in high levels of frontalis
EMG activity but rather in muscle contraction
in other parts of the head, neck, and
shoulders, with little generalization across the
various muscle groups (Haynes et al., 1975).
Second, changing frontalis muscle activity
may not be sufficient for changes in sg


and self-report of tension headaches are concordant.
Thus it seems premature to conclude
that the positive effects of biofeedback approaches,
when indeed they occur, are a function
of increasingly voluntary control of
frontalis muscle activity.

Table 1, which contains a summary of the 


dence, but when contrasted with alternative 
therapeutic interventions and no-treatment 


dition, only four of nine studies reported any
concordance between reductions in EMG levels
and headache incidence.


Although the data are not conclusive, from

1976). If an essential element of biofeedback
training is the development of such cognitive
and behavioral stratagems, then perhaps a
more efficient and less expensive method of
achieving relaxation would simply be to instruct
patients in various methods of relaxation.
The EMG biofeedback could then be
used for a brief period to help the patient
identify the m
effective strategy. As
Coursey (1975) asks, “what sort of relaxation
technique is effective with what sort of people
with what sort of problems in conjunction
with what other procedures?” (p. 833).

It is clear that biofeedback training for
the control of muscle contraction headaches
directly addresses only the maladaptive physiological
responses to stressful situations and
ignores those psychological factors that initiate
contribute to such responses. As was
noted previously,
generally believed
traction headaches:
sponds to psychological stress, which (b) may
produce prolonged contraction of the muscles
in the head, neck, or shoulders and which
(c) may subse
American Medical Assoc
). To successfully treat
headaches, a therapeutic
es the first stage (response
to psychological stress), such as the stress-coping
training (Holroyd et al., 1977;
1976),
and
that focus on the second stage
(physiological responding). (See Turk & Genest,
1979, for an extensive review of such multifaceted
treatment approaches with pain pa

of biofeedback
to specifically address individuals' maladaptive
appraisals and behaviors in the natural environment
may account for (a) the equivocal
data, (b) for the relatively high number of
patients who drop out of biofeedback treatment
prematurely, as well as (c) the individual
differences in the ability to benefit from
biofeedback training (
1977; Meichenbaum, 1977; Turk & Genest, 1979).

In summary, the enthusiasm for the therapeutic
application of frontalis EMG biofeedback
for the control of muscle contraction
headaches exceeds the available evidence of
the efficacy of this approach as compared to
other psychological approaches. Although biofeedback
procedures have been
useful, they should still be con
sory at best, with many questions
unanswered.

(Dalessio, 1972).


Migraine (Vascular) Headache


The existing data on the pathophysiology
of migraine headaches are sparse. The available
physiological evidence suggests that migraine
headache is associated
cranial vasculature responsivity
nomic nervous system instability (Bakal,
1975). The symptoms of mig
are thought to be mediated
autonomic nervous system
evidenced by increased blood flow in the
head, which results in painful dilation and
distention of the cranial arteries
range of substances that produce vasoconstriction,
including ergotamine tartrate, pituitrin,
ephedrine, Benzedrine, epinephrine, and
caffeine,
have been shown to be
effective in relieving migraine, when administered
prior to the establishment of edema

Sargent, Green, and Walters (1972), who
noted the association between migraine headache
and cold extremities (Dalessio, 1972),
hypothesized that the voluntary increase in
finger temperature should be correlated with
on. In designing a
treatment for migraine, these authors combined
finger-temperature-w
with autogenic training
(Schultz & Luthe, 1959)
relaxation
and self-instruction

The temperature biofeedback is designed
the patient learn
to vasodilate the
blood vessels. Sensitive thermistors are
ferential tem
and index finger is fed back to the patient.
Patients are instructed to use the autogenic
phrases to induce increased finger temperature.
In essence the training is designed to teach
the patient to abort the vasospastic phase of
the migraine attack.
first attempts to use bio- 
control migraine, 
(1972) provided 


without the biofeedback apparatus. 
On the basis of self-reports of the amount 
d and frequency 


74% of the patients, as assessed by psychol- 
ogists, from the i 
feedback training. However, of 
sample of patients, adequate clinical ratings 
were available on 62 (again, the number of 


confirm some degree of clinical improvement 
in 29% to 39% of the original sample of 75 
patients. The Sargent et
contrasted to the results of a study conducted
by Mitchell and Mitchell (1971) that reported
significant improvements in migraine headache
patients treated with


relaxation and other behavioral approaches.


The methodology
(1972) study has been seriousl
by Blanchard and Young (1974). They note
the following:


The procedures for evaluating hand-warming are unsatisfactory
for three
(1) little or no data
on results are given and it is reported that the post-treatment
results do not reach statistical significance;
(2) treatment package itself is a mixture of several
factors, suggestion, relaxation training and
training, any or all of which may have accounted for
the results; and (3) no-treatment or attention-placebo
treatment control groups were not included. (p.


Several other studies have investigated the
relative efficacy of biofeedback training for
the amelioration of migraine headaches (Friar
& Beatty, 1976; Kewman, 1978; Medina,
Diamond, & Franklin, 1976; Mitch, McGrady,
& Iannone, 1976; Turin & Johnson, 1976).

Mitch et al. (1976) employed Sargent et 
al.’s (1972) autogenic feedback training with 
20 migraine patients. Training was conducted 
over a 12-week period. During the first month 
of training, patients were instructed to practice 
with the biofeedback device 30 minutes each 
day. In the second month subjects were in- 
structed to practice daily and whenever they 
identified the onset of headaches. 
during the last month of training, patients 
were to practice 1-2 times per week, at the 
onset of headache, and on the evening of the 
day a headache had occurred. For 10 patients, 
follow-up was conducted 6 months after the 
completion of training. 

During the training period, patients main- 
tained records on four dependent measures: 
duration, frequency, and intensity of head- 
aches and amount of medication taken. Pa- 
tients were also asked to report on perceived 
changes of symptoms at the end of the treat- 
ment period. These ratings were contrasted 
with reports of headache and medication use 
during the 6 months prior to training. Judg- 
ment of the efficacy of this procedure was 
based on improvements on the subjective 
headache incidence and medication use reports. 
The authors reported that 65% of the patients
improved on two or more of the dependent
measures. At the 6-month follow-up,
9 out of 10 patients reported average to excellent
improvement compared to that in the
6 months preceding treatment, which suggests 
an ability to control headaches over an ex- 
tended period of time. 

The criticisms offered by Blanchard and 
Young (1974) concerning the Sargent et al. 
(1972) study (failure to identify the effective 
components of the treatment, failure to employ 
adequate controls, and failure to provide 
adequate statistical analysis) can be applied 
to the Mitch et al. (1976) study as well. 
The Mitch et al. study has a number of
additional flaws that make interpretation of
the results tenuous. The efficacy of the autogenic
training was judged against retrospective
self-reports of headache incidence, which cannot
be considered a valid baseline against which
to compare treatment efficacy. No attempt
was made to measure forehead or finger
temperature prior to, during, or following 
training. Thus determination of the influence 
of the autogenic feedback training on under- 
lying physiological processes cannot be estab- 
lished. Selection of the 10 patients included 
in the follow-up is not specified by the authors, 
and thus sampling bias may be introduced. 
In sum, the study is inadequate on a number 
of grounds and does not permit any con- 
clusion regarding the efficacy of autogenic 
feedback training per se. 

Medina et al. (1976) reported retrospective 
data on 27 patients with migraine or mixed 
migraine and muscle contraction headaches, 
In this study, 
EMG training, 
phrases, 


autogenic 
contributed 


the treatment. The same methodological
weaknesses found in the previous studies apply
to the Medina et al. study.

The relative contribution of the autogenic 
phrases, a component of the autogenic feedback 
training package that was used in these three 


1976; Sargent et al., 1972), was examined by
Turin and Johnson (1976). Seven patients


week baseline and throughout the study.
Following the baseline period, the biofeedback
a 6-14-week
period. During each session the first 25
minutes was devoted to acclimating the patients
to the apparatus. Finger temperature
was recorded during this habituation period
and during the 20-minute biofeedback training
period. In addition, patients were instructed
to practice the temperature control skills
twice a day and at the first sign of a headache.
In contrast with the Sargent et al. and Mitch
et al. studies, no portable device was provided
for the patients to use in home practice.

Turin and Johnson (1976) provided three 
of their seven patients with temperature 
cooling training for 6 weeks prior to tempera- 
ture warming training. The authors reasoned 
that if both cooling and warming produced 
significant effects, then a placebo expectancy
hypothesis could explain the data.

All seven of the patients learned the pe- 
ripheral warming task rapidly. Of the three 
patients who received the temperature cooling 
training, none showed clinical improvement 
under this condition. Subsequent to the temperature
warming training, all patients reported
significant reductions in both headache
incidence and amount of medication taken.
These data are consistent with case studies
that used analogous temperature cooling
training and temperature warming paradigms
(Johnson & Turin, 1975; Wickramasekera,
1973).


ing, but they self-monitored and recorded 


headache incidence cannot be attributed un- 
equivocally to the specific effects of tempera- 
ture warming biofeedback training. 


of the Turin and Johnson study question the 
hypothesis that th oe 


Surprisingly, all three


period. All three groups
trols, and by the failure to provide correlations
between temperature reduction and decrease
in headache incidence.

Thus the exact relationship between pe- 
ripheral finger temperature and migraine 
headache remains obscure. In the Turin and 
Johnson (1976) study, reduction in finger 
temperature accounted for at most 25% of 
the variance in headache improvement and 
was essentially unrelated to decrease in the 
Such findings suggest 
that nonspecific treatment effects associated
with finger i ini 
for a significant portion 
tained with this treatment (Miller, 
Shapiro & Surwit, 1976). 
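
To put the 25% figure in perspective, a proportion of variance accounted for is
the square of a correlation coefficient, so 25% shared variance corresponds to a
correlation no larger than .50 in absolute value and leaves 75% of the variance
in headache improvement unaccounted for. The small sketch below works through
that arithmetic; the function name is illustrative only, and the calculation
assumes a simple two-variable linear relationship.

```python
import math


def max_abs_correlation(variance_explained):
    """Largest |r| consistent with a given proportion of shared variance (r squared)."""
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("proportion of variance must lie between 0 and 1")
    return math.sqrt(variance_explained)


variance_explained = 0.25          # "at most 25% of the variance"
r_max = max_abs_correlation(variance_explained)
unexplained = 1.0 - variance_explained

print(f"|r| <= {r_max:.2f}; unexplained variance = {unexplained:.0%}")
```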

Finally, in a carefully controlled study,
Andreychuk and Skriver (1975) compared
finger-temperature biofeedback, biofeedback
for electroencephalograph (EEG) alpha enhancement,
and self-hypnosis. The alpha enhancement
group was included as an attention-placebo
control that included the use of
the biofeedback ritual (actually a rather
powerful, potentially useful approach, used
with some limited success by Gannon and
Sternbach, 1971, in a single case study of
migraine control). Andreychuk and Skriver
obtained baseline headache information for
the 6 weeks preceding training.
Each of the three groups then received
10 45-minute therapeutic sessions, and patients
were asked to practice the various
skills at least twice a day between laboratory
sessions. The con-
trasted with 


reductions in the 


the groups. 
was collected, and thus no
the maintenance of headache reduction can
be made from the Andreychuk and Skriver
study. to provide
information about alterations in physiological
activity as a function
regimens.

The hypothesis that the efficacy of the
temperature feedback training is attributable 
to the specific biofeedback training does not 
receive support from the Andreychuk and 
Skriver (1975) study. All three treatment 
groups shared a number of components, 
namely, relaxation, home practice, an ex- 
pectancy for relief, and a fostering of a sense 
of control, each of which may have engen- 
dered change. Interestingly, Andreychuk and 
Skriver assessed the hypnotic susceptibility 
of each patient and noted that the degree of 
headache improvement reported was strongly 
related to hypnotizability. These correlational
data raise some intriguing possibilities re-
garding the effects of individual differences in
suggestibility on the efficacy of biofeedback 
training. Further examinations of the rela- 
tionship between such individual difference 
measures and biofeedback training need to be 
conducted before any firm conclusions can 
be drawn. 

Friar and Beatty (1976) used a different 
approach to the control of migraine headaches 
with biofeedback techniques. In contrast to 
the other studies reviewed, Friar and Beatty 
did not use finger temperature biofeedback 
training as their experimental treatment. 
Rather, patients were trained to decrease 
pulse amplitude in either the head (experi-
mental group) or a peripheral site (hand
control group). The authors inferred that 
training to reduce pulse amplitude in the 
peripheral site should produce only non- 
specific effects and would not influence mi- 
graine incidence. 

Nineteen migraine sufferers were included 
in the Friar and Beatty (1976) study. Baseline 
ratings of the frequency and intensity of 
headache and amount of medication were 
obtained from all patients. The training con-
sisted of eight sessions extended over a 3-week
period. No specific instruction for regular 
home practice appears to have been offered. 
A ninth, no-feedback session was conducted 
to assess patients' ability to control response 
independent of feedback. In this session, 
patients were instructed to produce the vaso- 
constriction that they had learned in the 
laboratory whenever they became aware of
developing headaches.

Friar and Beatty (1976) reported no sta-
tistically significant differences between the
two groups in the number of headache episodes
or in the rating of mean intensity of headache 
episodes. The groups did differ significantly 
in the number of major migraine attacks 
(defined as lasting 3 hours or more); the ex- 
perimental patients had fewer headaches. The 
authors suggest that the purpose of training 
in vasoconstriction is to “abbreviate the 
headache attack rather than prevent the 
onset” (p. 51). 

No correlational data between pulse am-
plitude and headache incidence are presented
by Friar and Beatty (1976), nor do they 
present any information about pulse amplitude 
before or after training, which limits the de- 
termination of the relationship between pulse 
amplitude and intensity of migraine. It is not 
clear if the migraines were reduced signifi- 
cantly during training or whether the reduc- 
tion was maintained after treatment. No long- 
term follow-up was reported. Although the 
Friar and Beatty approach looks promising, 
it awaits replications with more attention to 
maintenance of effects. 

Table 2 contains a summary of studies that 
used biofeedback to ameliorate migraine head-
aches. A pattern of results similar to that
observed with muscle contraction headache 
is also evident in the biofeedback studies with 
migraine headaches. Although biofeedback 
techniques do seem to reduce the incidence 
of migraine headache, training in muscular 
relaxation produces similar results. 

An underlying feature of these biofeedback 
studies is the hypothesis that learned vaso- 
motor control is central to effective treatment 
of migraine headaches. Unfortunately, the ex-
perimental uncertainties and the lack of quanti-
fiable data and necessary control procedures
present serious difficulties, making endorse- 
ment of biofeedback methodologies tentative 
at best. One problem in particular stems from 
the failure to demonstrate that reduction in 
headache incidence is correlated with altera- 
tion of peripheral vasodilation or pulse am- 
plitude, a premise on which the biofeedback 
training is based. The absence of such demon- 
strated relationships should qualify endorse- 
ments of the therapeutic efficacy of biofeed-
back treatments. Such caution is indicated
by the absence of (a) a careful consideration
of the placebo effect and expectancy demands 
and (b) comparisons with other less expensive 
and more readily available treatment ap- 
proaches (e.g., relaxation and self-control pro- 
cedures). Many important issues remain un- 
resolved, for example, evaluation of the 
training parameters, duration of treatment 
effects, and the role of individual differences. 


Chronic Pain Other Than Headache 


In contrast to the host of studies that 
examined the effects of various biofeedback 
approaches for headaches, relatively few in- 
vestigations have applied biofeedback tech-
niques to other forms of chronic pain. Un-
systematic case studies that reported the
efficacy of biofeedback for chronic pain have 
been offered by Coger and Werbach (1975) 
and by Gentry and Bernal (1977). Other 
investigators have incorporated biofeedback 
into a variety of other procedures but have 
not assessed the contribution of the biofeed- 
back training alone (e.g., Gottlieb et al., 1977; 
Newman, Seres, Yospe, & Garlington, 1978; 
Seres & Newman, 1976; Swanson, Swenson, 
Maruta, & McPhee, 1976; Khatami & Rush, 
Note 1). 

Recently, two studies (Hendler, Derogatis, 
Avella, & Long, 1977; Melzack & Perry, 1975) 
have specifically examined the efficacy of 
biofeedback with groups of chronic pain 
patients. Hendler et al. used EMG biofeedback 
with chronic pain sufferers. They used frontalis 
EMG training for two reasons. First, Bonica 
(1974) had suggested that stress and anxiety 
could induce reflex muscle spasm, vasomotor 
changes, and local ischemia and thereby ex- 
acerbate pain syndromes that involved mus- 
cles, tendons, and reflex muscle spasms. 
Second, Budzynski and Stoyva (1969) had 
suggested that frontalis EMG relaxation was 
an indication of generalized muscle relaxation. 

Thirteen patients suffering from a variety 
of pain syndromes were treated by Hendler 
et al. (1977) with five sessions of frontalis 
EMG feedback training. At a 1-month fol- 
low-up 6 patients reported that they were 
obtaining continued relief. The other 7 pa- 
tients reported no benefit from the biofeedback 
training. No control procedures were used, 
nor was specific information provided regard-
ing the length of baseline and initial post- 
treatment and follow-up levels of EMG. 
Hendler et al. reported the absence of a sig- 
nificant correlation between muscle tension 
and reduction of pain. The absence of a rela- 
tionship questions the conclusion that frontalis 
EMG training contributed to pain reduction. 
Hendler et al. concluded that 


the beneficial effects of biofeedback for these responders 
may be explained in terms of an increased sense of 
mastery over their environment, which resulted in a 
reduction of obsessive concern about their somatic 
problems and improvement in their self-esteem as a 
result of their increased environmental control. (p. 508) 


Such a conclusion must be viewed only as a 
speculation, but a speculation that has been 
voiced by Lazarus (1977) and Meichenbaum 
(1977), who have also stressed the role of 
cognitive factors in biofeedback training.

In the second study that examined the
efficacy of biofeedback in the reduction of 
chronic pain, Melzack and Perry (1975) com- 
pared the relative effects of EEG alpha bio- 
feedback, hypnotic training, and a combina- 
tion of both alpha and hypnotic training. 
The hypnotic training focused on increasing 
relaxation, energy levels, and mental calmness 
and on a reduction in the level of worry 
prior to the patients becoming upset. Six pa-
tients received the alpha feedback, 6 received
self-hypnosis training alone, and 12 patients 
received the combined training. All three 
groups showed increased levels of alpha ac- 
tivity. The groups who received combined 
biofeedback and hypnosis training or hypnotic 
training alone demonstrated substantial re- 
ductions in pain compared to baseline. Pa- 
tients who received the alpha training alone 
showed virtually no change in pain. These 
data suggest that the cognitive approach of 
hypnosis may be effective in reducing pain 
from unbearable to bearable levels but provide 
no support for the efficacy of EEG alpha 
biofeedback training as a tool in reducing 
pain. As Melzack (1975) suggested in the 
subtitle of his article that reviewed biofeedback 
approaches to reduce chronic pain, “Don’t 
Hold the Party Yet.” We concur! 


Summary and Conclusions


What conclusions can be drawn from the 
studies that examined the efficacy of bio-
feedback for the regulation of pain?

1. The relationship between the experience 
of pain and the various physiological responses 
that biofeedback techniques are designed to 
control has not been established. Without the 
determination of some relationship between 
a physiological response and the experience 
of pain, the rationale for selecting a physio-
logical function for voluntary control remains
unclear. 

2. Although it seems likely that most indi- 
viduals can acquire some degree of voluntary 
control over autonomic functioning with bio- 
feedback training, there are large individual 
differences among subjects. Investigations of 
the utility of biofeedback for pain regulation 
suffer from both "patient and treatment
uniformity myths" (Kiesler, 1966, p. 129).
Simply stated, investigators have treated all
patients, regardless of individual differences,
with ostensibly the same biofeedback therapy.

3. Many health care providers view bio- 
feedback methodology as some unified thera-
peutic treatment. In actuality, biofeedback is
a generic term for a wide range of approaches,
which include some form of biological feedback 
that is intended to increase voluntary control 
of physiological responses. The question “Is 
biofeedback effective for regulating pain?" 
should be replaced by the question “What
combination of cognitive, behavioral, and
biofeedback approaches would benefit which
patients, with what symptoms, under what
circumstances, and at what expense?” The
present review questions the relative efficacy
of biofeedback training in comparison with
other more readily available methods. The

4. The active ingredients of biofeedback
therapy have not been identified. The neces-
sary and sufficient components of such a
multifaceted approach as biofeedback have
not been established.


5. Effective control procedures have not
been used in the majority of studies. The
inclusion of appropriate control groups is 
essential in the study of such conditions as 
migraine and muscle contraction headaches 
for several important reasons. Two sources 
of rival hypotheses that can be offered to 
explain the efficacy of biofeedback training 
are the regression-to-the-mean problem and 
the placebo effect. Miller (1974) argued that 
patients are much more likely to seek treat-
ment when they are feeling worse. Since
physiological systems tend to fluctuate be-
tween periods of exacerbation and those of
amelioration, it is possible to obtain a sample
of volunteers whose pain will show reduction
due to spontaneous fluctuation, regressing
toward the mean level. This point also under-
scores the necessity of incorporating extended
follow-up periods to assess relative efficacy of
biofeedback training. 

Placebo effects are potent factors in any 
treatment of pain (Beecher, 1959; Evans, 
1974; Ostfeld, 1961; Shapiro, 1963) and are
especially potent in an impressive treatment
such as biofeedback, with its complex me-
chanical equipment. The important potential
role of placebo effects indicates the need for
conducting double-blind studies that include
credibility checks of patient expectancies (see
Kazdin & Wilcoxon, 1976). The studies that
have used feedback that involved temperature
warming and cooling groups comprise a needed
step in this direction.

6. There is relatively little information
available concerning generalization of learning
from the laboratory or clinic to the natural
environment over extended periods of time.
In many of the biofeedback studies, patients
are instructed to become aware of the onset
of headaches or some other physiological
condition, but little or no attention is directed
to what this entails. A more careful analysis
of these processes should enhance the gen-
eralization process. A related issue is the need
for more careful study of the dependent mea-
sures that have been employed in biofeedback
studies. Three classes of measures have been
used, namely, measures of physiological
changes such as EMG, self-report ratings
(e.g., headache intensity), and amount of
medication. Much more concern should be
directed at specifying the variables such as
demand characteristics, scoring formats, and 
so forth, that influence the latter two mea- 
sures. A much more careful analysis of these 
indices will provide implications for treatment 
interventions (see Frederiksen, Lynd, & Ross, 
1978). 

7. It is clear that pain conditions cannot 
be viewed as independent phenomena repre- 
sented simply by maladaptive physiological 
responding. The particular physiological data 
gathered on any given day is a response de- 
termined by situational and diurnal variations 
and complex physiology and personality char- 
acteristics, as well as the particular environ- 
mental context. All of these factors need 
careful research consideration, particularly in 
view of the stress-producing potential of vari-
ous environments (Insel & Moos, 1974).
Research is needed to evaluate the inter- 
actions between situational effects and the 
person variables that contribute to individual 
differences in response to stress. 

8. The focus of biofeedback training has 
been on increasing one's awareness of mal- 
adaptive physiological response by means of 
feedback and on developing voluntary control 
by conscious effort. As with any therapeutic 
regimen, biofeedback requires compliance over 
time. The question at the simplest level is 
how to motivate the patient to spend the 
necessary time practicing the desired behavior, 
especially when the novelty wears off. The 
problems inherent in convincing patients to 
continue to use prescribed medical or training 
regimens have been discussed by Agras (Note 2),
Blackwell (1972), and Marston (1970). 

9. Biofeedback training places the greatest 
emphasis on maladaptive physiological func-
tioning. But this is too restrictive a view of
pain. In pain syndromes, consideration must 
also be given to the patient's coping patterns 
and life style (Genest & Turk, 1979). Teaching 
voluntary control of physiological functioning 
may not be sufficient, since patients not only 
must control their physiology but they must 
be capable of dealing effectively with their 
environment (Shapiro & Schwartz, 1972). It 
may prove more feasible to consider bio- 
feedback as an adjunctive technique to be 
used with other physiological and psycho- 
logical approaches rather than as the sole 
treatment modality (cf. Genest & Turk, 1979;
Gottlieb et al., 1977; Mitchell & White, 1977;
Turk & Genest, 1979; Khatami & Rush,
Note 1). Training in the use of self-control
strategies should enhance the utility of bio- 
feedback techniques and should foster gen- 
eralization beyond the relatively non-stress- 
producing, quiescent laboratory setting. (See 
Goldfried & Trier, 1974; Meichenbaum, 1977; 
Meichenbaum & Turk, 1976; Turk, 1977, 
for examples of such self-control interventions.) 

10. The reviewed studies reveal a con- 
sistent lack of concern for the subjects’ ap- 
praisals of the biofeedback techniques. What 
does the patient think of the training, and 
how does the patient use the skills acquired? 
These questions are rarely addressed in the 
literature (cf. Meichenbaum, 1976, for a dis-
cussion of this issue). Examination of patients
who fail to benefit from biofeedback training 
and analyses of those patients who prematurely 
drop out of training would provide valuable 
information. The presentation of group data
often tends to obscure individual differences
in the ability to benefit from the training.
Examination of single subject data would help
to answer some of the questions raised
previously.
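
The regression-to-the-mean concern raised in point 5 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model drawn from any of the reviewed studies: every simulated patient has the same stable pain level with random day-to-day fluctuation, patients enroll only on a day when pain exceeds a threshold, and no treatment of any kind is given. All parameter values and names are hypothetical.

```python
import random
from statistics import mean


def simulate_untreated_patients(n=10_000, true_level=5.0, noise_sd=2.0,
                                enroll_threshold=7.0, seed=0):
    """Return (mean pain at enrollment, mean pain at follow-up).

    Hypothetical parameters: pain fluctuates normally around true_level,
    and a person enrolls only if pain on the screening day is at or above
    enroll_threshold. No treatment effect is simulated.
    """
    rng = random.Random(seed)
    at_enrollment, at_follow_up = [], []
    for _ in range(n):
        today = rng.gauss(true_level, noise_sd)
        if today >= enroll_threshold:
            at_enrollment.append(today)
            # Follow-up is an independent draw around the same stable level.
            at_follow_up.append(rng.gauss(true_level, noise_sd))
    return mean(at_enrollment), mean(at_follow_up)


enroll_mean, follow_up_mean = simulate_untreated_patients()
print(f"enrollment: {enroll_mean:.2f}, follow-up: {follow_up_mean:.2f}")
```

Because enrollment selects for unusually bad days, the follow-up mean falls back toward the stable level even though nothing was done, which is why an untreated control group and an extended follow-up are needed before any reduction is credited to training.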

In sum, the biofeedback studies reviewed 
do not yield consistent results. The evidence 
for the efficacy of biofeedback per se in 
reducing pain is marginal at best, resting 
mainly on case studies and poorly controlled 
research.

In drawing this conclusion, it is important 
to recognize that the diagnosis of various types
of pain disorders is less than reliable. The
lack of such reliability tends to limit the
efficacy of any treatment, including bio-
feedback. Although the present review focused
on chronic pain, similar conclusions can be 
offered about the questionable value of bio- 
feedback with hypertension (Surwit, Shapiro, 
& Good, 1978) and even in the control of 
heart rate (White, Holmes, & Bennett, 1977).
In some cases the evidence indicates that
cheaper, more readily available relaxation and
coping skills interventions are as effective as or
more effective than biofeedback training.
Moreover, one cannot conclude from the ag-
gregate of biofeedback studies that physio-
logical feedback training per se is an indis-
pensable (or even necessary) part of the
therapeutic regimen. The only conclusion that
seems warranted at this time is that one or
a combination of the components of bio-
feedback training is an effective method of
pain regulation for some patients in certain
situations. Studies to date have dealt with
diverse populations, a variety of biofeedback
approaches that contain a number of poten-
tially active components, and a wide array
of research designs. Consequently, generaliza-
tions must be tentative. Current enthusiasm
for biofeedback methodology must be tempered
with an appreciation that the approaches are
still experimental, with many issues as yet
unexamined. Evidence for the widely ac-
claimed benefits of biofeedback is lacking,
and biofeedback should be considered only
a research tool at this time. Caution must be
maintained to prevent the misapplication of
biofeedback techniques.

specialization of the two halves of the brain
became known as cerebral dominance.

Approximately 15 years later, Carl Wernicke
described another language disorder (cited in 
Geschwind, 1972). Clinical symptoms included 
both quick articulate speech that was devoid 
of meaningful content and severe loss of un-
derstanding of spoken verbal material. Post-
mortem examination of the brains of these
patients revealed lesions located between 
Heschl's gyrus and the angular gyrus in an 
area adjacent to the cortical auditory region. 
This area, now known as Wernicke's area, was 
located in the left hemisphere for most pa- 
tients; damage to the equivalent area in the 
right hemisphere did not cause equivalent 
behavioral deficits.

Wernicke proposed a model of how this area
interacted with Broca's area to provide normal 
speech capabilities. The key points of this 
model are as follows: (a) When a word is heard, 
it is conveyed to the auditory cortex, then re- 
layed to Wernicke’s area, where comprehension 
occurs. If the word is to be spoken, it is further 
conveyed to Broca’s area via the arcuate 
fasciculus, a large band of fibers that connects 
the two regions. In Broca’s area the spoken 
form of the word is aroused and passed on to 
the motor area that controls the muscles of 
speech. (b) When a word is read, output from
the visual cortex is relayed to the angular
gyrus and further to Wernicke's area. In 
Wernicke's area the auditory form of the word 
is aroused and processing continues as de- 
scribed above. In terms of clinical value, this 
model has been effective in predicting which 
areas of the brain are involved in specific 
language disorders (Geschwind, 1972). 

One of the most important findings in these 
early studies was that only one side of an 
individual's brain seems to be involved in 
language processing. For the vast majority of 
the patients studied, damage to the left hemi- 
sphere resulted in the language disorders de-
scribed, whereas patients with equivalent
damage to the right hemisphere did not develop 
language deficits. For a small percentage of 
the population, the opposite condition exists; 
that is, language abnormalities develop only 
when there is damage to the right hemisphere. 

It was further noted that the great majority
of right-handed persons were also left dominant
for speech; that is, lesions in the left hemi- 
sphere of right-handed persons produced lan- 
guage disorders. However, cerebral dominance 
of the left-handed person was not nearly so 
well-defined; the left-handed person could be 
either right or left dominant for language 
functions. Currently accepted studies that 
correlate cerebral dominance with handedness 
indicate that 99% of all right-handed persons 
are left dominant for speech (Rossi & Rosadini, 
1967), whereas for approximately half of all 
left-handers, language functions appear to be 
localized in the right hemisphere (Goodglass & 
Quadfasel, 1954). It is also accepted that 
lateralization of cerebral dominance is less 
clearly defined in left-handed persons. In many 
behavioral studies this correlation between 
handedness and cerebral dominance has been 
used as a convenient means for assuming 
cerebral dominance in a subject. This assump- 
tion is probably more valid for right- than for 
left-handed persons. 

Since the landmark studies by Broca and 
Wernicke, a great deal of effort has been ex- 
pended in attempting to define the exact nature 
of cerebral dominance. A short, selective 
review of both physiological and behavioral 
studies is provided here, with emphasis on the 
research that provides a perspective for current 
studies of asymmetrical processing of non-
verbal auditory stimuli.


Physiological Studies 


The central auditory system in both humans
and animals is physically a bilaterally pro-
jecting system. However, most of the stimula-
tion received by either ear is conveyed to the
opposite temporal cortex via the strong contra-
lateral auditory pathways. Much of the in-
formation that does reach the ipsilateral
cortex is derived from input that has traveled
along the contralateral pathway and been re-
directed back to the side of original stimula-
tion. The secondary and efferent pathways are
so complex that many details concerning them
are still uncertain (Carpenter, 1976). Never-
theless, it is clear that the auditory pathways
function as a primarily contralateral system.

Dominance is usually associated with the
cerebral hemispheres only. Although much in-
formation on the anatomy of the brain has
been derived from animal studies, research on
cerebral asymmetries using animal subjects is
still considered by some writers to be of
questionable value because many of the be-
havioral manifestations of cerebral dominance,
like language function, are absent in animals.
Summarizing the evidence presented by several
investigators concerning lateralization in ani-
mals, in particular "pawedness" in cats, rats,
and primates, Jung (1962) concluded that
"real hemispheric dominance does not occur in


cut. His inference was that birdsong is primarily
controlled by the left side of the brain in
these birds,


related with the functional asymmetries noted 
earlier. For many years it was generally ac-
cepted that the language dominance found in
humans is not associated with any significant
differences in anatomy between the right and 
left hemispheres. Von Bonin (1962) reviewed a 
large number of studies on this subject and con- 
cluded that “these morphological differences 
are, after all, quite small. How to correlate 
these with the astonishing differences in func- 
tion on the left side, is an entirely different 
question" (p. 6). 

In 1968 Geschwind and Levitsky published 
the results of their postmortem examinations 
of 100 adult human brains. The hemispheres 
were divided and each hemisphere was sec- 
tioned along the plane of the sylvian fissure to 
expose the upper surface of the temporal lobe. 
They found that the planum temporale was 
larger on the left for 65% of their specimens 
and on the right for 11% (p < .001), but 24%
of the specimens showed equality of the two 
sides. This is important because the planum 
temporale contains the auditory association cor- 
tex. (The primary auditory cortex is located in 
Heschl's gyrus.) In the dominant hemisphere, 
these regions of the auditory association cortex 
are the classical Wernicke's area. Unfortu- 
nately, no information was available about 
the cerebral behavioral dominance or handed- 
ness of the living subjects. However, since 
93% of the adult population is right-handed, 
the authors concluded with some small 
tolerance that they were probably working 
with left-dominant specimens. Their conclusion 
was as follows: 

Our data show that this area is significantly larger on
the left side, and the differences observed are easily of
sufficient magnitude to be compatible with the known
functional asymmetries. (Geschwind & Levitsky, 1968,
p. 187)
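
To give a sense of how lopsided the reported counts are, the sketch below runs an exact two-sided sign test on the asymmetric specimens (65 larger on the left, 11 larger on the right), setting aside the 24 with equal sides. The text reports p < .001 but does not say which test the authors used, so the choice of a sign test, the handling of ties, and the function name are all assumptions made here for illustration.

```python
from math import comb


def two_sided_sign_test(larger_left: int, larger_right: int) -> float:
    """Exact two-sided binomial (sign) test against a 50/50 split.

    Assumption: specimens with equal sides are dropped before testing.
    """
    n = larger_left + larger_right
    k = max(larger_left, larger_right)
    # Probability of a split at least this extreme in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts from the 100 brains described above: 65 left, 11 right, 24 equal.
p_value = two_sided_sign_test(65, 11)
print(p_value < 0.001)
```

Under these assumptions the result falls well below the .001 level cited in the text, so a chance split between the two sides is an unlikely explanation for the counts.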


Recently, Yeni-Komshian and Benson (1976) 
have reported that chimpanzee brains have a 
similar asymmetry but to a lesser degree than 
do human brains. The brains of rhesus monkeys
did not show any significant differences be-
tween the right and left temporal lobes. Since
there have been demonstrations of some lan- 
guage capability among chimpanzees, these 
investigators suggested that neuroanatomical 
asymmetry may be a prerequisite for language 
functions. 

To date, however, the Geschwind and Levit- 
sky (1968) research is the most definitive study 
available on the anatomical asymmetries that
may be associated with the functional differ- 
ences found in human clinical and behavioral 
studies. 


Behavioral Studies 


The behavioral studies of the role of cerebral 
dominance in auditory perception are almost 
exclusively dichotic-listening experiments. Di- 
chotic listening is a technique in which different 
inputs are simultaneously delivered to the 
right and left ears. 

Kimura (1961) adapted the dichotic method 
to the study of the cerebral dominance effect in 
audition. She showed that when two digits are 
simultaneously presented to the two ears of a 
normal subject, digits arriving at the ear con- 
tralateral to the dominant hemisphere are 
more readily recognized than those arriving at 
the ipsilateral ear. At the end of her study, she 
concluded that the crossed auditory pathway 
from the contralateral ear to the speech hemi- 
sphere is more effective than the slightly 
smaller uncrossed pathway from the ipsilateral 
ear and that the dominant temporal lobe is 
more important than the nondominant tem- 
poral lobe in the perception of speech. 

In a later study Kimura (1967) noted that 
in auditory studies, a cerebral dominance 
effect is evident only when there is simul- 
taneous input to the two ears, that is, when 
dichotic presentation is used. Laterality, or 
ear superiority, is not evident when identical 
material is presented monaurally. If there is 
only one source of stimulation at a time, each 
ear performs equally well. Summarizing the 
difference between the dichotic and monaural 
studies, Kimura proposed that the cerebral 
dominance effect is the result of competition 
between simultaneous inputs to opposite cere- 
bral hemispheres. When there is dichotic 
stimulation to the two hemispheres, competi- 
tion between the coincident stimuli auto- 
matically occurs. Superior responses to the 
stimuli presented to the dominant hemisphere 
result from the conflict between the disparate 
perceptions of the two hemispheres. 

This conclusion went somewhat beyond the 
data presented, and shortly thereafter several 
efforts were made to disprove it. In particular, 
two major objections were tested. The first 
objection to Kimura’s (1967) hypothesis
focused on the role of memory. Laterality in 
response might be due to asymmetries in recall 
rather than to asymmetries in perception. The 
objective, then, was to separate the perceptual 
from the storage or response phases of the 
dichotic listening method. 

Bryden (1967) conducted a series of experi- 
ments to study this issue. He suggested that 
the material from the ear that was reported 
first would be identified more accurately than 
material from the other ear. This would occur 
because the time elapsed decreases the ac- 
curacy of memory for the second channel. 
Therefore, a tendency to consistently report 
material from a preferred ear would account 
for the laterality effect, even if initial percep- 
tion of material to both ears were equal. He 
examined the difference between free recall, 
in which the subject was allowed to report 
material from either ear at will, and ordered 
recall, in which the subject was required to re- 
port all material from one or the other ear first. 
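
The order-of-report argument can be made concrete with a small calculation. The sketch below is an illustration of the reasoning only, not Bryden's analysis: it assumes both ears are perceived equally well, that material reported second is retained with some fixed probability, and that the right ear is reported first on a given share of trials. All three parameter values and the function name are hypothetical.

```python
def expected_accuracy(p_perceive: float, retention_second: float,
                      p_right_first: float) -> tuple[float, float]:
    """Return expected (right-ear, left-ear) report accuracy.

    Hypothetical model: perception is identical for both ears; the ear
    reported second loses accuracy through memory decay alone.
    """
    first = p_perceive
    second = p_perceive * retention_second
    right = p_right_first * first + (1 - p_right_first) * second
    left = (1 - p_right_first) * first + p_right_first * second
    return right, left


# Free recall with a habit of reporting the right ear first on 70% of trials.
right_free, left_free = expected_accuracy(0.9, 0.8, 0.7)
# Ordered recall, with each ear required first on half of the trials.
right_bal, left_bal = expected_accuracy(0.9, 0.8, 0.5)
print(right_free, left_free, right_bal, left_bal)
```

In this model a reporting habit alone produces a right-ear advantage in free recall, and the advantage disappears once report order is balanced. An advantage that persists under ordered recall therefore cannot be attributed to report order alone.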

He found a high correlation between right- 
ear advantage in free recall and right-ear 
dominance in ordered recall. His data indicated 
that material presented to the right ear was 
more accurately identified than material pre- 
sented to the left ear. (He had primarily used 
right-handed subjects.) This was true for all 
rates and list lengths investigated. When the 
two ears were compared as channels of imme- 
diate recall, the right ear was superior to the 
left. Also, the right ear was better as a storage 
channel than the left. In addition, there was a 
general tendency to report the right ear first. 
He concluded that the data supported the 
notion that right-ear superiority is due to a 
perceptual difference rather than to an order 
effect. This conclusion clearly supports Ki- 
mura’s (1967) hypothesis that the cerebral 
dominance effect is a result of competition 
between simultaneous inputs to the two
cerebral hemispheres.

The second objection to Kimura’s (1967) 
hypothesis concerned the role of attention in 
the results of dichotic listening studies. 
Bryden (1969) tested a hypothesis that the 
laterality effect obtained by Kimura was due 
to division of attention rather than to com- 
petition of simultaneously arriving stimuli. In 
the first experiment subjects listened to monaural
stimuli but had no prior knowledge of
which ear would receive the next stimulus
(monaural presentation with division of atten- 
tion). Responses in these conditions showed 
no laterality whatsoever. 

Bryden (1969) further tested his subjects 
using dichotic input under two different condi- 
tions. In the first condition, subjects were told 
to which ear to attend; therefore, they osten- 
sibly were attending to only one channel while 
receiving competitive stimulation through both 
ears. In the second condition, subjects were 
not told to which ear they should attend; this 
condition offered both stimulus competition 
and division of attention. In both of these 
conditions a statistically significant laterality 
effect was obtained. To summarize, regardless 
of instructions or deliberate direction of atten- 
tion, lateralization of response occurred in the 
dichotic listening paradigm. Bryden concluded 
that these results supported Kimura's (1967) 
hypothesis that the laterality effects obtained 
in dichotic listening experiments are due to 
signal competition rather than to attention 
factors. 

Another variation of the dichotic listening 
studies of cerebral asymmetry is the research 
in which pathological subjects have been used. 
Milner, Taylor, and Sperry (1968) found 
that right-handed commissurotomized patients 
(those who have surgical disconnection of the 
cerebral hemispheres because of epilepsy or 
other reasons) could not report verbal input to 
the left ear if a different verbal input was si- 
multaneously delivered to the right ear. How- 
ever, all known auditory pathways remained 
intact and the subjects could report with total 
accuracy monaural input to either ear. 

The results of this study were duplicated by 
Sparks and Geschwind (1968). A review of the 
data led them to propose another model for 
dichotic auditory asymmetries, which in- 
corporated Kimura's (1967) model but sug- 
gested in addition a callosal auditory pathway 
between the two cerebral hemispheres. This
model could account for the cerebral domi- 
nance effect evident in normal subjects and 
the left-ear suppression by right-handed com- 
missurotomized patients in dichotic studies. 
The main points of this model were as follows: 
1. In dichotic listening contralateral ear
input virtually suppresses ipsilateral input.

2. There is competition for report by the
left-hemisphere speech system between information
arriving directly from the right ear
via the contralateral pathway and information 
from the left ear. 

3. Since information from the left ear has 
also traveled along a contralateral pathway 
to the right hemisphere, it must in addition be 
projected to the left hemisphere for report. 
This projection probably involves a callosal
pathway. 

In a further study, Sparks, Goodglass, and
Nickel (1970) used this model to explain data
gathered from left-brain-injured aphasic pa- 
tients and right-brain-injured nonaphasic pa- 
tients. The right-brain-damaged group could 
not report the signals received by the left ear 
after listening to dichotic verbal stimuli. How- 
ever, the left-brain-damaged group was di- 
vided between those who experienced inhibi- 
tion of right-ear input and those who experi- 
enced inhibition of the left-ear input. One 
possible explanation for these results was that 
competition between signals received by both 
ears occurs exclusively in the left hemisphere. 
Therefore, they revised the earlier model to 
state that only damage to the left hemisphere 
can affect information from either the con- 
tralateral or ipsilateral ear. 

No review of the literature that deals with 
cerebral asymmetries and audition is complete 
without some mention of the studies that have
concentrated on the functions of the minor
hemisphere. Originally this work began by
drawing an analogy from studies of cerebral 
asymmetries in vision. These studies showed 
that damage to the nondominant temporal 
lobe produced impaired performance on many 
visual, nonverbal tasks. A natural extension 
of this work was to determine if such a division
of function also existed in the auditory 
modality. 

Milner (1962) examined the effects of temporal
lobectomy on nonverbal auditory discriminations.
Her subjects were left dominant
for speech; in addition, each subject had a
lesion in either the right or left temporal lobe.
These subjects responded to the Seashore
Measures of Musical Talents, which includes
tests for pitch, loudness, rhythm, time, timbre, 
and tonal memory. Her data showed that the
group with right temporal lesions made more
errors than the group with left temporal
lesions. The difference between the two groups 
was strongest for tonal memory and timbre.
This research definitely indicates that the right
hemisphere is strongly involved in processing 
certain types of musical sounds in left-domi- 
nant subjects. 

Later, Kimura (1964) verified these results 
using normal subjects. Her rationale was that 
if the nondominant hemisphere is more strongly 
involved in certain musical abilities than the 
dominant, then normal subjects who show 
right-ear advantage for verbal materials should 
also show a left-ear advantage for musical 
material. This reasoning was based on a 
knowledge of the strong contralateral auditory 
pathways and was consistent with her model 
for dichotic auditory processing. 

In preliminary tests, subjects listened to 
different numbers of clicks presented simulta- 
neously to both ears. They were required to 
report the number of clicks presented to each 
ear. The subjects responded with a small but 
nonsignificant bias in favor of the left ear. She 
then presented melodic patterns dichotically. 
Left-ear melodies were reported correctly 
significantly more often than right-ear melodies 
(p < .01). Kimura (1964) concluded on the
basis of these data and Milner’s (1962) studies 
that the difference in function between major 
and minor hemispheres is along a verbal-non- 
verbal dimension. Kimura also noted that this 
asymmetry is obtained only in dichotic 
listening conditions. 

Kimura's model provided a simple, easily 
applied paradigm for explaining large groups 
of existing data and experimental results. One 
of the first aspects of the model to receive 
attention concerned the meaningfulness of 
stimuli. Curry (1967) investigated this prob- 
lem in a three-condition task using dichotic 
words (meaningful verbal), dichotic nonsense 
syllables (nonmeaningful verbal), and dichotic 
environmental sounds (nonverbal). His sub- 
jects were instructed to identify both stimuli 
in a free-recall paradigm. They obtained 
higher scores for right-ear stimuli when both 
words and nonsense syllables were used but 
higher left-ear scores with the nonverbal 
stimuli. This study shows that meaningfulness 
is not critical for the functional division obtained
in this and in other studies and supports
Kimura's model. 

Shortly after this time, research began to 
appear which indicated that the situation is 
not a simple dichotomy of function. Studdert- 
Kennedy and Shankweiler (1970) presented 
data which suggest that consonants are pro- 
cessed by the left hemisphere, whereas vowel 
sounds are processed by both hemispheres.
The stimuli were spoken consonant-vowel-
consonant syllables presented in dichotic pairs.
For any pair of stimuli, only the initial con- 
sonants, the final consonants, or the vowels 
differed. Subjects were tested separately for 
each of the three types of stimuli. They were 
told to report both initial consonants in the 
dichotic pair, both final consonants, or both 
vowels. Significant right-ear advantages were 
obtained for the initial (p < .001) and final
(p < .01) consonants. Data showed mixed ear
superiority for the vowel sounds, which suggests
that both hemispheres are involved in
speech analysis.

Studies that investigated the processing of 
nonverbal dichotic stimuli also report a complex
division of function. These studies suggest
that different acoustical attributes of nonverbal
stimuli are differentially processed by
the cerebral hemispheres. Spellacy (1970) 
found a significant left-ear advantage for 
dichotic melodies but found no significant 
difference between ears for timbre, temporal, 
or frequency patterns. Stimuli used for the
melodies test were unfamiliar violin solo melo- 
dies. Frequency patterns were composed of four 
500-msec consecutive tones. Each tone was of a 
different frequency and all tones were between 
440 Hz and 880 Hz. Temporal stimuli 
were tone pulses arranged in Morse code
patterns. Timbre stimuli consisted of single
notes played on a pipe organ, using varying 
combinations of pipes. After listening to the 
dichotic test stimuli, subjects listened to 
binaural identification stimuli and then re- 
ported whether the identification stimulus 
matched either of the test stimuli. 

Gordon (1970) also attempted to separate 
the different acoustical qualities found in many 
nonverbal stimuli. He devised two tests for his 
subjects. The first consisted of melodies largely 
devoid of timbre and chordal properties. The
second test consisted of electric organ chords
that were rich in timbre. The first test therefore
varied pitch over time to produce melodies;
the second test avoided melodic sequence.
Gordon’s results, in contrast with those of
Spellacy (1970), showed no asymmetry in
recognizing and reporting the melodies but
showed a strong left-ear advantage in the
chords test.

One of the things that makes this report by 
Gordon interesting is his explanation of these 
results. He noted that it is impossible to rule 
out a hypothesis that rhythm may be lateral- 
ized to the left hemisphere: 


Pitch Processing 


Deutsch (1974) also reported some highly 
unusual data. She described perceptions of 
certain dichotic pitch patterns as “auditory 
illusions” because they differed considerably 
from the actual physical stimuli presented. She
reported that perception of these illusions was
related to the handedness and presumed
cerebral dominance of the subjects. The
illusion was based on opposing octave pitch
patterns and is hereafter called the “octave
illusion.”

The octave illusion occurred under the
following conditions. A sequence of tones was
presented to one ear, alternating in frequency
between 400 Hz and 800 Hz. Tone duration
[...] there were no intervals [...]
in the right ear and an 800-Hz tone was presented
to the left ear.

None of the 86 subjects accurately reported
the stimuli, [...] tendency to hear a single tone
oscillating from [...] subjects reported complex
perceptions, [...] 39% of the left-handed [...] for
example, two alternating pitches in one ear
with a third pitch intermittently in the other 
or two alternating pitches oscillating from 
ear to ear and two further alternating pitches 
localized at the back of the head. Right- and 
left-handed subjects differed significantly in 
the relative distribution of their percepts. 
Deutsch (1974) further reported that right- 
handed subjects had a significant tendency 
to report the high tones in the right ear and 
the low tones in the left ear. However, she 
indicated that this localization pattern often 
reversed with continued listening, much as 
perceptions of ambiguous visual patterns 
reverse. 

Not all investigators agree on the role of 
cerebral asymmetries in the processing of pitch 
information. In particular, Efron and Yund 
(1976) reported data in which subjects favored 
one ear or the other in reporting certain di- 
chotic chords but found no correlation between 
the favored ear and the cerebral dominance of 
their subjects. They described results of this 
type as an “ear dominance effect.”

A typical example of the type of stimulation 
used by Efron and Yund (1976) in many 
experiments was as follows: One ear received 
a 1900-Hz tone for 320 msec; after a 4-msec 
interval, a second 320-msec pure tone of 
1500 Hz was presented (high-low pattern). 
The opposite ear received simultaneous tones, 
with the order of the frequencies reversed (low- 
high pattern). Subjects reported whether the 
pitch of the first chord was higher or lower 
than that of the second. Describing their re- 
sults across many experiments, Efron, Dennis, 
and Yund (1977) stated that 


attern, too. However, 


although both frequencies comprising the dichotic 
chord are heard by all subjects with normal hearing, 
only one-third of the subjects hear the two frequencies 
with approximately equal salience. For another third of 
the population, the pitch mixture of the dichotic chord 
is unequivocally dominated by the frequency delivered
to the left ear; for the final third of subjects, the fre- 
quency of the tone presented to the right ear dominated 
the pitch mixture of the dichotically presented chord. 


(p. 538) 


The distribution of these data does not correspond
to any known factor related to cerebral
dominance or handedness. In addition, Efron
and Yund (1976) investigated the relation of
ear dominance to the handedness of their sub- 
jects. They found that there was no correlation 
with handedness when dichotic tones of about 
1700 Hz were presented to the subjects. 
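
As an illustration only, the dichotic stimulus attributed above to Efron and Yund (1976) can be sketched in a few lines of Python. The sampling rate, the function names, and the use of unit-amplitude sine waves with no onset ramps are assumptions made for this sketch; they are not details reported in the studies reviewed here.

```python
import math

SAMPLE_RATE = 16000  # samples per second; an assumed value, not from the source


def tone(freq_hz, dur_ms, rate=SAMPLE_RATE):
    """Return a list of sine-wave samples for one pure tone."""
    n = int(round(rate * dur_ms / 1000))
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def silence(dur_ms, rate=SAMPLE_RATE):
    """Return a list of zero samples for a silent interval."""
    n = int(round(rate * dur_ms / 1000))
    return [0.0] * n


def dichotic_chord_pair(high_hz=1900, low_hz=1500, tone_ms=320, gap_ms=4):
    """Build the two-channel stimulus as a list of (ear A, ear B) sample pairs.

    Ear A receives the high-low pattern and ear B the simultaneous
    low-high pattern, with a short silent interval between the tones.
    """
    ear_a = tone(high_hz, tone_ms) + silence(gap_ms) + tone(low_hz, tone_ms)
    ear_b = tone(low_hz, tone_ms) + silence(gap_ms) + tone(high_hz, tone_ms)
    return list(zip(ear_a, ear_b))


stimulus = dichotic_chord_pair()
```

Each element of `stimulus` is one pair of simultaneous samples, so the onset and offset times are identical in the two channels and only the frequency order differs between the ears.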

They further suggested that the difference 
between their data and those reported by 
Deutsch (1974) might be due to the higher 
frequency of their stimuli. Deutsch used 
stimuli in the 400 to 800 Hz range, which is 
within the frequency range of speech vowel 
sounds. Although Stevens and House (1972) 
summarized data which show that important 
features of the speech frequency envelope range 
from 300 to 3000 Hz, Efron and Yund (1976) 
concluded that “correlation between ear domi- 
nance for pitch and handedness (hemispheric 
dominance for speech) exists only in the fre- 
quency band which carries speech information" 
(p. 898). We must assume that they were 
referring to the frequency band for vowel 
sounds. 

Such a hypothesis did not go unnoticed.
Christensen and Gregory (1977) reported an
experiment designed to examine the ear domi- 
nance effect in the speech vowel frequencies. 
They used 400- and 800-Hz tones in an experi- 
mental paradigm that closely resembled that 
used by Efron and Yund (1976). Subjects 
received two consecutive pairs of tones. Tones 
lasted 250 msec; one channel consisted of a 
400-Hz tone followed by an 800-Hz tone, 
whereas frequencies were reversed for the 
second channel. Most subjects perceived only 
a single tone from each of the dichotic pairs. 
This is similar to Deutsch’s (1974) results. 
However, in agreement with Efron and Yund’s 
results, they found no evidence of any type 
of a cerebral dominance effect in the responses. 

From the previous review it is apparent that 
there is in the literature a dichotomy pertaining
to pitch processing. When some dichotic pitch 
stimuli are presented, asymmetries in response 
show a relationship with handedness; when 
other dichotic stimuli are used, no relationship 
between handedness and responses is found. 
Since both sets of stimuli were designed to 
study pitch processing, it is unclear why this 
happens. It is possible that one of the para- 
digms may be unwittingly contaminated by 
the inclusion of some factor other than simple 
pitch discrimination. If this is true, then there
is a possibility that the presence of a cerebral
dominance effect in one group of data and an 
unrelated ear dominance function in the other 
group is due to some factor other than differ- 
ential pitch processing. 

There are two major dimensions present in 
any acoustical pattern: pitch and time. The 
two dimensions are probably inseparable on 
an absolute basis. When we speak of frequency, 
we describe sound as a certain number of 
cycles per second. When we speak of any
type of acoustical time or rhythm pattern, 
the elements of that pattern are composed of 
specific frequency profiles. In dichotic listening 
experiments, study of one dimension has been 
accomplished by exactly matching the other 
dimension in both ears. For example, in both
the Deutsch (1974) paradigm and the Efron 
and Yund (1976) paradigm, the on-off times 
of stimuli to opposite ears are exactly syn- 
chronized, whereas only the frequency of the 
stimuli presented to each ear varies. 

However, there is another time-related 
variable that is introduced in comparisons of 
the two sets of stimuli: the number of fre- 
quency changes or transitions in each pattern. 
Halperin, Nachson, and Carmon (1973) found 
a shift in ear advantage related to the number 
of frequency and duration transitions within 
temporally patterned nonverbal stimuli. They 
found that as the complexity of the pattern 
increased (the number of transitions within the 
pattern), there was a shift from left- to right-ear
superiority with right-handed subjects.

In the Deutsch (1974) paradigm there are 
approximately 20 frequency transitions in each 
channel within a 5-sec stimulation period.
Deutsch and Gregory (1978) have confirmed
that no cerebral dominance effect is obtained
when the octave stimuli are not part of long
repetitive sequences of tones. The number of
transitions in the Efron and Yund (1976) para- 
digm is much smaller. In view of the data 
presented by Halperin et al. (1973), the 
strong cerebral dominance effect achieved with 
the Deutsch octave stimuli may be due to the 
high number of transitions. If this is true,
then the cerebral dominance effect evident in
the octave illusion may depend more strongly
on time variables than on pitch variables.


Rhythm Processing


Relatively little is known about the way in
which dichotic time patterns are processed.
Milner (1962) found a significant increase in
error scores after right temporal lobectomy on
the time test of the Seashore Measures of
Musical Talents, which requires the subject
to judge the relative duration of two consecutive
tones. Although this is not a dichotic
listening task, Milner's data suggest that the 
right temporal lobe is involved in duration 
discrimination. 

Spellacy (1970) reported that there was no 
preference for either ear when subjects responded
to dichotic Morse code patterns.
However, in his experiment the two channels
were presented at different frequencies (1000
Hz and 1500 Hz) to maximize discrimination.
It is possible that by introducing frequency 
differences between the ears, Spellacy's data 
reflect the complex interaction between time 
and frequency suggested by Gordon (1970). 
In other words, an opposing cerebral domi- 
nance effect for each of these variables might 
cancel the effects of both. 

Robinson and Solomon (1974) presented 
right-handed subjects with dichotic rhythm 
patterns that consisted of from four to seven, 
short or long, sine wave pulses. The dichotic 
patterns were presented to subjects at matched 
frequencies, with simultaneous onset and offset 
of the two patterns. The data supported a 
hypothesis that rhythm is processed by the
left hemisphere in most right-handed subjects.
A similar hypothesis was satisfactorily de- 
fended by Gordon (1978), using rhythmic ele- 
ments in dichotic melodies. This is interesting 
in view of the earlier model for dichotic listen- 
ing proposed by Sparks et al. (1970). In this 
model competition between signals received 
by both ears occurs exclusively in the left 
hemisphere of left-dominant subjects.

An attempt to interpret the results of the 
rhythm experiments meets with a number of
[...] subjects. However, there is no concrete reason
to assume that the right-dominant person 
processes information as does the left-dominant 
person, except with the directionality reversed. 
To the contrary, Corballis and Beale (1976) 
suggested that left-handedness, except in cases 
of pathology, is due to the cancellation of
asymmetry, not its reversal. This implies that 
those persons normally described as right 
dominant should more accurately be classified 
as bidominant. If this is true, then information 
processing and related performance of the left- 
handed population cannot be inferred from 
experiments that use only right-handed sub- 
jects. One method of determining the exact 
contribution of asymmetrical function to the 
performance of the rhythm tasks might be to 
compare the responses of strongly dominant 
(right-handed) subjects to those of bidominant 
(left-handed) subjects. This has not been done. 

In addition, many of the cerebral dominance 
effect experiments infer cerebral lateralization 
from responses that require the accurate 
lateralization of auditory stimuli. This is a
questionable practice. The cues that are 
normally used to determine laterality in audi- 
tion are phase and intensity differences between 
the stimuli to the two ears. In most experi- 
ments, intensities of the stimuli to the two 
ears have been matched. However, phase has 
not been controlled in any rhythm experiment 
that has been reported in the literature; in fact,
it is impossible to match phase when using 
stimuli of different frequencies or verbal
stimuli. 

To summarize, there is no way to determine
which aspects or amounts of reported laterality 
or favoritism are due to differences in the 
acoustical attributes of the stimuli arriving 
at the two ears and to determine how much 
laterality is due to differential cerebral processing
of those stimuli. It would be easy to
suggest that the right-ear preference of left-dominant
subjects for rhythm may be due to
some sort of bias toward the phase information 
presented to that ear. This would not under- 
mine the validity of the results of experiments 
in the current literature; it does, however, 
illustrate the type of problem that is faced 
when one attempts to interpret these results.

Continued studies of rhythm may increase
our understanding of many complex cogni- 
tive functions. The elements that constitute 
rhythm, such as sequence, order, and relative 
duration, are probably not limited to the 
auditory modality. Colavita (1977) supported 
a hypothesis that the insular temporal cortex 
in cats (which is equivalent to Wernicke's 
area in the dominant human hemisphere) is 
a polysensory association area with special
importance for the perception of temporal patterns.
Although he found no evidence of
asymmetrical hemispheric function in the 
animals, he concluded that temporal pattern 
discrimination appears to represent the basis or 
antecedents of the higher order perceptual 
abilities in humans that are not specific to any 
single modality. 

Neisser (1967; see summary, pp. 279-305)
anticipated this hypothesis. He suggested that
rhythm provides a framework to which verbal
information can be attached and that such a
framework serves both to integrate incoming
information and to expedite recall. Investigation
of such a hypothesis has proved difficult
because of the logistical problems of measuring 
rhythm components in verbal information. 
In view of the more recent research, it may be 
that rhythm structure provides a framework 
not only for verbal material, as Neisser pro- 
posed, but also for the synthesis and analysis 
of all incoming perceptual information. 


Conclusions 


1. Pitch perception is probably a direct
result of the frequency properties of the stimuli. 
Therefore, pitch sensation alone does not re- 
quire differential cerebral processing. Only 
when some type of novel or complex time 
structure is generated in the stimulus presenta- 
tion are the responses of subjects influenced by 
handedness or cerebral dominance. 

2. The cerebral dominance effect evident 
in the results of the few existing rhythm experi- 
ments closely parallels that found in verbal 
experiments. Since adequate stimulus control 
is easier to achieve when using tonal stimuli 
than when using verbal stimuli, results of 
further rhythm experiments may add substantially
to our knowledge of language. In
addition, recent reports suggest that rhythm
structure may also provide a framework for the
synthesis and analysis of all incoming per- 
ceptual information. If this should prove true, 
it is difficult to assess the effects it might 
have on our knowledge of human information 
processing. 

3. A review of this area serves best to 
emphasize what we do not know. There is 
currently a trend in the literature toward 
separate investigation of pitch and rhythm
variables. This is probably an excellent ap- 
proach, although it may be difficult or im- 
possible to separate them operationally. How- 
ever, it should be possible to define the pro- 
cessing mechanisms peculiar to each variable 
through careful, controlled experimentation. 
Only when this elementary information is 
obtained can we hope to make any definitive 
statement about how they interact in complex 
stimuli such as speech signals. Renewed effort
should produce sizeable increases in our 
knowledge in the near future.


In an earlier article (Olson, 1976), I presented
evidence from power and robustness
studies that the Pillai-Bartlett trace statistic
V should be favored for general use as the
test statistic in multivariate analysis of
variance (MANOVA) and that Wilks's likelihood
ratio W and the Hotelling-Lawley trace
T could be considered equivalent to V in very
large samples. Now Stevens (1979) has argued
(a) that except in certain extreme cases, T and
W are nearly as robust as V, (b) that such
exceptions occur very infrequently in real
data, and (c) that the V test should be used
when there are population mean differences
in several canonical dimensions of the multivariate
space, but any of the V, W, or T
statistics may be used when the population
differences are concentrated in one dimension.
The following discussion of these arguments
serves to clarify some practical considerations
[...] advantage is large enough to make much
difference [...]

[...] shows empirical Type I error rates for 45
examples of heteroscedasticity from Olson
(1973). When the contamination that produced
the excessive variance in the one
anomalous group was concentrated [D = C(d)]
in a single canonical variate, error rates for
T, W, and V remained about equally close to
the nominal level, although V was most often
closest. When the anomalous group came from
a population more variable in all dimensions
(D = dI), the V test was almost always disturbed
less than its rivals. Exploring the notion
that V generally performed no more than 1.5
and 2.5 percentage points better than W and
T, respectively, Stevens found that “eight of
the nine cases in Table 1 in which the differences
in error rates are larger . . . correspond to
very large subgroup variance differences on all
variables (36I)” (p. 355). There are two
major reasons for this finding.

1. Stevens omitted some of Olson’s (1973)
examples of heteroscedasticity of the nonconcentrated
type (D = dI). Table 1 of the
[...] W by Stevens's criterion [...] 26 cases, and
of the [...] the sample [...] very large relative to
[...]ences were not extreme (d = 4 and 9).

2. Olson (1973) examined far more cases
with d = 36 than with any other value of d.

Requests for reprints should be sent to Chester L.
Olson, Department of Psychology, Camrose Lutheran
College, Camrose, Alberta, Canada T4V 2R3.


Table 1
Empirical Type I Error Rates for W and V at the Nominal .05 Level When Variances in
One Group Were d Times the Variances in the Other Groups

Columns: No. of variables, No. of groups, Group size; then W, V, and Difference
under each of d = 4, d = 9, and d = 36.

2 6 50 163 162 001ᵇ
2 10 5 235 189 046ᶜ
2 10 10 208 194 014
3 3 5 080 067 013 134 093 041ᶜ 245 163 082ᶜ
3 3 50 098 094 004
3 6 5 088 070 018ᶜ 162 100 062ᶜ 289 162 127ᶜ
3 6 10 092 077 015ᶜ 146 124 022ᶜ 224 186 038ᶜ
3 6 50 173 164 009ᵇ
3 10 5 307 167 140ᶜ
3 10 10 251 202 049ᶜ
6 3 5 489 092 397ᶜ
6 6 5 532 047 485ᶜ
6 6 10 361 177 184ᶜ
6 10 5 517 053 464ᶜ
6 10 10 356 171 185ᶜ
10 3 5 451 183 268
10 6 5 707 073 634ᶜ
10 6 10 580 098 482ᶜ
10 10 5 732 051 681ᶜ

Note. Data from Olson (1973). The decimal point preceding each digit triplet has been omitted; for example,
080 denotes .080.
ᵃ The standard error of such differences is approximately .007.
ᵇ W and V were almost equivalent because of the very large sample size.
ᶜ Differences of .015 or more.


The reader can verify from the data in Table 1 nine is not quite real: Smith, Gnanadesikan, 
(and in Stevens’s Table 1) that as d increased and Hughes, 1962, arbitrarily divided their 
from 1 (at which point the actual error rate equals the nominal level)
to 4 to 9 to 36, the actual error rate increased regularly, with a
tendency to increase proportionately more in the range from 1 to 9 than
in the range from 9 to 36. Thus one can get a fairly good idea of error
rates and differences between error rates at intermediate values of d
by interpolation between d = 1 and d = 36. Any reasonable interpolation
in Table 1 shows that d does not [...] as 36 for V to be [...].

Let us turn now to the second argument: Is it true that the dI pattern
of heteroscedasticity, in which one population is more variable than
the others in all dimensions, occurs very infrequently in practice? To
support an affirmative answer, Stevens summarized real data from nine
investigations. (Actually, one of the [...] data into the four groups
"just for illustrative purposes" [p. 28].) Of course, with real
(sample) data, one has no way of being sure what the underlying
populations looked like, but even in Stevens's intended
counterexamples, there is evidence of the type of heteroscedasticity in
question: Wright's (1975) first group was more variable on all nine
subtests than the other three groups, which had approximately equal
variances.1 This example suggests [...] moderate with d ≈ 2.2. In
Meichenbaum's [...] from the three subscales of the consequences test,
variances were largest for the self-instructional group but unequal for
the other two groups, as noted by Stevens. Data not reported by Stevens
from the same experiment revealed that the self-instructional group was
more variable on all three subscales of the unusual uses test than the
other two groups, which were about equally variable, suggesting
non-concentrated heteroscedasticity with d = 1.9. The values of d are
small, but these examples illustrate that the dI pattern of
heteroscedasticity is entirely realistic.

1 Note that the variance on the spelling subtest [...] girls [...]
traditional [...] in Stevens's article as 2.25 rather than 82.
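The kind of check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the values of d were estimated, so the averaging rule in `rough_d`, the function names, and the example variances are all hypothetical and are not taken from Wright, Meichenbaum, or Stevens.

```python
from statistics import mean

def variance_ratios(suspect_vars, other_groups_vars):
    """Per-subtest ratio of one group's variance to the mean variance
    of the remaining groups (an assumed, informal comparison)."""
    ratios = []
    for k, v in enumerate(suspect_vars):
        baseline = mean(g[k] for g in other_groups_vars)
        ratios.append(v / baseline)
    return ratios

def fits_dI_pattern(ratios):
    """True when the suspect group is more variable on every subtest."""
    return all(r > 1.0 for r in ratios)

def rough_d(ratios):
    """Assumed summary: the average variance ratio across subtests."""
    return mean(ratios)

# Hypothetical sample variances on three subtests (not real data).
suspect = [4.0, 6.0, 2.0]
others = [[2.0, 3.0, 1.0], [2.0, 3.0, 1.0]]
ratios = variance_ratios(suspect, others)
print(fits_dI_pattern(ratios), rough_d(ratios))
```

A group that exceeds the others on every subtest by a roughly constant ratio is what the dI pattern would look like in sample data; unequal ratios across subtests would point toward a more concentrated pattern.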


Finally, [...] What recommendation can be made to [...] moderately
concentrated noncentrality is probably common in behavioral research.
However, there is also evidence that diffuse noncentrality is by no
means uncommon in practice (e.g., Bock, 1975, pp. 258-265; [...] 1966,
pp. [...]; Overall & Klett, 1972, pp. 285-292), and it is certainly not
appropriate to [...], must always choose V, a choice compatible with my
original recommendation (Olson, 1976, p. 585).
In summary, the V test is sometimes more powerful than W or T and
sometimes less powerful, but it is consistently more robust, sometimes
by a substantial margin, and all of these situations are realistic
possibilities in practice. It must be noted that the conclusion to be
drawn from these facts depends on one's relative distaste for Type I
and Type II errors, and my own preference is to ensure that the Type I
error rate remains close to the nominal level. Therefore, the V test is
recommended for routine use.
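For readers who want to see how the three statistics relate to one another, the following Python sketch computes them for two response variables. It assumes the standard textbook definitions: V is the Pillai-Bartlett trace, tr[H(H + E)^-1]; W is Wilks's likelihood ratio, |E| / |H + E|; and T is the unscaled Hotelling-Lawley trace, tr(HE^-1), where H and E are the hypothesis and error sums-of-squares-and-cross-products matrices. The function names and the restriction to 2 x 2 matrices are choices made for this illustration, and any scaling applied to T in the article is not reproduced here.

```python
from statistics import mean

def centroid(rows):
    """Mean vector of a list of (x, y) observations."""
    return (mean(r[0] for r in rows), mean(r[1] for r in rows))

def outer_scaled(d, weight=1.0):
    """weight * d d' for a 2-vector d, as a 2x2 nested list."""
    return [[weight * d[0] * d[0], weight * d[0] * d[1]],
            [weight * d[0] * d[1], weight * d[1] * d[1]]]

def add(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv(m):
    """Inverse of a nonsingular 2x2 matrix."""
    k = det(m)
    return [[m[1][1] / k, -m[0][1] / k],
            [-m[1][0] / k, m[0][0] / k]]

def trace_of_product(m, n):
    """trace(m n) for 2x2 matrices."""
    return sum(m[i][j] * n[j][i] for i in range(2) for j in range(2))

def manova_statistics(groups):
    """Return (V, W, T) for a list of groups of (x, y) observations."""
    grand = centroid([r for g in groups for r in g])
    H = [[0.0, 0.0], [0.0, 0.0]]  # between-groups SSCP
    E = [[0.0, 0.0], [0.0, 0.0]]  # within-groups SSCP
    for g in groups:
        c = centroid(g)
        H = add(H, outer_scaled((c[0] - grand[0], c[1] - grand[1]), len(g)))
        for r in g:
            E = add(E, outer_scaled((r[0] - c[0], r[1] - c[1])))
    total = add(H, E)
    V = trace_of_product(H, inv(total))  # Pillai-Bartlett trace
    W = det(E) / det(total)              # Wilks's likelihood ratio
    T = trace_of_product(H, inv(E))      # Hotelling-Lawley trace (unscaled)
    return V, W, T

# Hypothetical scores for three small groups (not real data).
g1 = [(1, 2), (2, 1), (3, 4), (4, 3)]
g2 = [(4, 6), (5, 5), (6, 8), (7, 7)]
g3 = [(2, 9), (3, 7), (5, 8), (6, 10)]
print(manova_statistics([g1, g2, g3]))
```

With only two groups there is a single nonzero root, so V + W = 1 and T = V / W, and the three tests are equivalent; the differences in power and robustness discussed above can arise only when there are three or more groups and at least two response variables.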
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This index updates a previous compilation by Thomas Andrews and 
Frances Kerr, which was published in 1967 in the Psychological Bulletin (68, 
178-212), covering the years 1940-1966. The Psychological Bulletin has con- 
tinued to be the major source of literature reviews and summaries on topics 
of interest to psychologists, and the present index was designed to facilitate 
access to that material. Like the earlier index, the current one includes all 
articles that were wholly or largely reviews of a specialized area of literature. 
Published comments on those review articles have also been included. Those 
articles that were judged not to qualify as literature reviews were excluded 
from this index. The author index is completely cross-referenced. Articles are 
designated by author, volume number, and page number(s). In the subject 
index, the number of items in the reference section of each article is also 
provided. The letter P following a subject listing indicates that the
particular subject is only part of the review article. In preparing the subject index
the articles were cross-referenced on a large number of headings in an at- 
tempt to make topical retrieval by the researcher as fruitful as possible. 
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